subrela.plot.get_dendrogram_data function

subrela.plot.get_dendrogram_data(Z, labels=None, groups=None, cut_bounds_min=0.0)

Calculate data for drawing a dendrogram.

Parameters
  • Z (pandas.DataFrame) – Cluster data returned by the subrela.clustering.get_clusters function.

  • labels (list[str] or None, optional) – Leaf labels, ordered by leaf index. If None, each leaf's index is used as its label.

  • groups (list[int] or None, optional) – Cluster indices of groups. If None, a dendrogram is not separated into groups.

  • cut_bounds_min (float, optional) – Minimum distance between nodes around a cut line. A cut line is split to satisfy this condition where possible.

Returns

  • leaf_data (pandas.DataFrame) – Data of leaves.

  • node_data (pandas.DataFrame) – Data of nodes.

  • tree_data (pandas.DataFrame) – Data of tree lines.

  • cut_data (pandas.DataFrame) – Data of cut lines.

Raises

ValueError – If clusters in groups are not disjoint.

Notes

An index and columns of leaf_data are as follows:

leaf_data.index : int

Cluster index. The name is ‘leaf’.

leaf_data['label'] : str

Labels.

leaf_data['breadth'] : float

Positions along the breadth direction.

An index and columns of node_data are as follows:

node_data.index : int

Cluster index. The name is ‘cluster’.

node_data['breadth'] : float

Positions along the breadth direction.

node_data['height'] : float

Positions along the height direction.

node_data['children'] : (2,) list[int]

Cluster indices of child nodes.

node_data['side'] : {‘first’, ‘last’}

Side on which a node is located relative to its sibling. ‘first’ means that its breadth value is less than the sibling's. ‘last’ means that its breadth value is greater than the sibling's.

node_data['is_group'] : bool

True if a node is a group cluster.
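The children column is enough to walk from any cluster down to its leaves. The sketch below is a minimal illustration and not part of subrela: the helper names leaves_under and groups_are_disjoint are hypothetical, the children mapping is copied by hand from the node_data output in the Examples of this section, and reading "disjoint" in the Raises entry as "no shared leaves" is an assumption.

```python
def leaves_under(children, cluster):
    """Collect leaf indices below `cluster` by following child links."""
    kids = children[cluster]
    if not kids:  # a leaf has an empty children list
        return [cluster]
    leaves = []
    for kid in kids:
        leaves.extend(leaves_under(children, kid))
    return leaves


def groups_are_disjoint(children, groups):
    """Return True if no leaf belongs to more than one group (assumed meaning)."""
    seen = set()
    for group in groups:
        members = set(leaves_under(children, group))
        if seen & members:
            return False
        seen |= members
    return True


# Copied from the node_data['children'] column shown in the Examples.
children = {
    0: [], 1: [], 2: [], 3: [], 4: [],
    5: [1, 2], 6: [3, 4], 7: [0, 5], 8: [6, 7],
}

print(leaves_under(children, 7))                 # [0, 1, 2]
print(groups_are_disjoint(children, [0, 5, 6]))  # True
print(groups_are_disjoint(children, [5, 7]))     # False: cluster 7 contains cluster 5
```

With a real node_data frame, the same mapping could presumably be obtained with node_data['children'].to_dict().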

An index and columns of tree_data are as follows:

tree_data.index

No meaning.

tree_data['cluster'] : int

Cluster index from which a line descends to a child.

tree_data['side'] : {‘first’, ‘last’}

Side of the child to which a line descends. ‘first’ means that the line descends to the child with the smaller breadth value. ‘last’ means that the line descends to the child with the greater breadth value.

tree_data['breadths'] : (3,) list[float]

Positions of start, corner, and end points along the breadth direction.

tree_data['heights'] : (3,) list[float]

Positions of start, corner, and end points along the height direction.

tree_data['group'] : int

Cluster index of a group to which a line belongs.
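The three points of a tree line can be read against node_data. In the Examples output, each line starts at the parent node, turns at the child's breadth while still at the parent's height, and ends at the child node. The sketch below rebuilds the two lines of cluster 5 under that reading. It is an illustration only: the helper name elbow_line is hypothetical, and the node values are copied by hand from the node_data example output.

```python
def elbow_line(parent, child):
    """Return (breadths, heights) of the start, corner, and end points."""
    breadths = [parent["breadth"], child["breadth"], child["breadth"]]
    heights = [parent["height"], parent["height"], child["height"]]
    return breadths, heights


# Values copied from the node_data example output (clusters 1, 2, and 5).
nodes = {
    1: {"breadth": 3.0, "height": 0.0},
    2: {"breadth": 4.0, "height": 0.0},
    5: {"breadth": 3.5, "height": 2.0},
}

first_line = elbow_line(nodes[5], nodes[1])  # line to the 'first' child
last_line = elbow_line(nodes[5], nodes[2])   # line to the 'last' child

print(first_line)  # ([3.5, 3.0, 3.0], [2.0, 2.0, 0.0])
print(last_line)   # ([3.5, 4.0, 4.0], [2.0, 2.0, 0.0])
```

Each (breadths, heights) pair is a three-point polyline, so it can be passed directly to a line-plotting call such as matplotlib's Axes.plot, with the two lists swapped if the dendrogram is drawn sideways.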

An index and columns of cut_data are as follows:

cut_data.index : int

Cluster index of a group. The name is ‘group’.

cut_data['breadths'] : (2,) list[float]

Positions of start and end points along the breadth direction.

cut_data['heights'] : (2,) list[float]

Positions of start and end points along the height direction.

Examples

>>> import numpy
>>> from subrela.clustering import get_clusters
>>> from subrela.plot import get_dendrogram_data
>>> X = numpy.array([[0, -5, -5, 6, 6], [0, -1, 1, -2, 2]])
>>> Z = get_clusters(X)
>>> leaf_data, node_data, tree_data, cut_data = get_dendrogram_data(Z)
>>> leaf_data
     label  breadth
leaf
0        0        2
1        1        3
2        2        4
3        3        0
4        4        1
>>> node_data
         breadth    height children   side  is_group
cluster
0          2.000  0.000000       []  first     False
1          3.000  0.000000       []  first     False
2          4.000  0.000000       []   last     False
3          0.000  0.000000       []  first     False
4          1.000  0.000000       []   last     False
5          3.500  2.000000   [1, 2]   last     False
6          0.500  4.000000   [3, 4]  first     False
7          2.750  5.099020   [0, 5]   last     False
8          1.625  6.324555   [6, 7]   last     False
>>> tree_data
   cluster   side             breadths                                            heights  group
0        5  first          [3.5, 3, 3]                                    [2.0, 2.0, 0.0]   <NA>
1        5   last          [3.5, 4, 4]                                    [2.0, 2.0, 0.0]   <NA>
2        6  first          [0.5, 0, 0]                                    [4.0, 4.0, 0.0]   <NA>
3        6   last          [0.5, 1, 1]                                    [4.0, 4.0, 0.0]   <NA>
4        7  first         [2.75, 2, 2]      [5.0990195135927845, 5.0990195135927845, 0.0]   <NA>
5        7   last     [2.75, 3.5, 3.5]      [5.0990195135927845, 5.0990195135927845, 2.0]   <NA>
6        8  first    [1.625, 0.5, 0.5]        [6.324555320336759, 6.324555320336759, 4.0]   <NA>
7        8   last  [1.625, 2.75, 2.75]  [6.324555320336759, 6.324555320336759, 5.09901...   <NA>
>>> cut_data
Empty DataFrame
Columns: [breadths, heights]
Index: []
>>> leaf_data, _, _, _ = get_dendrogram_data(
...     Z, labels=['A', 'B', 'C', 'D', 'E'])
>>> leaf_data
     label  breadth
leaf
0        A        2
1        B        3
2        C        4
3        D        0
4        E        1
>>> _, node_data, tree_data, cut_data = get_dendrogram_data(
...     Z, groups=[0, 5, 6])
>>> node_data
         breadth    height children   side  is_group
cluster
0          2.000  0.000000       []  first      True
1          3.000  0.000000       []  first     False
2          4.000  0.000000       []   last     False
3          0.000  0.000000       []  first     False
4          1.000  0.000000       []   last     False
5          3.500  2.000000   [1, 2]   last      True
6          0.500  4.000000   [3, 4]  first      True
7          2.750  5.099020   [0, 5]   last     False
8          1.625  6.324555   [6, 7]   last     False
>>> tree_data
   cluster   side             breadths                                            heights  group
0        5  first          [3.5, 3, 3]                                    [2.0, 2.0, 0.0]      5
1        5   last          [3.5, 4, 4]                                    [2.0, 2.0, 0.0]      5
2        6  first          [0.5, 0, 0]                                    [4.0, 4.0, 0.0]      6
3        6   last          [0.5, 1, 1]                                    [4.0, 4.0, 0.0]      6
4        7  first         [2.75, 2, 2]      [5.0990195135927845, 5.0990195135927845, 0.0]   <NA>
5        7   last     [2.75, 3.5, 3.5]      [5.0990195135927845, 5.0990195135927845, 2.0]   <NA>
6        8  first    [1.625, 0.5, 0.5]        [6.324555320336759, 6.324555320336759, 4.0]   <NA>
7        8   last  [1.625, 2.75, 2.75]  [6.324555320336759, 6.324555320336759, 5.09901...   <NA>
>>> cut_data
          breadths                                 heights
group
0       [1.5, 2.5]  [4.549509756796392, 4.549509756796392]
5       [2.5, 4.5]  [4.549509756796392, 4.549509756796392]
6      [-0.5, 1.5]  [4.549509756796392, 4.549509756796392]
>>> _, _, _, cut_data = get_dendrogram_data(
...     Z, groups=[0, 5, 6], cut_bounds_min=1.5)
>>> cut_data
          breadths                                   heights
group
0       [1.5, 2.5]  [3.5495097567963922, 3.5495097567963922]
5       [2.5, 4.5]  [3.5495097567963922, 3.5495097567963922]
6      [-0.5, 1.5]      [5.16227766016838, 5.16227766016838]
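The cut_data values above can be reproduced by hand, which may help when reading them. The sketch below is an inference from the printed numbers and not the library's implementation. It assumes that a cut line spans half a unit beyond the outermost leaves of its group, and that its height is the midpoint between the highest group node and the lowest parent node among the groups sharing that line. The helper names cut_span, cut_height, and shared_height are hypothetical, and the input values are copied from the outputs above.

```python
import math


def cut_span(leaf_breadths):
    """Breadth range of a cut line: outermost leaves padded by 0.5 (assumed)."""
    return [min(leaf_breadths) - 0.5, max(leaf_breadths) + 0.5]


def cut_height(group_heights, parent_heights):
    """Midpoint between the highest group node and the lowest parent (assumed)."""
    return (max(group_heights) + min(parent_heights)) / 2


# Copied from leaf_data and node_data: leaves, heights, and parent heights
# of groups 0, 5, and 6 (their parents are clusters 7, 7, and 8).
leaf_breadths = {0: [2], 5: [3, 4], 6: [0, 1]}
group_height = {0: 0.0, 5: 2.0, 6: 4.0}
parent_height = {0: 5.0990195135927845, 5: 5.0990195135927845, 6: 6.324555320336759}

spans = {group: cut_span(b) for group, b in leaf_breadths.items()}


def shared_height(groups):
    """Height of one cut line shared by the given groups."""
    return cut_height(
        [group_height[g] for g in groups],
        [parent_height[g] for g in groups],
    )


single = shared_height([0, 5, 6])  # one line for all groups, about 4.5495
split_a = shared_height([0, 5])    # after splitting, about 3.5495
split_b = shared_height([6])       # after splitting, about 5.1623

print(spans)  # {0: [1.5, 2.5], 5: [2.5, 4.5], 6: [-0.5, 1.5]}
print(math.isclose(single, 4.549509756796392))  # True
```

Under this reading, the single line sits in a gap of about 1.10 between cluster 6 (height 4.0) and cluster 7 (height about 5.10). That gap is smaller than cut_bounds_min=1.5, which would explain why the last example splits the line into one shared by groups 0 and 5 and another for group 6.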