subrela.plot.get_dendrogram_data function

subrela.plot.get_dendrogram_data(Z, labels=None, groups=None, cut_bounds_min=0.0)

Calculate data for drawing a dendrogram.
- Parameters
Z (pandas.DataFrame) – Data of clusters returned by the subrela.clustering.get_clusters function.
labels (list[str] or None, optional) – Labels of leaves, in order of leaf index. If None, the leaf index is used as the label.
groups (list[int] or None, optional) – Cluster indices of groups. If None, the dendrogram is not separated into groups.
cut_bounds_min (float, optional) – Minimum distance between the nodes around a cut line. A cut line is split to satisfy this condition if possible.
- Returns
leaf_data (pandas.DataFrame) – Data of leaves.
node_data (pandas.DataFrame) – Data of nodes.
tree_data (pandas.DataFrame) – Data of tree lines.
cut_data (pandas.DataFrame) – Data of cut lines.
- Raises
ValueError – If clusters in groups are not disjoint.
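One plausible reading of "disjoint" is that no leaf may belong to more than one group cluster, so a group cannot contain another group. The sketch below illustrates that reading on the tree used in the Examples section. The helper names `leaves_under` and `groups_are_disjoint` are hypothetical and are not part of subrela; how the library performs the check internally is an assumption.

```python
def leaves_under(cluster, children):
    """Return the set of leaf indices below a cluster (a leaf returns itself)."""
    kids = children.get(cluster, [])
    if not kids:
        return {cluster}
    leaves = set()
    for kid in kids:
        leaves |= leaves_under(kid, children)
    return leaves


def groups_are_disjoint(groups, children):
    """Check that no leaf belongs to more than one group cluster."""
    seen = set()
    for group in groups:
        leaves = leaves_under(group, children)
        if seen & leaves:
            return False
        seen |= leaves
    return True


# Child clusters of the example tree on this page (clusters 0-4 are leaves).
children = {5: [1, 2], 6: [3, 4], 7: [0, 5], 8: [6, 7]}

print(groups_are_disjoint([0, 5, 6], children))  # no shared leaves
print(groups_are_disjoint([5, 7], children))     # cluster 7 contains cluster 5
```

Under this reading, `groups=[0, 5, 6]` is accepted, while a list such as `[5, 7]` would be expected to raise ValueError.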
Notes
The index and columns of leaf_data are as follows:
leaf_data.index (int) – Cluster index. The name is ‘leaf’.
leaf_data['label'] (str) – Labels.
leaf_data['breadth'] (float) – Positions along the breadth direction.
The index and columns of node_data are as follows:
node_data.index (int) – Cluster index. The name is ‘cluster’.
node_data['breadth'] (float) – Positions along the breadth direction.
node_data['height'] (float) – Positions along the height direction.
node_data['children'] ((2,) list[int]) – Cluster indices of child nodes.
node_data['side'] ({‘first’, ‘last’}) – Side on which a node is located among its sibling nodes. ‘first’ means that its breadth value is less than its sibling's; ‘last’ means that it is greater.
node_data['is_group'] (bool) – True if a node is a group cluster.
The index and columns of tree_data are as follows:
tree_data.index – No meaning.
tree_data['cluster'] (int) – Cluster index from which a line descends to a child.
tree_data['side'] ({‘first’, ‘last’}) – Side of the child to which a line descends. ‘first’ means that the line descends to the child with the smaller breadth value; ‘last’ means that it descends to the child with the greater breadth value.
tree_data['breadths'] ((3,) list[float]) – Positions of the start, corner, and end points along the breadth direction.
tree_data['heights'] ((3,) list[float]) – Positions of the start, corner, and end points along the height direction.
tree_data['group'] (int) – Cluster index of the group to which a line belongs.
The index and columns of cut_data are as follows:
cut_data.index (int) – Cluster index of a group. The name is ‘group’.
cut_data['breadths'] ((2,) list[float]) – Positions of the start and end points along the breadth direction.
cut_data['heights'] ((2,) list[float]) – Positions of the start and end points along the height direction.
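As a rough illustration of how the `breadths` and `heights` columns fit together, the sketch below pairs consecutive points into line segments that a plotting routine could draw. The helper `polyline_segments` is hypothetical and not part of subrela; plain dictionaries stand in for DataFrame rows, with values copied from the Examples section.

```python
def polyline_segments(breadths, heights):
    """Pair consecutive (breadth, height) points into line segments."""
    points = list(zip(breadths, heights))
    return list(zip(points[:-1], points[1:]))


# A tree line: cluster 7 descending to its 'last' child (start, corner, end).
tree_row = {
    "cluster": 7,
    "side": "last",
    "breadths": [2.75, 3.5, 3.5],
    "heights": [5.0990195135927845, 5.0990195135927845, 2.0],
}

# A cut line for group 0 (start, end).
cut_row = {
    "breadths": [1.5, 2.5],
    "heights": [4.549509756796392, 4.549509756796392],
}

tree_segments = polyline_segments(tree_row["breadths"], tree_row["heights"])
cut_segments = polyline_segments(cut_row["breadths"], cut_row["heights"])

for start, end in tree_segments + cut_segments:
    print(start, "->", end)
```

A three-point tree line yields two segments, one running across at the parent's height and one running down to the child, while a two-point cut line yields a single segment at constant height.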
Examples
>>> import numpy
>>> from subrela.clustering import get_clusters
>>> from subrela.plot import get_dendrogram_data
>>> X = numpy.array([[0, -5, -5, 6, 6], [0, -1, 1, -2, 2]])
>>> Z = get_clusters(X)
>>> leaf_data, node_data, tree_data, cut_data = get_dendrogram_data(Z)
>>> leaf_data
      label  breadth
leaf
0         0        2
1         1        3
2         2        4
3         3        0
4         4        1
>>> node_data
         breadth    height  children   side  is_group
cluster
0          2.000  0.000000        []  first     False
1          3.000  0.000000        []  first     False
2          4.000  0.000000        []   last     False
3          0.000  0.000000        []  first     False
4          1.000  0.000000        []   last     False
5          3.500  2.000000    [1, 2]   last     False
6          0.500  4.000000    [3, 4]  first     False
7          2.750  5.099020    [0, 5]   last     False
8          1.625  6.324555    [6, 7]   last     False
>>> tree_data
   cluster   side             breadths                                            heights  group
0        5  first          [3.5, 3, 3]                                    [2.0, 2.0, 0.0]   <NA>
1        5   last          [3.5, 4, 4]                                    [2.0, 2.0, 0.0]   <NA>
2        6  first          [0.5, 0, 0]                                    [4.0, 4.0, 0.0]   <NA>
3        6   last          [0.5, 1, 1]                                    [4.0, 4.0, 0.0]   <NA>
4        7  first         [2.75, 2, 2]      [5.0990195135927845, 5.0990195135927845, 0.0]   <NA>
5        7   last     [2.75, 3.5, 3.5]      [5.0990195135927845, 5.0990195135927845, 2.0]   <NA>
6        8  first    [1.625, 0.5, 0.5]        [6.324555320336759, 6.324555320336759, 4.0]   <NA>
7        8   last  [1.625, 2.75, 2.75]  [6.324555320336759, 6.324555320336759, 5.09901...   <NA>
>>> cut_data
Empty DataFrame
Columns: [breadths, heights]
Index: []
>>> leaf_data, _, _, _ = get_dendrogram_data(
...     Z, labels=['A', 'B', 'C', 'D', 'E'])
>>> leaf_data
      label  breadth
leaf
0         A        2
1         B        3
2         C        4
3         D        0
4         E        1
>>> _, node_data, tree_data, cut_data = get_dendrogram_data(
...     Z, groups=[0, 5, 6])
>>> node_data
         breadth    height  children   side  is_group
cluster
0          2.000  0.000000        []  first      True
1          3.000  0.000000        []  first     False
2          4.000  0.000000        []   last     False
3          0.000  0.000000        []  first     False
4          1.000  0.000000        []   last     False
5          3.500  2.000000    [1, 2]   last      True
6          0.500  4.000000    [3, 4]  first      True
7          2.750  5.099020    [0, 5]   last     False
8          1.625  6.324555    [6, 7]   last     False
>>> tree_data
   cluster   side             breadths                                            heights  group
0        5  first          [3.5, 3, 3]                                    [2.0, 2.0, 0.0]      5
1        5   last          [3.5, 4, 4]                                    [2.0, 2.0, 0.0]      5
2        6  first          [0.5, 0, 0]                                    [4.0, 4.0, 0.0]      6
3        6   last          [0.5, 1, 1]                                    [4.0, 4.0, 0.0]      6
4        7  first         [2.75, 2, 2]      [5.0990195135927845, 5.0990195135927845, 0.0]   <NA>
5        7   last     [2.75, 3.5, 3.5]      [5.0990195135927845, 5.0990195135927845, 2.0]   <NA>
6        8  first    [1.625, 0.5, 0.5]        [6.324555320336759, 6.324555320336759, 4.0]   <NA>
7        8   last  [1.625, 2.75, 2.75]  [6.324555320336759, 6.324555320336759, 5.09901...   <NA>
>>> cut_data
          breadths                                 heights
group
0       [1.5, 2.5]  [4.549509756796392, 4.549509756796392]
5       [2.5, 4.5]  [4.549509756796392, 4.549509756796392]
6      [-0.5, 1.5]  [4.549509756796392, 4.549509756796392]
>>> _, _, _, cut_data = get_dendrogram_data(
...     Z, groups=[0, 5, 6], cut_bounds_min=1.5)
>>> cut_data
          breadths                                   heights
group
0       [1.5, 2.5]  [3.5495097567963922, 3.5495097567963922]
5       [2.5, 4.5]  [3.5495097567963922, 3.5495097567963922]
6      [-0.5, 1.5]  [5.16227766016838, 5.16227766016838]
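The cut heights in the last two examples are consistent with placing a cut line midway between the highest group node below it and the lowest parent node above it, and splitting the line when that gap is smaller than cut_bounds_min. This is an inference from the printed output, not a description of subrela's implementation. The dictionaries and the `cut_height` helper below are hypothetical, and the grouping of clusters 0 and 5 on one line and cluster 6 on another is taken from the output rather than derived.

```python
import math

# Node heights copied from node_data in the examples.
height = {0: 0.0, 5: 2.0, 6: 4.0, 7: 5.0990195135927845, 8: 6.324555320336759}
# Parent cluster of each group, read from the 'children' column.
parent = {0: 7, 5: 7, 6: 8}


def cut_height(groups):
    """Return (midpoint, gap) between the highest group node and lowest parent."""
    lower = max(height[g] for g in groups)
    upper = min(height[parent[g]] for g in groups)
    return (lower + upper) / 2, upper - lower


# One shared line for all three groups: the gap is about 1.1, which satisfies
# cut_bounds_min=0.0 but not cut_bounds_min=1.5.
shared, shared_gap = cut_height([0, 5, 6])
print(shared, shared_gap)

# Separate lines, grouped as in the cut_bounds_min=1.5 output.
left, left_gap = cut_height([0, 5])
right, right_gap = cut_height([6])
print(left, left_gap)
print(right, right_gap)

assert math.isclose(shared, 4.549509756796392)
assert math.isclose(left, 3.5495097567963922)
assert math.isclose(right, 5.16227766016838)
```

With the split, both gaps exceed 1.5, which matches the condition that cut_bounds_min is documented to enforce.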