subrela.clustering.get_clusters function

subrela.clustering.get_clusters(X, metric='euclidean', method='single', optimal_ordering=False)
Perform an agglomerative hierarchical clustering of features.
- Parameters
X ((M, N) numpy.ndarray) – Values of N features for M samples.
metric (str or callable, optional) – Metric for measuring a distance between two features. Passed to the metric parameter of the scipy.cluster.hierarchy.linkage function.
method (str, optional) – Linkage method for calculating a distance between two clusters. Passed to the method parameter of the scipy.cluster.hierarchy.linkage function.
optimal_ordering (bool, optional) – Leaves are reordered if True. Passed to the optimal_ordering parameter of the scipy.cluster.hierarchy.linkage function.
- Returns
Z (pandas.DataFrame) – Data of clusters and their linkages.
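The three optional parameters above are forwarded to scipy.cluster.hierarchy.linkage. The following is a minimal sketch of what such a call might look like, not the library's actual implementation. It assumes that features are the columns of X, so the transposed array is what gets clustered; the variable name linkage_matrix is illustrative only.

```python
import numpy
from scipy.cluster.hierarchy import linkage

# Two samples (rows) of five features (columns).
X = numpy.array([[0, -5, -5, 6, 6], [0, -1, 1, -2, 2]])

# Assumption: features are clustered, so each feature (a column of X)
# becomes one observation (a row of X.T) for scipy's linkage function.
linkage_matrix = linkage(
    X.T,
    metric='euclidean',
    method='single',
    optimal_ordering=False,
)

# Each row describes one merge:
# [first child, second child, distance, number of leaves in the new cluster]
print(linkage_matrix)
```

With N features, the linkage matrix has N - 1 rows, one for each newly formed cluster.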
Notes
An index and columns of clusters are as follows:
clusters.index (int) – Index of a cluster.
clusters['children'] ((2,) list[int]) – Indices of child clusters.
clusters['distance'] (float) – Distance between child clusters.
clusters['leaves'] (list[int]) – Indices of features which are descendants of a cluster.
Clusters 0 to N - 1 correspond to the first to Nth features.
Examples
>>> import numpy
>>> X = numpy.array([[0, -5, -5, 6, 6], [0, -1, 1, -2, 2]])
>>> get_clusters(X)
        children  distance           leaves
cluster
5         [1, 2]  2.000000           [1, 2]
6         [3, 4]  4.000000           [3, 4]
7         [0, 5]  5.099020        [0, 1, 2]
8         [6, 7]  6.324555  [3, 4, 0, 1, 2]
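The relationship between the children and leaves columns described in the Notes can be illustrated in plain Python. This is a sketch under stated assumptions, not the library's code: build_cluster_table is a hypothetical helper, and the merges list is copied by hand from the children and distance columns of the example output. It assumes that the leaves of a cluster are the leaves of its first child followed by those of its second child, which is consistent with that output.

```python
def build_cluster_table(n_features, merges):
    """Map each new cluster index to its children, distance, and leaves.

    Hypothetical helper for illustration. `merges` is a list of
    (child_a, child_b, distance) tuples in the order the merges happen.
    """
    # Clusters 0 to n_features - 1 are the features themselves.
    leaves = {i: [i] for i in range(n_features)}
    table = {}
    for offset, (child_a, child_b, distance) in enumerate(merges):
        index = n_features + offset  # new clusters are numbered from N upward
        # Assumption: leaves are concatenated in child order.
        leaves[index] = leaves[child_a] + leaves[child_b]
        table[index] = {
            'children': [child_a, child_b],
            'distance': distance,
            'leaves': leaves[index],
        }
    return table


# Merges taken from the example output above (five features).
merges = [(1, 2, 2.0), (3, 4, 4.0), (0, 5, 5.099020), (6, 7, 6.324555)]
table = build_cluster_table(5, merges)

for index, row in table.items():
    print(index, row['children'], row['distance'], row['leaves'])
```

The root cluster, index 8, collects all five features in the order [3, 4, 0, 1, 2], matching the last row of the example.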