subrela.clustering.get_clusters function

subrela.clustering.get_clusters(X, metric='euclidean', method='single', optimal_ordering=False)

Perform agglomerative hierarchical clustering of features.

Parameters

X ((M, N) numpy.ndarray) – Values of N features for M samples.
metric (str or callable, optional) – Metric used to measure the distance between two features. Passed to the metric parameter of scipy.cluster.hierarchy.linkage.
method (str, optional) – Linkage method used to calculate the distance between two clusters. Passed to the method parameter of scipy.cluster.hierarchy.linkage.
optimal_ordering (bool, optional) – If True, the leaves are reordered so that the distance between successive leaves is minimal. Passed to the optimal_ordering parameter of scipy.cluster.hierarchy.linkage.

Returns

Z (pandas.DataFrame) – Table of clusters and their linkages.

Notes

The index and columns of the returned DataFrame, called clusters below, are as follows:

clusters.index (int) – Index of a cluster.

clusters['children'] ((2,) list[int]) – Indices of child clusters.

clusters['distance'] (float) – Distance between child clusters.

clusters['leaves'] (list[int]) – Indices of features which are descendants of a cluster.

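Clusters 0 to N - 1 correspond to the first to the Nth features, that is, to the columns of X. Merged clusters are numbered from N upward, as the table in the Examples section shows.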
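The relationship between the children and leaves columns can be sketched in plain Python. This is only an illustration, not subrela's implementation: the children dictionary is a hypothetical stand-in for the children column, with values copied from the Examples section, and collect_leaves is a helper name introduced here.

```python
# Hypothetical stand-in for the 'children' column of the returned table
# (values copied from the Examples section; N = 5 features).
N = 5
children = {5: [1, 2], 6: [3, 4], 7: [0, 5], 8: [6, 7]}


def collect_leaves(cluster):
    """Return the feature indices below `cluster`, first child first."""
    if cluster < N:
        # Clusters 0 to N - 1 are the features themselves.
        return [cluster]
    first, second = children[cluster]
    return collect_leaves(first) + collect_leaves(second)


# Rebuild the 'leaves' entry for every merged cluster.
leaves = {index: collect_leaves(index) for index in children}
print(leaves[8])
```

Concatenating the leaves of the first child and then the second child gives the same order as the leaves column in the example table.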

Examples

>>> import numpy
>>> from subrela.clustering import get_clusters
>>> X = numpy.array([[0, -5, -5, 6, 6], [0, -1, 1, -2, 2]])
>>> get_clusters(X)
        children  distance           leaves
cluster
5         [1, 2]  2.000000           [1, 2]
6         [3, 4]  4.000000           [3, 4]
7         [0, 5]  5.099020        [0, 1, 2]
8         [6, 7]  6.324555  [3, 4, 0, 1, 2]
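The children and distance columns above can be related to the raw output of scipy.cluster.hierarchy.linkage with a short sketch. It assumes that the features, meaning the columns of X, are the objects being clustered, so X is transposed before the call. That assumption is inferred from the example output and is not taken from subrela's source; the names n_features and table are introduced here for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[0, -5, -5, 6, 6], [0, -1, 1, -2, 2]], dtype=float)
n_features = X.shape[1]

# Assumption: each feature (column of X) is one observation for scipy,
# hence the transpose.
Z = linkage(X.T, method="single", metric="euclidean", optimal_ordering=False)

# Row k of scipy's linkage matrix describes cluster n_features + k:
# columns 0 and 1 hold the child indices, column 2 holds their distance.
table = [
    (n_features + k, [int(a), int(b)], round(float(dist), 6))
    for k, (a, b, dist, _size) in enumerate(Z)
]
for index, kids, dist in table:
    print(index, kids, dist)
```

Under this assumption the cluster indices, child pairs, and distances agree with the table printed by get_clusters(X).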
