subrela.clustering.get_clusters function

subrela.clustering.get_clusters(X, metric='euclidean', method='single', optimal_ordering=False)

Perform agglomerative hierarchical clustering of features.

Parameters
Returns

Z (pandas.DataFrame) – Data of clusters and their linkages.

Notes

The index and columns of the returned DataFrame (referred to as clusters below) are as follows:

clusters.index : int

Index of a cluster.

clusters['children'] : (2,) list[int]

Indices of child clusters.

clusters['distance'] : float

Distance between child clusters.

clusters['leaves'] : list[int]

Indices of features which are descendants of a cluster.

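Clusters 0 to N - 1 correspond to the first through Nth features, where N is the number of features. In the example below, N = 5, so the merged clusters start at index 5.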
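
The relationship between the children and leaves columns can be sketched in plain Python. The helper collect_leaves below is hypothetical and not part of subrela. It assumes that clusters 0 to N - 1 are single features and that every merged cluster has a larger index than its children, which holds for the values in this page's example.

```python
# Hypothetical helper, not part of subrela: rebuild the `leaves` column
# from the `children` column of the returned DataFrame.
def collect_leaves(children, n_features):
    """Map each cluster index to the feature indices it contains."""
    # Assumption: clusters 0..n_features-1 are the individual features.
    leaves = {i: [i] for i in range(n_features)}
    # Assumption: children always have smaller indices than their parent,
    # so visiting clusters in ascending order resolves children first.
    for index in sorted(children):
        left, right = children[index]
        leaves[index] = leaves[left] + leaves[right]
    return leaves


# `children` values copied from this page's example (N = 5 features).
children = {5: [1, 2], 6: [3, 4], 7: [0, 5], 8: [6, 7]}
leaves = collect_leaves(children, n_features=5)
print(leaves[7])  # [0, 1, 2]
print(leaves[8])  # [3, 4, 0, 1, 2]
```

Concatenating the leaves of the first child with those of the second reproduces the order shown in the example table. Whether get_clusters guarantees that order in general is not stated here.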

Examples

>>> import numpy
>>> from subrela.clustering import get_clusters
>>> X = numpy.array([[0, -5, -5, 6, 6], [0, -1, 1, -2, 2]])
>>> get_clusters(X)
        children  distance           leaves
cluster
5         [1, 2]  2.000000           [1, 2]
6         [3, 4]  4.000000           [3, 4]
7         [0, 5]  5.099020        [0, 1, 2]
8         [6, 7]  6.324555  [3, 4, 0, 1, 2]
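
To check the numbers in the table by hand, the following is a minimal sketch of naive single-linkage clustering in pure Python. It is not subrela's implementation. It assumes each column of X is one feature (a point whose coordinates are that feature's values across the rows) and that the default metric='euclidean' and method='single' mean Euclidean distance with minimum-distance linkage. The function name single_linkage is hypothetical.

```python
import math
from itertools import combinations

# X from the example above; zip(*X) turns its columns into feature points.
X = [[0, -5, -5, 6, 6], [0, -1, 1, -2, 2]]
points = list(zip(*X))  # [(0, 0), (-5, -1), (-5, 1), (6, -2), (6, 2)]


def single_linkage(points):
    """Naive single linkage; returns (index, children, distance, leaves) rows."""
    clusters = {i: [i] for i in range(len(points))}  # active cluster -> leaves
    rows = []
    next_index = len(points)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Single linkage: distance between the closest pair of members.
            d = min(
                math.dist(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        dist, a, b = best
        # Merge the closest pair into a new cluster with the next free index.
        merged = clusters.pop(a) + clusters.pop(b)
        clusters[next_index] = merged
        rows.append((next_index, [a, b], dist, merged))
        next_index += 1
    return rows


for index, children, distance, leaves in single_linkage(points):
    print(index, children, round(distance, 6), leaves)
```

For this input the sketch yields the same children, distance and leaves as the table above: features 1 and 2 merge at distance 2, features 3 and 4 at distance 4, and the remaining merges occur at sqrt(26) ≈ 5.099020 and sqrt(40) ≈ 6.324555.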