subrela.analysis.get_strong_relevance_scores function

subrela.analysis.get_strong_relevance_scores(subset_scores, Z, clusters=None, descendants=False)

Calculate strong relevance scores of clusters.

Parameters
  • subset_scores (pandas.Series) – Scores for feature subsets.

  • Z (pandas.DataFrame) – Cluster data returned by the subrela.clustering.get_clusters function.

  • clusters (list[int] or None, optional) – Indices of the clusters for which strong relevance scores are calculated. If None, strong relevance scores are calculated for all clusters.

  • descendants (bool, optional) – If True, strong relevance scores are also calculated for all descendants of the selected clusters.
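To make the effect of descendants=True concrete, here is a minimal plain-Python sketch. It assumes a hypothetical mapping CHILDREN from each merged cluster to its two child clusters; both CHILDREN and the helper with_descendants are illustrative only. They are not part of subrela and do not reflect how Z stores the cluster tree. The tree is chosen so that clusters 5 and 6 have the descendants shown in the descendants=True example; the entries for 7 and 8 are assumptions.

```python
# Hypothetical cluster tree (NOT subrela's representation of Z):
# parent cluster index -> (left child, right child).
# Leaves 0-4 are single features and have no entry.
CHILDREN = {5: (1, 2), 6: (3, 4), 7: (0, 5), 8: (6, 7)}


def with_descendants(clusters, children):
    """Return the given clusters plus all of their descendants, sorted."""
    result = set()
    stack = list(clusters)
    while stack:
        c = stack.pop()
        if c in result:
            continue
        result.add(c)
        # Leaves are absent from the mapping, so traversal stops there.
        stack.extend(children.get(c, ()))
    return sorted(result)


print(with_descendants([5, 6], CHILDREN))  # [1, 2, 3, 4, 5, 6]
```

The output lists the same clusters, in the same order, as the rows returned by the descendants=True example.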

Returns

srs (pandas.DataFrame) – Strong relevance scores.

Notes

The index and columns of srs are as follows:

srs.index : int

Cluster index.

srs['subset_score_ref'] : float

Best score among all feature subsets.

srs['subset_score'] : float

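Best score among feature subsets that include none of the features in the cluster. If no subset in subset_scores excludes all of the cluster's features, this value is NaN, and so is srs['relevance_score'].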

srs['relevance_score'] : float

Strong relevance score, defined as srs['subset_score_ref'] - srs['subset_score'].
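The column definitions above can be checked by hand. The following minimal sketch assumes subset scores are given as a hypothetical list of (inclusion mask, score) pairs rather than the pandas.Series that subrela uses. It computes the three columns for one cluster, given the set of feature indices that cluster covers. The helper strong_relevance is not part of subrela's API and is not claimed to be its implementation. The data are the subset scores from the example.

```python
import math

# Hypothetical stand-in for subset_scores: (feature-inclusion mask, score)
# pairs, one bool per feature. Values copied from the example.
SUBSET_SCORES = [
    ((False, False, False, True, True), 0.7),
    ((True, False, False, True, True), 0.7),
    ((False, True, False, True, True), 0.8),
    ((True, True, False, True, True), 0.8),
    ((False, False, True, True, True), 0.9),
    ((True, False, True, True, True), 0.9),
    ((False, True, True, True, True), 1.0),
    ((True, True, True, True, True), 1.0),
]


def strong_relevance(subset_scores, cluster_features):
    """Return (subset_score_ref, subset_score, relevance_score) for one cluster.

    cluster_features is the set of feature indices in the cluster.
    """
    # Best score over all feature subsets.
    ref = max(score for _, score in subset_scores)
    # Scores of subsets that contain none of the cluster's features.
    excluded = [score for mask, score in subset_scores
                if not any(mask[i] for i in cluster_features)]
    if not excluded:
        # Every subset contains some feature of the cluster: undefined.
        return ref, math.nan, math.nan
    best = max(excluded)
    return ref, best, ref - best


print(strong_relevance(SUBSET_SCORES, {1, 2}))
print(strong_relevance(SUBSET_SCORES, {3, 4}))
```

With these data, the feature set {1, 2} yields a subset_score of 0.7 and a relevance_score of about 0.3 (up to floating-point rounding), the same values the example reports for cluster 5. The set {3, 4} yields NaN because every listed subset includes features 3 and 4.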

Examples

>>> import numpy
>>> from subrela.records import from_arrays
>>> from subrela.clustering import get_clusters
>>> from subrela.analysis import get_strong_relevance_scores
>>> subset_scores = from_arrays([[False, False, False, True, True],
...                              [True, False, False, True, True],
...                              [False, True, False, True, True],
...                              [True, True, False, True, True],
...                              [False, False, True, True, True],
...                              [True, False, True, True, True],
...                              [False, True, True, True, True],
...                              [True, True, True, True, True]],
...                             [0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 1., 1.])
>>> X = numpy.array([[0, -5, -5, 6, 6], [0, -1, 1, -2, 2]])
>>> Z = get_clusters(X)
>>> get_strong_relevance_scores(subset_scores, Z)
         subset_score_ref  subset_score  relevance_score
cluster
0                     1.0           1.0              0.0
1                     1.0           0.9              0.1
2                     1.0           0.8              0.2
3                     1.0           NaN              NaN
4                     1.0           NaN              NaN
5                     1.0           0.7              0.3
6                     1.0           NaN              NaN
7                     1.0           0.7              0.3
8                     1.0           NaN              NaN
>>> get_strong_relevance_scores(subset_scores, Z, clusters=[5, 6])
         subset_score_ref  subset_score  relevance_score
cluster
5                     1.0           0.7              0.3
6                     1.0           NaN              NaN
>>> get_strong_relevance_scores(subset_scores, Z, clusters=[5, 6],
...                             descendants=True)
         subset_score_ref  subset_score  relevance_score
cluster
1                     1.0           0.9              0.1
2                     1.0           0.8              0.2
3                     1.0           NaN              NaN
4                     1.0           NaN              NaN
5                     1.0           0.7              0.3
6                     1.0           NaN              NaN