subrela.analysis.get_strong_relevance_scores function

subrela.analysis.get_strong_relevance_scores(subset_scores, Z, clusters=None, descendants=False)

Calculate strong relevance scores of clusters.
Parameters
    subset_scores (pandas.Series) – Scores for feature subsets.
    Z (pandas.DataFrame) – Data of clusters returned by the subrela.clustering.get_clusters function.
    clusters (list[int] or None, optional) – Indices of the clusters whose strong relevance scores are calculated. If None, strong relevance scores are calculated for all clusters.
    descendants (bool, optional) – If True, strong relevance scores are also calculated for descendant clusters.

Returns
    srs (pandas.DataFrame) – Strong relevance scores.
Notes
The index and columns of srs are as follows:

    srs.index (int) – Cluster index.
    srs['subset_score_ref'] (float) – Best score among all feature subsets.
    srs['subset_score'] (float) – Best score among feature subsets that contain none of the features in the cluster. In the examples below, this is NaN for clusters whose features appear in every scored subset.
    srs['relevance_score'] (float) – Strong relevance score, which is srs['subset_score_ref'] - srs['subset_score'].
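The column definitions above can be sketched in plain pandas and NumPy. This is only an illustration of the arithmetic, not the library's implementation: the helper name strong_relevance is hypothetical, and the layout of subset_scores (one boolean index level per feature, True meaning the feature is in the subset) is an assumption made for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for subset_scores: one entry per scored feature subset,
# indexed by one boolean level per feature (True = feature is in the subset).
masks = [(False, False, False, True, True),
         (True, False, False, True, True),
         (False, True, False, True, True),
         (True, True, False, True, True),
         (False, False, True, True, True),
         (True, False, True, True, True),
         (False, True, True, True, True),
         (True, True, True, True, True)]
scores = [0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 1.0, 1.0]
subset_scores = pd.Series(scores, index=pd.MultiIndex.from_tuples(masks))


def strong_relevance(subset_scores, features):
    """Return (subset_score_ref, subset_score, relevance_score) for a feature group.

    `features` is a list of feature positions assumed to form one cluster.
    """
    # Boolean matrix: rows are scored subsets, columns are features.
    included = np.array(subset_scores.index.tolist(), dtype=bool)
    values = subset_scores.to_numpy()

    # Best score among all feature subsets.
    ref = values.max()

    # Keep only subsets that contain none of the given features.
    without = ~included[:, features].any(axis=1)

    # If every scored subset uses at least one of the features, there is
    # nothing to compare against, so the result is NaN.
    best = values[without].max() if without.any() else np.nan
    return ref, best, ref - best


ref, best, relevance = strong_relevance(subset_scores, [1, 2])
print(ref, best, relevance)
```

With this data, the group [1, 2] has a best remaining score of 0.7 and a relevance of about 0.3, while the group [3, 4] yields NaN because every scored subset contains both features. That agrees with the rows for clusters 5 and 6 in the examples, if those clusters group features 1, 2 and features 3, 4 respectively.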
Examples
>>> import numpy
>>> from subrela.records import from_arrays
>>> from subrela.clustering import get_clusters
>>> from subrela.analysis import get_strong_relevance_scores
>>> subset_scores = from_arrays([[False, False, False, True, True],
...                              [True, False, False, True, True],
...                              [False, True, False, True, True],
...                              [True, True, False, True, True],
...                              [False, False, True, True, True],
...                              [True, False, True, True, True],
...                              [False, True, True, True, True],
...                              [True, True, True, True, True]],
...                             [0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 1., 1.])
>>> X = numpy.array([[0, -5, -5, 6, 6], [0, -1, 1, -2, 2]])
>>> Z = get_clusters(X)
>>> get_strong_relevance_scores(subset_scores, Z)
         subset_score_ref  subset_score  relevance_score
cluster
0                     1.0           1.0              0.0
1                     1.0           0.9              0.1
2                     1.0           0.8              0.2
3                     1.0           NaN              NaN
4                     1.0           NaN              NaN
5                     1.0           0.7              0.3
6                     1.0           NaN              NaN
7                     1.0           0.7              0.3
8                     1.0           NaN              NaN

>>> get_strong_relevance_scores(subset_scores, Z, clusters=[5, 6])
         subset_score_ref  subset_score  relevance_score
cluster
5                     1.0           0.7              0.3
6                     1.0           NaN              NaN

>>> get_strong_relevance_scores(subset_scores, Z, clusters=[5, 6],
...                             descendants=True)
         subset_score_ref  subset_score  relevance_score
cluster
1                     1.0           0.9              0.1
2                     1.0           0.8              0.2
3                     1.0           NaN              NaN
4                     1.0           NaN              NaN
5                     1.0           0.7              0.3
6                     1.0           NaN              NaN