subrela.analysis.get_weak_relevance_scores function
subrela.analysis.get_weak_relevance_scores(subset_scores, Z, group, subgroups=None)
Calculate weak relevance scores of the subgroups of a group.
Parameters
subset_scores (pandas.Series) – Scores for feature subsets.
Z (pandas.DataFrame) – Data of clusters returned by the subrela.clustering.get_clusters function.
group (int) – Cluster index of a group.
subgroups (list[int] or None, optional) – Cluster indices of the subgroups whose weak relevance scores are calculated. If None, weak relevance scores are calculated for all subgroups of group, including group itself.
Returns
wrs (pandas.DataFrame) – Weak relevance scores.
Raises
ValueError – If a cluster in subgroups is not a subgroup of cluster group.
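The cluster indices in group and subgroups refer to nodes of the feature clustering. The sketch below assumes, hypothetically, that Z follows SciPy's linkage-matrix convention: with n features, indices 0 to n - 1 denote single features, and merge step i creates cluster n + i from the two clusters it joins. It uses a plain list of merges instead of the real Z DataFrame, and the helper names children, leaves, subgroups_of and check_subgroups are illustrative, not part of subrela. The example merge list is chosen to agree with the Examples output, where cluster 5 splits into clusters 1 and 2.

```python
from typing import List, Sequence, Set, Tuple

Merge = Tuple[int, int]  # (left child, right child) of one merge step


def children(cluster: int, merges: Sequence[Merge], n_features: int) -> List[int]:
    """Return the two child clusters of `cluster`, or [] if it is a single feature."""
    if cluster < n_features:
        return []
    left, right = merges[cluster - n_features]
    return [left, right]


def leaves(cluster: int, merges: Sequence[Merge], n_features: int) -> Set[int]:
    """Return the feature indices contained in `cluster`."""
    if cluster < n_features:
        return {cluster}
    result: Set[int] = set()
    for child in children(cluster, merges, n_features):
        result |= leaves(child, merges, n_features)
    return result


def subgroups_of(group: int, merges: Sequence[Merge], n_features: int) -> List[int]:
    """Return `group` followed by every cluster below it, in pre-order."""
    result = [group]
    for child in children(group, merges, n_features):
        result.extend(subgroups_of(child, merges, n_features))
    return result


def check_subgroups(group: int, subgroups: Sequence[int],
                    merges: Sequence[Merge], n_features: int) -> None:
    """Raise ValueError if any requested cluster is not a subgroup of `group`."""
    valid = set(subgroups_of(group, merges, n_features))
    for sub in subgroups:
        if sub not in valid:
            raise ValueError(f"cluster {sub} is not a subgroup of cluster {group}")


# Assumed merge order for five features; only "cluster 5 = features {1, 2}"
# is implied by the Examples output, the later merges are illustrative.
example_merges = [(1, 2), (3, 4), (0, 5), (6, 7)]
```

With these merges, subgroups_of(5, example_merges, 5) gives [5, 1, 2], the same rows in the same order as the first example, and check_subgroups(5, [3], example_merges, 5) raises ValueError. Whether subrela always lists subgroups in pre-order is not stated here.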
Notes
The index and columns of wrs are as follows:
wrs.index (int) – Cluster index of a subgroup; the index is named subgroup.
wrs['subset_score'] (float) – Best score among feature subsets that include at least one feature of the subgroup and no feature of group outside the subgroup.
wrs['subset_score_ref'] (float) – Best score among feature subsets that include no feature of group.
wrs['relevance_score'] (float) – Weak relevance score, computed as wrs['subset_score'] - wrs['subset_score_ref'].
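The three columns follow directly from their definitions. As a minimal pure-Python sketch (not subrela's implementation), assume each feature subset is stored as a frozenset of feature indices mapped to its score, and that group and subgroup have already been resolved into sets of feature indices. The names weak_relevance and masks_to_subsets are hypothetical. The example data re-encodes the boolean masks and scores from the Examples below, assuming cluster 5 holds features 1 and 2, which is what the example output implies.

```python
from typing import Dict, FrozenSet, List, Set


def weak_relevance(
    subset_scores: Dict[FrozenSet[int], float],
    group: Set[int],
    subgroup: Set[int],
) -> Dict[str, float]:
    """Compute the three wrs columns for one subgroup (illustrative sketch)."""
    outside = group - subgroup  # features of the group that lie outside the subgroup
    # Best score among subsets that use the subgroup but nothing else from the group.
    # max() raises ValueError if no subset qualifies.
    subset_score = max(
        score
        for subset, score in subset_scores.items()
        if subset & subgroup and not subset & outside
    )
    # Best score among subsets that use no feature of the group at all.
    subset_score_ref = max(
        score for subset, score in subset_scores.items() if not subset & group
    )
    return {
        "subset_score": subset_score,
        "subset_score_ref": subset_score_ref,
        "relevance_score": subset_score - subset_score_ref,
    }


def masks_to_subsets(masks: List[List[bool]],
                     scores: List[float]) -> Dict[FrozenSet[int], float]:
    """Turn boolean feature masks into {frozenset of feature indices: score}."""
    return {
        frozenset(i for i, used in enumerate(mask) if used): score
        for mask, score in zip(masks, scores)
    }


# Same masks and scores as in the Examples section.
example = masks_to_subsets(
    [[False, False, False, True, True],
     [True, False, False, True, True],
     [False, True, False, True, True],
     [True, True, False, True, True],
     [False, False, True, True, True],
     [True, False, True, True, True],
     [False, True, True, True, True],
     [True, True, True, True, True]],
    [0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 1.0, 1.0],
)
```

For that data, weak_relevance(example, {1, 2}, {1}) returns a subset_score of 0.8 and a subset_score_ref of 0.7, matching the row for subgroup 1 in the first example up to floating-point rounding of the difference.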
Examples
>>> import numpy
>>> from subrela.analysis import get_weak_relevance_scores
>>> from subrela.records import from_arrays
>>> from subrela.clustering import get_clusters
>>> subset_scores = from_arrays([[False, False, False, True, True],
...                              [True, False, False, True, True],
...                              [False, True, False, True, True],
...                              [True, True, False, True, True],
...                              [False, False, True, True, True],
...                              [True, False, True, True, True],
...                              [False, True, True, True, True],
...                              [True, True, True, True, True]],
...                             [0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 1., 1.])
>>> X = numpy.array([[0, -5, -5, 6, 6], [0, -1, 1, -2, 2]])
>>> Z = get_clusters(X)
>>> get_weak_relevance_scores(subset_scores, Z, 5)
          subset_score  subset_score_ref  relevance_score
subgroup
5                  1.0               0.7              0.3
1                  0.8               0.7              0.1
2                  0.9               0.7              0.2
>>> get_weak_relevance_scores(subset_scores, Z, 5, subgroups=[1, 2])
          subset_score  subset_score_ref  relevance_score
subgroup
1                  0.8               0.7              0.1
2                  0.9               0.7              0.2