subrela.analysis.get_weak_relevance_scores function

subrela.analysis.get_weak_relevance_scores(subset_scores, Z, group, subgroups=None)

Calculate weak relevance scores of subgroups.

Parameters
  • subset_scores (pandas.Series) – Scores for feature subsets.

  • Z (pandas.DataFrame) – Cluster data returned by the subrela.clustering.get_clusters function.

  • group (int) – Cluster index of a group.

  • subgroups (list[int] or None, optional) – Cluster indices of subgroups whose weak relevance scores are calculated. If None, weak relevance scores are calculated for all subgroups.

Returns

wrs (pandas.DataFrame) – Weak relevance scores.

Raises

ValueError – If a cluster in subgroups is not a subgroup of cluster group.

Notes

The index and columns of wrs are as follows:

wrs.index (int)

Cluster index of the subgroup.

wrs['subset_score'] (float)

Best score among feature subsets that include at least one feature of the subgroup and no feature of the group outside the subgroup.

wrs['subset_score_ref'] (float)

Best score among feature subsets that include no feature of the group.

wrs['relevance_score'] (float)

Weak relevance score, defined as wrs['subset_score'] - wrs['subset_score_ref'].
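The column definitions above can be illustrated with a small standalone sketch. It is not subrela's implementation: weak_relevance_sketch is a hypothetical helper that works on a plain boolean mask array instead of the records returned by from_arrays, and it takes the feature positions of the group and of each subgroup explicitly instead of reading them from Z. The assumption that group 5 consists of features 1 and 2 is inferred from the Examples output, not stated by the library.

```python
import numpy as np
import pandas as pd


def weak_relevance_sketch(masks, scores, group_features, subgroup_features):
    """Hypothetical helper, not part of subrela.

    masks: boolean array, one row per feature subset, one column per feature.
    scores: score of each feature subset.
    group_features: feature positions that belong to the group.
    subgroup_features: dict mapping a subgroup label to its feature positions.
    """
    masks = np.asarray(masks, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    group = sorted(group_features)

    # Reference: best score among subsets that use no feature of the group.
    score_ref = scores[~masks[:, group].any(axis=1)].max()

    labels, rows = [], []
    for label, features in subgroup_features.items():
        inside = np.array(sorted(features), dtype=int)
        outside = np.array([f for f in group if f not in features], dtype=int)
        # Keep subsets that touch the subgroup but not the rest of the group.
        keep = masks[:, inside].any(axis=1) & ~masks[:, outside].any(axis=1)
        best = scores[keep].max()
        labels.append(label)
        rows.append({"subset_score": best,
                     "subset_score_ref": score_ref,
                     "relevance_score": best - score_ref})
    return pd.DataFrame(
        rows, index=labels,
        columns=["subset_score", "subset_score_ref", "relevance_score"])


# Same masks and scores as in the Examples section.
masks = [[False, False, False, True, True],
         [True, False, False, True, True],
         [False, True, False, True, True],
         [True, True, False, True, True],
         [False, False, True, True, True],
         [True, False, True, True, True],
         [False, True, True, True, True],
         [True, True, True, True, True]]
scores = [0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 1.0, 1.0]

# Assumption: cluster 5 groups features 1 and 2.
wrs = weak_relevance_sketch(masks, scores, group_features={1, 2},
                            subgroup_features={5: {1, 2}, 1: {1}, 2: {2}})
print(wrs.round(3))
```

The sketch raises an error from max() when no feature subset satisfies a condition; how the library handles that case is not covered here.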

Examples

>>> import numpy
>>> from subrela.records import from_arrays
>>> from subrela.clustering import get_clusters
>>> from subrela.analysis import get_weak_relevance_scores
>>> subset_scores = from_arrays([[False, False, False, True, True],
...                              [True, False, False, True, True],
...                              [False, True, False, True, True],
...                              [True, True, False, True, True],
...                              [False, False, True, True, True],
...                              [True, False, True, True, True],
...                              [False, True, True, True, True],
...                              [True, True, True, True, True]],
...                             [0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 1., 1.])
>>> X = numpy.array([[0, -5, -5, 6, 6], [0, -1, 1, -2, 2]])
>>> Z = get_clusters(X)
>>> get_weak_relevance_scores(subset_scores, Z, 5)
          subset_score  subset_score_ref  relevance_score
subgroup
5                  1.0               0.7              0.3
1                  0.8               0.7              0.1
2                  0.9               0.7              0.2
>>> get_weak_relevance_scores(subset_scores, Z, 5, subgroups=[1, 2])
          subset_score  subset_score_ref  relevance_score
subgroup
1                  0.8               0.7              0.1
2                  0.9               0.7              0.2
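Because the result is an ordinary pandas.DataFrame, subgroups can be ranked with standard pandas calls. The following is a minimal sketch that rebuilds the frame from the values printed in the first example output, so it runs without subrela installed; the variable names ranked and best_proper are illustrative only.

```python
import pandas as pd

# Values copied from the first example output; in practice this frame
# would be the return value of get_weak_relevance_scores.
wrs = pd.DataFrame(
    {"subset_score": [1.0, 0.8, 0.9],
     "subset_score_ref": [0.7, 0.7, 0.7],
     "relevance_score": [0.3, 0.1, 0.2]},
    index=pd.Index([5, 1, 2], name="subgroup"),
)
group = 5

# Order all rows from the most to the least relevant subgroup.
ranked = wrs.sort_values("relevance_score", ascending=False)

# Most relevant subgroup other than the group itself.
best_proper = wrs.drop(index=group)["relevance_score"].idxmax()
```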