shapley_score

mvpy.model_selection.shapley_score(model: Pipeline | BaseEstimator, X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None, cv: int | Any = 5, metric: Metric | Tuple[Metric] | None = None, return_shapley: bool = True, n_permutations: int = 10, n_jobs: int | None = None, n_jobs_validator: int | None = None, verbose: int | bool = False, verbose_validator: int | bool = False) → ndarray | Tensor | Dict | Tuple[Shapley, ndarray, Tensor, Dict]

Implements a shorthand for Shapley scoring over all feature permutations in \(X\) describing \(y\).

This function is a shorthand for Shapley: it automatically creates and fits all groups of predictors specified in \(X\) following a Shapley procedure. It returns either the output scores alone or, if return_shapley is True, a tuple containing the fitted Shapley object and the scores.

For more information, please consult Shapley.
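To give a feel for what a permutation-based Shapley procedure does in general, here is a minimal sketch in plain Python. It is not mvpy's implementation: the function `permutation_shapley`, the toy value function and the group names are all hypothetical, and the "score" is a lookup table rather than a cross-validated model fit.

```python
import random

def permutation_shapley(players, value, n_permutations=10, seed=0):
    """Estimate Shapley values by averaging marginal gains over random orderings.

    `players` are group labels and `value` maps a frozenset of labels to a score.
    Both are hypothetical stand-ins for predictor groups and model scores.
    """
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(n_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = frozenset()
        previous = value(coalition)  # score before any group is added
        for p in order:
            coalition = coalition | {p}
            current = value(coalition)
            totals[p] += current - previous  # marginal gain of adding p
            previous = current
    return {p: totals[p] / n_permutations for p in players}

# Toy additive score: each group contributes a fixed amount.
gains = {"a": 0.25, "b": 0.5, "c": 0.125}

def toy_value(coalition):
    return sum(gains[p] for p in coalition)

estimates = permutation_shapley(list(gains), toy_value, n_permutations=10)
```

For an additive score like this one, every ordering gives the same marginal gains, so the estimates equal the individual contributions. With interacting predictors the gains depend on the ordering, which is why more permutations give better estimates.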

Warning

This performs \(n k p\) model fits, where \(n\) is the number of permutations, \(k\) is the total number of cross-validation steps and \(p\) is the number of unique groups of predictors. For large \(p\) or \(n\), this becomes computationally expensive.
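As a rough budgeting aid, the product above can be computed before launching a run. The helper below is a hypothetical sketch, not part of mvpy. It simply multiplies the three quantities and leaves open whether the baseline group counts towards \(p\), since that is not stated here.

```python
def count_model_fits(n_permutations, n_cv_steps, n_groups):
    """Return n * k * p, the number of model fits named in the warning."""
    return n_permutations * n_cv_steps * n_groups

# Defaults (10 permutations, 5 folds) with 3 groups of predictors.
small_run = count_model_fits(10, 5, 3)

# The same cross-validation with 100 permutations and 20 groups.
large_run = count_model_fits(100, 5, 20)
```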

Parameters:
model : sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator

The model to fit and score. Can be either a pipeline or estimator object.

X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

y : np.ndarray | torch.Tensor

The output data of arbitrary shape.

groups : Optional[List | np.ndarray | torch.Tensor], default=None

Matrix describing all groups of interest of shape (n_groups, n_predictors). If None, this will default to the identity matrix (n_predictors, n_predictors).

dim : Optional[int], default=None

The dimension in \(X\) that describes the predictors. If None, this will assume -1 for 2D data and -2 otherwise.

cv : int | Any, default=5

The cross-validation procedure to follow. Either an object exposing a split() method, such as mvpy.crossvalidation.KFold, or an integer specifying the number of folds to use in KFold.

metric : Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None

The metric to use for scoring. If None, this will default to the score() method exposed by model.

return_shapley : bool, default=True

Should the underlying Shapley object be returned?

n_permutations : int, default=10

How many permutations should be run? A higher number of permutations yields better estimates. Generally, the more predictor groups there are, the more permutations should be used.

n_jobs : Optional[int], default=None

How many jobs should be used to parallelise the hierarchical fitting procedure?

n_jobs_validator : Optional[int], default=None

How many jobs should be used to parallelise the cross-validation procedure?

verbose : int | bool, default=False

Should progress be reported verbosely?

verbose_validator : int | bool, default=False

Should progress in individual Validator objects be reported verbosely?
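The documented defaults for groups and dim can be spelled out in a few lines. This is a sketch of the stated behaviour only. The helper names `default_groups` and `default_dim` are hypothetical, and how the library resolves these values internally is not shown here.

```python
def default_groups(n_predictors):
    """Identity matrix of shape (n_predictors, n_predictors): one group per predictor."""
    return [
        [1 if i == j else 0 for j in range(n_predictors)]
        for i in range(n_predictors)
    ]

def default_dim(ndim, dim=None):
    """Predictor dimension: the given dim, else -1 for 2D data and -2 otherwise."""
    if dim is not None:
        return dim
    return -1 if ndim == 2 else -2

groups_3 = default_groups(3)   # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
dim_2d = default_dim(2)        # -1
dim_3d = default_dim(3)        # -2
```

Passing an explicit groups matrix is only needed when several predictors should be treated as one unit. Each row then marks the predictors belonging to that group.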

Returns:
shapley : Optional[mvpy.model_selection.Shapley]

If return_shapley is True, the underlying Shapley object.

score : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

All scores, of shape (n_permutations, n_sets, n_cv[, ...]), or a dictionary containing the scores for each individual Metric.

See also

mvpy.model_selection.hierarchical_score, mvpy.model_selection.Hierarchical

An alternative scoring method computing the full permutation of feature combinations.

mvpy.model_selection.Shapley

The underlying Shapley scoring object.

mvpy.crossvalidation.Validator

The cross-validation objects used in Shapley.

Notes

All entries of score are relative to the baseline group, except for those of the baseline group itself.

Warning

If multiple values are supplied for metric, this function will return a dictionary of {Metric.name: score, ...} rather than a stacked array. This provides consistency across cases where metrics may or may not differ in their output shapes.
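Downstream code that should work with one metric or several can normalise the result to a dictionary first. The helper below is a hypothetical convenience, not part of mvpy, and the key used for a single unnamed score (`"score"`) is an assumption.

```python
def as_metric_dict(score, default_name="score"):
    """Wrap a single score in a dict; return a shallow copy if it is already a dict."""
    if isinstance(score, dict):
        return dict(score)
    return {default_name: score}

# Placeholder values stand in for score arrays or tensors.
single = as_metric_dict([0.1, 0.2])
multiple = as_metric_dict({"r2": [0.1, 0.2], "pearsonr": [0.3, 0.4]})

for name, values in multiple.items():
    print(name, len(values))
```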

Warning

When specifying n_jobs here, be careful not to specify any number of jobs in the model or any n_jobs_validator. Otherwise, this will lead to a situation where individual jobs each try to initialise more low-level jobs, severely hurting performance.
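The cost of nesting can be estimated with simple arithmetic: if every level spawns its own workers, the totals multiply. The sketch below is illustrative only. The function `total_workers` and the parameter `n_jobs_model` are hypothetical, None is treated as one worker, and only positive integers are handled.

```python
def total_workers(n_jobs=None, n_jobs_validator=None, n_jobs_model=None):
    """Upper bound on concurrent workers when each parallel level nests the next."""
    total = 1
    for n in (n_jobs, n_jobs_validator, n_jobs_model):
        if n is None:
            continue  # assumption: None means a single worker at this level
        if n < 1:
            raise ValueError("this sketch only handles positive job counts")
        total *= n
    return total

outer_only = total_workers(n_jobs=4)                  # 4 workers
nested = total_workers(n_jobs=4, n_jobs_validator=5)  # up to 20 workers
```

On a machine with fewer cores than the nested total, the workers compete for the same resources, which is the slowdown the warning describes.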

Examples

>>> import torch
>>> from mvpy import metrics
>>> from mvpy.dataset import make_meeg_continuous
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import TimeDelayed
>>> from mvpy.model_selection import shapley_score
>>> from sklearn.pipeline import make_pipeline
>>> # select a device (assumption: CPU; the original example left this undefined)
>>> device = 'cpu'
>>> # create dataset
>>> fs = 200
>>> X, y = make_meeg_continuous(fs = fs, n_features = 5)
>>> # setup pipeline for estimation of multivariate temporal response functions
>>> trf = make_pipeline(
...     Scaler().to_torch(),
...     TimeDelayed(
...         -1.0, 0.0, fs,
...         alphas = torch.logspace(-5, 5, 10, device = device)
...     )
... )
>>> # setup groups of predictors
>>> groups = torch.tensor(
...     [
...         [1, 1, 1, 0, 0],
...         [1, 1, 1, 1, 0],
...         [1, 1, 1, 0, 1]
...     ],
...     dtype = torch.long,
...     device = device
... )
>>> # compute Shapley scores for the groups of predictors
>>> shapley, score = shapley_score(
...     trf, X, y,
...     groups = groups,
...     metric = (metrics.r2, metrics.pearsonr),
...     verbose = True
... )
>>> score['r2'].shape
torch.Size([10, 4, 5, 64, 400])