shapley_score
- mvpy.model_selection.shapley_score(model: Pipeline | BaseEstimator, X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None, cv: int | Any = 5, metric: Metric | Tuple[Metric] | None = None, return_shapley: bool = True, n_permutations: int = 10, n_jobs: int | None = None, n_jobs_validator: int | None = None, verbose: int | bool = False, verbose_validator: int | bool = False) → ndarray | Tensor | Dict | Tuple[Shapley, ndarray, Tensor, Dict]
Implements a shorthand for Shapley scoring over all feature permutations in \(X\) describing \(y\).
This function acts as a shorthand for Shapley: it automatically creates and fits all groups of predictors specified in \(X\) following a Shapley procedure. It returns either only the output scores or, if return_shapley is True, both the fitted Shapley object and the scores in a tuple.
For more information, please consult Shapley.
Warning
This performs \(n k p\) model fits where \(n\) is the number of permutations, \(k\) is the total number of cross-validation steps and \(p\) is the number of unique groups of predictors. For large \(p\) or \(n\), this becomes expensive to solve.
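To make the cost in the warning concrete, the sketch below simply evaluates the product \(n k p\). The helper name `n_model_fits` is hypothetical and not part of mvpy; the values 10 and 5 are the documented defaults for n_permutations and cv, while five groups is an illustrative assumption.

```python
def n_model_fits(n_permutations: int, n_cv_steps: int, n_groups: int) -> int:
    """Return n * k * p, the number of model fits stated in the warning.

    Hypothetical helper for illustration only; it is not part of mvpy.
    """
    return n_permutations * n_cv_steps * n_groups


# Documented defaults: n_permutations=10 and cv=5.
# Five predictor groups is an illustrative assumption.
fits = n_model_fits(n_permutations=10, n_cv_steps=5, n_groups=5)
print(fits)  # 250
```

Doubling either the number of permutations or the number of groups doubles the number of fits, so it can help to estimate this figure before starting a long run.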
- Parameters:
- model : sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator
The model to fit and score. Can be either a pipeline or estimator object.
- X : np.ndarray | torch.Tensor
The input data of arbitrary shape.
- y : np.ndarray | torch.Tensor
The output data of arbitrary shape.
- groups : Optional[List | np.ndarray | torch.Tensor], default=None
Matrix describing all groups of interest of shape (n_groups, n_predictors). If None, this will default to the identity matrix (n_predictors, n_predictors).
- dim : Optional[int], default=None
The dimension in \(X\) that describes the predictors. If None, this will assume -1 for 2D data and -2 otherwise.
- cv : int | Any, default=5
The cross-validation procedure to follow. Either an object exposing a split() method, such as mvpy.crossvalidation.KFold, or an integer specifying the number of folds to use in KFold.
- metric : Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None
The metric to use for scoring. If None, this will default to the score() method exposed by model.
- return_shapley : bool, default=True
Should the underlying Shapley object be returned?
- n_permutations : int, default=10
How many permutations should we run? A higher number of permutations yields better estimates. Generally, the more predictor groups there are, the more permutations should be used.
- n_jobs : Optional[int], default=None
How many jobs should be used to parallelise the hierarchical fitting procedure?
- n_jobs_validator : Optional[int], default=None
How many jobs should be used to parallelise the cross-validation procedure?
- verbose : int | bool, default=False
Should progress be reported verbosely?
- verbose_validator : int | bool, default=False
Should progress in individual Validator objects be reported verbosely?
- Returns:
- shapley : Optional[mvpy.model_selection.Shapley]
If return_shapley is True, the underlying Shapley object.
- score : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]
All hierarchical scores of shape (n_permutations, n_sets, n_cv[, ...]) or a dictionary containing each individual Metric.
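The defaults described under Parameters for groups and dim can be sketched in plain Python. The helpers `default_groups` and `default_dim` are hypothetical names that mirror the documented behaviour; they are not mvpy functions, and the library may build these defaults differently internally.

```python
from typing import List


def default_groups(n_predictors: int) -> List[List[int]]:
    """Identity matrix of shape (n_predictors, n_predictors).

    Mirrors the documented fallback for groups=None: each predictor
    forms its own group. Hypothetical helper, not part of mvpy.
    """
    return [
        [1 if row == col else 0 for col in range(n_predictors)]
        for row in range(n_predictors)
    ]


def default_dim(ndim: int) -> int:
    """Predictor dimension assumed when dim=None.

    Mirrors the documented rule: -1 for 2D data and -2 otherwise.
    Hypothetical helper, not part of mvpy.
    """
    return -1 if ndim == 2 else -2


print(default_groups(3))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(default_dim(2))     # -1
print(default_dim(3))     # -2
```

Passing an explicit groups matrix, as in the example at the end of this page, replaces the identity default with whichever predictor sets are of interest.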
See also
mvpy.model_selection.hierarchical_score, mvpy.model_selection.Hierarchical
An alternative scoring method computing the full permutation of feature combinations.
mvpy.model_selection.Shapley
The underlying Shapley scoring object.
mvpy.crossvalidation.Validator
The cross-validation objects used in Shapley.
Notes
All entries of scores are relative to the baseline group, except for, of course, the baseline group itself.
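Given the score layout (n_permutations, n_sets, n_cv[, ...]) described under Returns, one way to summarise the result is to average over the permutation and cross-validation axes. The sketch below uses a random NumPy array as a stand-in for real scores; the array sizes and the choice to average are illustrative assumptions, not a procedure prescribed by mvpy.

```python
import numpy as np

# Placeholder standing in for returned scores; sizes are assumptions.
n_permutations, n_sets, n_cv, n_channels = 10, 4, 5, 64
rng = np.random.default_rng(0)
scores = rng.normal(size=(n_permutations, n_sets, n_cv, n_channels))

# Average over permutations (axis 0) and cross-validation folds (axis 2),
# keeping one value per predictor set and per trailing dimension.
mean_scores = scores.mean(axis=(0, 2))

print(mean_scores.shape)  # (4, 64)
```

Because non-baseline entries are relative to the baseline group, the averaged rows for those sets would read as mean contributions on top of the baseline rather than as absolute scores.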
Warning
If multiple values are supplied for metric, this function will produce a dictionary of {Metric.name: score, ...} rather than a stacked array. This provides consistency across cases where metrics may or may not differ in their output shapes.
Warning
When specifying n_jobs here, be careful not to specify any number of jobs in the model or any n_jobs_validator. Otherwise, individual jobs will each try to initialise more low-level jobs, severely hurting performance.
Examples
>>> import torch
>>> from mvpy import metrics
>>> from mvpy.dataset import make_meeg_continuous
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import TimeDelayed
>>> from mvpy.model_selection import shapley_score
>>> from sklearn.pipeline import make_pipeline
>>> # select a device for torch tensors (not defined in the original example; 'cpu' is assumed)
>>> device = 'cpu'
>>> # create dataset
>>> fs = 200
>>> X, y = make_meeg_continuous(fs = fs, n_features = 5)
>>> # setup pipeline for estimation of multivariate temporal response functions
>>> trf = make_pipeline(
...     Scaler().to_torch(),
...     TimeDelayed(
...         -1.0, 0.0, fs,
...         alphas = torch.logspace(-5, 5, 10, device = device)
...     )
... )
>>> # setup groups of predictors
>>> groups = torch.tensor(
...     [
...         [1, 1, 1, 0, 0],
...         [1, 1, 1, 1, 0],
...         [1, 1, 1, 0, 1]
...     ],
...     dtype = torch.long,
...     device = device
... )
>>> # score predictors hierarchically
>>> shapley, score = shapley_score(
...     trf, X, y,
...     groups = groups,
...     metric = (metrics.r2, metrics.pearsonr),
...     verbose = True
... )
>>> score['r2'].shape
torch.Size([10, 4, 5, 64, 400])
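Since the score is a plain array for a single metric but a dictionary when several metrics are supplied, downstream code may want to treat both cases uniformly. The sketch below wraps a single-metric result in a dictionary; `as_metric_dict` and the key `'score'` are hypothetical and not part of mvpy.

```python
from typing import Any, Dict


def as_metric_dict(score: Any, default_name: str = "score") -> Dict[str, Any]:
    """Return scores as a {name: score} dictionary in both cases.

    A dictionary (multiple metrics) is passed through unchanged; anything
    else (single metric) is wrapped under default_name.
    Hypothetical helper for illustration only.
    """
    if isinstance(score, dict):
        return score
    return {default_name: score}


# Placeholder values standing in for score arrays.
multi = as_metric_dict({"r2": 0.4, "pearsonr": 0.6})
single = as_metric_dict(0.5)

print(sorted(multi))  # ['pearsonr', 'r2']
print(single)         # {'score': 0.5}
```

With this wrapper, a loop such as `for name, values in as_metric_dict(score).items()` works whether one metric or several were requested.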