mvpy.model_selection package#
Submodules#
mvpy.model_selection.hierarchical module#
- class mvpy.model_selection.hierarchical.Hierarchical(validator: Validator, n_jobs: int | None = None, verbose: int | bool = False, on_underspecified: str = 'raise')[source]#
Bases: object

Implements a hierarchical scoring procedure over all feature permutations in \(X\) describing \(y\).
When modeling outcomes \(y\), a common question to ask is what specific combination of predictors in \(X\) explains the observed data best. One way to tackle this question is to iteratively cross-validate the scoring of predictions \(\hat{y}\) from each possible feature combination in \(X\). For example, if we have three features in \(X\), we would model \(y\) as a function of feature combinations \(\left[(0), (1), (2), (0, 1), (0, 2), (1, 2), (0, 1, 2)\right]\) such that we can now compare how well each individual predictor and combination of predictors explain the data.
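For three features, this enumeration of non-empty combinations can be reproduced with the standard library (a sketch for illustration, not mvpy's own code):

```python
from itertools import chain, combinations

def feature_sets(n_features):
    """Enumerate all non-empty feature combinations, smallest first."""
    idx = range(n_features)
    return list(chain.from_iterable(combinations(idx, k) for k in range(1, n_features + 1)))

print(feature_sets(3))
# [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
```

As expected, this yields \(2^3 - 1 = 7\) combinations.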
This class implements precisely this hierarchical modeling procedure, but allows creation of groups of predictors. For example, we might have several predictors in \(X\) that, together, form some kind of baseline. We might then specify:
\[\begin{split}G = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]

to signal that there are three groups of predictors in total that we want to permute together: the first one including predictors 1-3, and the following two including one novel predictor each.
Hierarchical would now compute \(2^3 - 1\) combinations instead of the full \(2^5 - 1\) combinations over the five individual predictors. As described before, this yields the feature combinations \(\left[(0), (1), (2), (0, 1), (0, 2), (1, 2), (0, 1, 2)\right]\), where set \((0,)\) now groups predictors \(\{0, 1, 2\}\). Observe, however, that the permutations now include those where the baseline predictors are not part of the model; for example, \((1,)\) evaluates to the mask \([0, 0, 0, 1, 0]\). If we want to enforce that all models include the baseline, we should make it part of every other group:
\[\begin{split}G = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \end{bmatrix}\end{split}\]

The backend will automatically remove duplicates, leaving us with only the contrasts that are of interest to us, \(\left[(0,), (0, 1), (0, 2), (0, 1, 2)\right]\), or, expressed as boolean masks:
\[\begin{split}M = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ \end{bmatrix}\end{split}\]

In code, you can confirm the desired grouping in group_, the resulting feature combinations in set_, and the masks that were applied in mask_.

Warning
This performs \(k\left(2^p - 1\right)\) individual model fits, where \(k\) is the total number of cross-validation steps and \(p\) is the number of unique groups of predictors. For large \(p\), this becomes exponentially more expensive to solve. If you are interested in the unique contribution of each feature rather than separate estimates for all combinations, consider using Shapley instead.

Warning
The default behaviour of this class is to check whether all predictors in \(X\) appear in the group specification groups at least once. If this is not the case, the class will raise an exception. If you would instead like to ignore these cases or only warn about them, supply the corresponding on_underspecified value.

Warning
When specifying n_jobs here, be careful not to also set a number of jobs in the model or the underlying validator. Otherwise, individual jobs will each try to initialise further low-level jobs, severely hurting performance.

- Parameters:
- validator: mvpy.crossvalidation.Validator
The validator object that should be used in this procedure.
- n_jobs: Optional[int], default=None
How many jobs should be used to parallelise the hierarchical fitting procedure?
- verbose: int | bool, default=False
Should progress be reported verbosely?
- on_underspecified: {‘raise’, ‘warn’, ‘ignore’}, default=’raise’
If an underspecified grouping is detected (i.e., not all available predictors are used), to what level should this be escalated?
- Attributes:
- validator: mvpy.crossvalidation.Validator
The validator object that should be used in this procedure.
- n_jobs: Optional[int], default=None
How many jobs should be used to parallelise the hierarchical fitting procedure?
- verbose: int | bool, default=False
Should progress be reported verbosely?
- on_underspecified: {‘raise’, ‘warn’, ‘ignore’}, default=’raise’
If an underspecified grouping is detected (i.e., not all available predictors are used), to what level should this be escalated?
- validator_: List[mvpy.crossvalidation.Validator]
A list of all fitted validators.
- score_: np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]
The hierarchical scores of shape (n_sets, n_cv[, ...]) or a dictionary containing each individual Metric.
- mask_: np.ndarray | torch.Tensor
A matrix where each row corresponds to one boolean mask used to fit one validator.
- group_: np.ndarray | torch.Tensor
A matrix where each row corresponds to the boolean mask for one group.
- group_id_: np.ndarray | torch.Tensor
A vector containing group identifiers used in set_.
- set_: List[Tuple[int]]
A list including all group combinations that were tested.
See also
mvpy.model_selection.shapley_score, mvpy.model_selection.Shapley
An alternative scoring method computing unique contributions of each feature rather than the full permutation.
mvpy.model_selection.hierarchical_score
A shorthand for fitting this class.
mvpy.crossvalidation.Validator
The cross-validation object required by Hierarchical.
Notes
Currently this does not automatically select the best model for you. Instead, it will return all scores, leaving further decisions up to you. This is because, for most applications, the scores of all permutations are actually of interest and may need to be reported.
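For instance, if score_ is a plain array of shape (n_sets, n_cv), selecting the best-scoring feature set yourself might look like this (a sketch with made-up numbers; the array stands in for a fitted Hierarchical's score_):

```python
import numpy as np

# hypothetical scores: 4 feature sets x 5 CV folds
rng = np.random.default_rng(0)
score = rng.normal(loc=[[0.10], [0.20], [0.15], [0.30]], scale=0.01, size=(4, 5))

mean_score = score.mean(axis=1)    # average over the CV axis
best = int(np.argmax(mean_score))  # index into set_ / mask_
print(best)                        # with these numbers, the fourth set (index 3) wins
```

Whether a simple argmax is appropriate, or whether differences between sets should first be tested statistically, is left to you.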
Warning
If multiple values are supplied for metric, this class will produce a dictionary of {Metric.name: score, ...} rather than a stacked array. This provides consistency across cases where metrics may or may not differ in their output shapes.

Examples
>>> import torch
>>> from mvpy import metrics
>>> from mvpy.dataset import make_meeg_continuous
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import TimeDelayed
>>> from mvpy.crossvalidation import Validator
>>> from mvpy.model_selection import Hierarchical
>>> from sklearn.pipeline import make_pipeline
>>> # create dataset
>>> fs = 200
>>> device = torch.device('cpu')
>>> X, y = make_meeg_continuous(fs = fs, n_features = 5)
>>> # setup pipeline for estimation of multivariate temporal response functions
>>> trf = make_pipeline(
...     Scaler().to_torch(),
...     TimeDelayed(
...         -1.0, 0.0, fs,
...         alphas = torch.logspace(-5, 5, 10, device = device)
...     )
... )
>>> # setup validator
>>> validator = Validator(
...     trf,
...     metric = (metrics.r2, metrics.pearsonr),
... )
>>> # setup groups of predictors
>>> groups = torch.tensor(
...     [
...         [1, 1, 1, 0, 0],
...         [1, 1, 1, 1, 0],
...         [1, 1, 1, 0, 1]
...     ],
...     dtype = torch.long,
...     device = device
... )
>>> # score predictors hierarchically
>>> hierarchical = Hierarchical(validator, verbose = True).fit(
...     X, y,
...     groups = groups
... )
>>> hierarchical.score_['r2'].shape
torch.Size([4, 5, 64, 400])
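The mask construction and deduplication described above can be verified in a few lines of NumPy (a sketch of the behaviour, not mvpy's actual backend):

```python
import numpy as np
from itertools import chain, combinations

# groups with the baseline folded into every row, as in the second G above
G = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
], dtype=bool)

# all non-empty combinations of group rows
sets = chain.from_iterable(combinations(range(len(G)), k) for k in range(1, len(G) + 1))

# OR the rows of each combination into one predictor mask, then drop duplicates
masks = {tuple(map(bool, G[list(s)].any(axis=0))) for s in sets}
for m in sorted(masks):
    print(np.array(m, dtype=int))
```

Out of the seven combinations, only the four unique masks shown in \(M\) survive.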
- fit(X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None) Hierarchical[source]#
Fit all models in a hierarchical manner.
- Parameters:
- X: np.ndarray | torch.Tensor
The input data of arbitrary shape.
- y: np.ndarray | torch.Tensor
The output data of arbitrary shape.
- groups: Optional[List | np.ndarray | torch.Tensor], default=None
Matrix describing all groups of interest of shape (n_groups, n_predictors). If None, this will default to the identity matrix of shape (n_predictors, n_predictors).
- dim: Optional[int], default=None
The dimension in \(X\) that describes the predictors. If None, this will assume -1 for 2D data and -2 otherwise.
- Returns:
- hierarchical: mvpy.model_selection.Hierarchical
The fitted hierarchical model selector.
- mvpy.model_selection.hierarchical.check_dims_and_groups_(X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None, on_underspecified: str = 'raise') Tuple[int, ndarray | Tensor, List, ndarray | Tensor][source]#
Check dimension and groups and create corresponding sets and masks.
- Parameters:
- X: np.ndarray | torch.Tensor
The input data of arbitrary shape.
- y: np.ndarray | torch.Tensor
The output data of arbitrary shape.
- groups: Optional[List | np.ndarray | torch.Tensor], default=None
Matrix describing all groups of interest of shape (n_groups, n_predictors). If None, this will default to the identity matrix of shape (n_predictors, n_predictors).
- dim: Optional[int], default=None
The dimension in \(X\) that describes the predictors. If None, this will assume -1 for 2D data and -2 otherwise.
- on_underspecified: {‘raise’, ‘warn’, ‘ignore’}, default=’raise’
If an underspecified grouping is detected (i.e., not all available predictors are used), to what level should this be escalated?
- Returns:
- dim: int
The dimension along which groups exist.
- group_ids: np.ndarray | torch.Tensor
Vector containing group identifiers.
- sets: List[Tuple[int]]
List containing tuples of group identifiers, together forming all combinations.
- masks: np.ndarray | torch.Tensor
The masks corresponding to each feature combination in sets.
- mvpy.model_selection.hierarchical.fit_validator_(validator: Validator, X: ndarray | Tensor, y: ndarray | Tensor, mask: ndarray | Tensor, dim: int, i: int) Dict[source]#
Fit an individual validator object.
- Parameters:
- validator: mvpy.crossvalidation.Validator
The validator object to be fit. Note that this is automatically cloned here.
- X: np.ndarray | torch.Tensor
The input data of arbitrary shape.
- y: np.ndarray | torch.Tensor
The output data of arbitrary shape.
- mask: np.ndarray | torch.Tensor
The mask to apply to the input data.
- dim: int
The dimension along which to apply the mask.
- i: int
The identifier to return.
- Returns:
- results: Dict
- A dictionary containing the results:
- i: int
The identifier for this validator.
- validator: mvpy.crossvalidation.Validator
The fitted validator.
- score: np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]
The score of the validator.
mvpy.model_selection.hierarchical_score module#
- mvpy.model_selection.hierarchical_score.hierarchical_score(model: Pipeline | BaseEstimator, X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None, cv: int | Any = 5, metric: Metric | Tuple[Metric] | None = None, return_hierarchical: bool = True, n_jobs: int | None = None, n_jobs_validator: int | None = None, verbose: int | bool = False, verbose_validator: int | bool = False) ndarray | Tensor | Dict | Tuple[Hierarchical, ndarray, Tensor, Dict][source]#
Implements a shorthand for hierarchical scoring over all feature permutations in \(X\) describing \(y\).
This function acts as a shorthand for Hierarchical, where it will automatically create and fit all permutations of the predictors specified in \(X\) following a hierarchical procedure. It returns either only the output scores or, if return_hierarchical is True, a tuple of the fitted Hierarchical object and the scores.

For more information, please consult Hierarchical.

Warning
This performs \(k\left(2^p - 1\right)\) individual model fits, where \(k\) is the total number of cross-validation steps and \(p\) is the number of unique groups of predictors. For large \(p\), this becomes exponentially more expensive to solve. If you are interested in the unique contribution of each feature rather than separate estimates for all combinations, consider using shapley_score instead.

- Parameters:
- model: sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator
The model to fit and score. Can be either a pipeline or an estimator object.
- X: np.ndarray | torch.Tensor
The input data of arbitrary shape.
- y: np.ndarray | torch.Tensor
The output data of arbitrary shape.
- groups: Optional[List | np.ndarray | torch.Tensor], default=None
Matrix describing all groups of interest of shape (n_groups, n_predictors). If None, this will default to the identity matrix of shape (n_predictors, n_predictors).
- dim: Optional[int], default=None
The dimension in \(X\) that describes the predictors. If None, this will assume -1 for 2D data and -2 otherwise.
- cv: int | Any, default=5
The cross-validation procedure to follow. Either an object exposing a split() method, such as mvpy.crossvalidation.KFold, or an integer specifying the number of folds to use in KFold.
- metric: Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None
The metric to use for scoring. If None, this will default to the score() method exposed by model.
- return_hierarchical: bool, default=True
Should the underlying Hierarchical object be returned?
- n_jobs: Optional[int], default=None
How many jobs should be used to parallelise the hierarchical fitting procedure?
- n_jobs_validator: Optional[int], default=None
How many jobs should be used to parallelise the cross-validation procedure?
- verbose: int | bool, default=False
Should progress be reported verbosely?
- verbose_validator: int | bool, default=False
Should progress in individual Validator objects be reported verbosely?
- Returns:
- hierarchical: Optional[mvpy.model_selection.Hierarchical]
If return_hierarchical is True, the underlying Hierarchical object.
- score: np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]
All hierarchical scores of shape (n_sets, n_cv[, ...]) or a dictionary containing each individual Metric.
See also
mvpy.model_selection.shapley_score, mvpy.model_selection.Shapley
An alternative scoring method computing unique contributions of each feature rather than the full permutation.
mvpy.model_selection.Hierarchical
The underlying hierarchical scoring object.
mvpy.crossvalidation.Validator
The cross-validation objects used in Hierarchical.
Notes
Currently this does not automatically select the best model for you. Instead, it will return all scores, leaving further decisions up to you. This is because, for most applications, the scores of all permutations are actually of interest and may need to be reported.
Warning
If multiple values are supplied for metric, this function will output a dictionary of {Metric.name: score, ...} rather than a stacked array. This provides consistency across cases where metrics may or may not differ in their output shapes.

Warning

When specifying n_jobs here, be careful not to also set a number of jobs in the model or any n_jobs_validator. Otherwise, individual jobs will each try to initialise further low-level jobs, severely hurting performance.

Examples
>>> import torch
>>> from mvpy import metrics
>>> from mvpy.dataset import make_meeg_continuous
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import TimeDelayed
>>> from mvpy.model_selection import hierarchical_score
>>> from sklearn.pipeline import make_pipeline
>>> # create dataset
>>> fs = 200
>>> device = torch.device('cpu')
>>> X, y = make_meeg_continuous(fs = fs, n_features = 5)
>>> # setup pipeline for estimation of multivariate temporal response functions
>>> trf = make_pipeline(
...     Scaler().to_torch(),
...     TimeDelayed(
...         -1.0, 0.0, fs,
...         alphas = torch.logspace(-5, 5, 10, device = device)
...     )
... )
>>> # setup groups of predictors
>>> groups = torch.tensor(
...     [
...         [1, 1, 1, 0, 0],
...         [1, 1, 1, 1, 0],
...         [1, 1, 1, 0, 1]
...     ],
...     dtype = torch.long,
...     device = device
... )
>>> # score predictors hierarchically
>>> hierarchical, score = hierarchical_score(
...     trf, X, y,
...     groups = groups,
...     metric = (metrics.r2, metrics.pearsonr),
...     verbose = True
... )
>>> score['r2'].shape
torch.Size([4, 5, 64, 400])
mvpy.model_selection.shapley module#
- class mvpy.model_selection.shapley.Shapley(validator: Validator, n_permutations: int = 10, n_jobs: int | None = None, verbose: int | bool = False, on_underspecified: str = 'raise')[source]#
Bases: object

Implements a Shapley value scoring procedure over all feature permutations in \(X\) describing \(y\).
When modeling outcomes \(y\), a common question to ask is to what degree individual predictors in \(X\) contribute to \(y\). To do this, we group predictors according to groups, which may, for example, specify:

\[\begin{split}G = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]

In other words, here we have three groups of predictors. To compute Shapley values, we always assume that group zero (in this case, comprising the first three predictors) is some baseline relative to which we measure the contribution of the other predictors. If no such baseline exists, it should simply be an intercept predictor. Next, we perform n_permutations permutation runs: in each, we fit and score the zero model, then loop over a permutation of the other predictor groups, adding them one by one and measuring how each affects the outcome score relative to the last fitted model.

By repeating this procedure many times, we obtain Shapley values for each predictor group that represent its fair contribution to the outcome scores, invariant to the order of inclusion. The first score will always correspond to the full baseline performance, whereas the others are relative improvements over the baseline.
Warning
This performs \(n k p\) model fits where \(n\) is the number of permutations, \(k\) is the total number of cross-validation steps and \(p\) is the number of unique groups of predictors. For large \(p\) or \(n\), this becomes expensive to solve.
Warning
The default behaviour of this class is to check whether all predictors in \(X\) appear in the group specification groups at least once. If this is not the case, the class will raise an exception. If you would instead like to ignore these cases or only warn about them, supply the corresponding on_underspecified value.

Warning
When specifying n_jobs here, be careful not to also set a number of jobs in the model or the underlying validator. Otherwise, individual jobs will each try to initialise further low-level jobs, severely hurting performance.

- Parameters:
- validator: mvpy.crossvalidation.Validator
The validator object that should be used in this procedure.
- n_permutations: int, default=10
How many permutations should we run? A higher number of permutations yields better estimates. Generally, the more predictor groups there are, the more permutations should be used.
- n_jobs: Optional[int], default=None
How many jobs should be used to parallelise the fitting procedure?
- verbose: int | bool, default=False
Should progress be reported verbosely?
- on_underspecified: {‘raise’, ‘warn’, ‘ignore’}, default=’raise’
If an underspecified grouping is detected (i.e., not all available predictors are used), to what level should this be escalated?
- Attributes:
- validator: mvpy.crossvalidation.Validator
The validator object that should be used in this procedure.
- n_permutations: int, default=10
How many permutations should we run? A higher number of permutations yields better estimates. Generally, the more predictor groups there are, the more permutations should be used.
- n_jobs: Optional[int], default=None
How many jobs should be used to parallelise the fitting procedure?
- verbose: int | bool, default=False
Should progress be reported verbosely?
- on_underspecified: {‘raise’, ‘warn’, ‘ignore’}, default=’raise’
If an underspecified grouping is detected (i.e., not all available predictors are used), to what level should this be escalated?
- validator_: List[List[mvpy.crossvalidation.Validator]]
A list containing, per permutation, another list of Validators for each model group, ordered by group identity.
- score_: np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]
The Shapley scores of shape (n_permutations, n_sets, n_cv[, ...]) or a dictionary containing each individual Metric.
- order_: np.ndarray | torch.Tensor
A matrix of shape (n_permutations, n_groups - 1) containing the order in which groups were added to the baseline group.
See also
mvpy.model_selection.hierarchical_score, mvpy.model_selection.Hierarchical
An alternative scoring method computing the full permutation over features.
mvpy.model_selection.shapley_score
A shorthand for fitting this class.
mvpy.crossvalidation.Validator
The cross-validation object required by Shapley.
Notes
All score entries are relative to the baseline group, except, of course, for the baseline group itself.
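Concretely, with a plain score_ array of shape (n_permutations, n_sets), the per-group Shapley estimate is simply the mean over the permutation axis (a sketch with made-up numbers standing in for a fitted Shapley's score_):

```python
import numpy as np

# hypothetical score_: 4 permutations x 3 groups (baseline first)
score_ = np.array([
    [0.30, 0.05, 0.01],
    [0.30, 0.04, 0.02],
    [0.30, 0.06, 0.00],
    [0.30, 0.05, 0.01],
])

# average marginal contribution per group; entry 0 is the absolute baseline score,
# the remaining entries are improvements relative to the baseline
shapley_values = score_.mean(axis=0)
print(shapley_values)  # averages to 0.30, 0.05, 0.01
```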
Warning
If multiple values are supplied for metric, this class will produce a dictionary of {Metric.name: score, ...} rather than a stacked array. This provides consistency across cases where metrics may or may not differ in their output shapes.

Examples
>>> import torch
>>> from mvpy import metrics
>>> from mvpy.dataset import make_meeg_continuous
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import TimeDelayed
>>> from mvpy.crossvalidation import Validator
>>> from mvpy.model_selection import Shapley
>>> from sklearn.pipeline import make_pipeline
>>> # create dataset
>>> fs = 200
>>> device = torch.device('cpu')
>>> X, y = make_meeg_continuous(fs = fs, n_features = 5)
>>> # setup pipeline for estimation of multivariate temporal response functions
>>> trf = make_pipeline(
...     Scaler().to_torch(),
...     TimeDelayed(
...         -1.0, 0.0, fs,
...         alphas = torch.logspace(-5, 5, 10, device = device)
...     )
... )
>>> # setup validator
>>> validator = Validator(
...     trf,
...     metric = (metrics.r2, metrics.pearsonr),
... )
>>> # setup groups
>>> groups = torch.tensor(
...     [
...         [1, 1, 1, 0, 0],
...         [0, 0, 0, 1, 0],
...         [0, 0, 0, 0, 1]
...     ],
...     dtype = torch.long
... )
>>> # score individual predictors using Shapley
>>> shapley = Shapley(validator, n_permutations = 3, verbose = True).fit(
...     X, y,
...     groups = groups
... )
>>> shapley.score_['r2'].shape
torch.Size([3, 3, 5, 64, 400])
- fit(X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None) Shapley[source]#
Fit the models to obtain Shapley values.
- Parameters:
- X: np.ndarray | torch.Tensor
The input data of arbitrary shape.
- y: np.ndarray | torch.Tensor
The output data of arbitrary shape.
- groups: Optional[List | np.ndarray | torch.Tensor], default=None
Matrix describing all groups of interest of shape (n_groups, n_predictors). If None, this will default to the identity matrix of shape (n_predictors, n_predictors).
- dim: Optional[int], default=None
The dimension in \(X\) that describes the predictors. If None, this will assume -1 for 2D data and -2 otherwise.
- Returns:
- shapley: mvpy.model_selection.Shapley
The fitted Shapley model selector.
- mvpy.model_selection.shapley.fit_permutation_(validator: Validator, X: ndarray | Tensor, y: ndarray | Tensor, groups: ndarray | Tensor, dim: int, i: int) Dict[source]#
- mvpy.model_selection.shapley.fit_validator_(validator: Validator, X: ndarray | Tensor, y: ndarray | Tensor, mask: ndarray | Tensor, dim: int) Validator[source]#
Fit an individual validator object.
- Parameters:
- validator: mvpy.crossvalidation.Validator
The validator object to be fit. Note that this is automatically cloned here.
- X: np.ndarray | torch.Tensor
The input data of arbitrary shape.
- y: np.ndarray | torch.Tensor
The output data of arbitrary shape.
- mask: np.ndarray | torch.Tensor
The mask to apply to the input data.
- dim: int
The dimension along which to apply the mask.
- Returns:
- validator: mvpy.crossvalidation.Validator
The fitted validator object.
mvpy.model_selection.shapley_score module#
- mvpy.model_selection.shapley_score.shapley_score(model: Pipeline | BaseEstimator, X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None, cv: int | Any = 5, metric: Metric | Tuple[Metric] | None = None, return_shapley: bool = True, n_permutations: int = 10, n_jobs: int | None = None, n_jobs_validator: int | None = None, verbose: int | bool = False, verbose_validator: int | bool = False) ndarray | Tensor | Dict | Tuple[Shapley, ndarray, Tensor, Dict][source]#
Implements a shorthand for Shapley scoring over all feature permutations in \(X\) describing \(y\).
This function acts as a shorthand for Shapley, where it will automatically create and fit all groups of predictors specified in \(X\) following a Shapley procedure. It returns either only the output scores or, if return_shapley is True, a tuple of the fitted Shapley object and the scores.

For more information, please consult Shapley.

Warning
This performs \(n k p\) model fits where \(n\) is the number of permutations, \(k\) is the total number of cross-validation steps and \(p\) is the number of unique groups of predictors. For large \(p\) or \(n\), this becomes expensive to solve.
- Parameters:
- model: sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator
The model to fit and score. Can be either a pipeline or an estimator object.
- X: np.ndarray | torch.Tensor
The input data of arbitrary shape.
- y: np.ndarray | torch.Tensor
The output data of arbitrary shape.
- groups: Optional[List | np.ndarray | torch.Tensor], default=None
Matrix describing all groups of interest of shape (n_groups, n_predictors). If None, this will default to the identity matrix of shape (n_predictors, n_predictors).
- dim: Optional[int], default=None
The dimension in \(X\) that describes the predictors. If None, this will assume -1 for 2D data and -2 otherwise.
- cv: int | Any, default=5
The cross-validation procedure to follow. Either an object exposing a split() method, such as mvpy.crossvalidation.KFold, or an integer specifying the number of folds to use in KFold.
- metric: Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None
The metric to use for scoring. If None, this will default to the score() method exposed by model.
- return_shapley: bool, default=True
Should the underlying Shapley object be returned?
- n_permutations: int, default=10
How many permutations should we run? A higher number of permutations yields better estimates. Generally, the more predictor groups there are, the more permutations should be used.
- n_jobs: Optional[int], default=None
How many jobs should be used to parallelise the Shapley fitting procedure?
- n_jobs_validator: Optional[int], default=None
How many jobs should be used to parallelise the cross-validation procedure?
- verbose: int | bool, default=False
Should progress be reported verbosely?
- verbose_validator: int | bool, default=False
Should progress in individual Validator objects be reported verbosely?
- Returns:
- shapley: Optional[mvpy.model_selection.Shapley]
If return_shapley is True, the underlying Shapley object.
- score: np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]
All Shapley scores of shape (n_permutations, n_sets, n_cv[, ...]) or a dictionary containing each individual Metric.
See also
mvpy.model_selection.hierarchical_score, mvpy.model_selection.Hierarchical
An alternative scoring method computing the full permutation of feature combinations.
mvpy.model_selection.Shapley
The underlying Shapley scoring object.
mvpy.crossvalidation.Validator
The cross-validation objects used in Shapley.
Notes
All score entries are relative to the baseline group, except, of course, for the baseline group itself.
Warning
If multiple values are supplied for metric, this function will produce a dictionary of {Metric.name: score, ...} rather than a stacked array. This provides consistency across cases where metrics may or may not differ in their output shapes.

Warning

When specifying n_jobs here, be careful not to also set a number of jobs in the model or any n_jobs_validator. Otherwise, individual jobs will each try to initialise further low-level jobs, severely hurting performance.

Examples
>>> import torch
>>> from mvpy import metrics
>>> from mvpy.dataset import make_meeg_continuous
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import TimeDelayed
>>> from mvpy.model_selection import shapley_score
>>> from sklearn.pipeline import make_pipeline
>>> # create dataset
>>> fs = 200
>>> device = torch.device('cpu')
>>> X, y = make_meeg_continuous(fs = fs, n_features = 5)
>>> # setup pipeline for estimation of multivariate temporal response functions
>>> trf = make_pipeline(
...     Scaler().to_torch(),
...     TimeDelayed(
...         -1.0, 0.0, fs,
...         alphas = torch.logspace(-5, 5, 10, device = device)
...     )
... )
>>> # setup groups of predictors
>>> groups = torch.tensor(
...     [
...         [1, 1, 1, 0, 0],
...         [1, 1, 1, 1, 0],
...         [1, 1, 1, 0, 1]
...     ],
...     dtype = torch.long,
...     device = device
... )
>>> # score predictors using Shapley values
>>> shapley, score = shapley_score(
...     trf, X, y,
...     groups = groups,
...     metric = (metrics.r2, metrics.pearsonr),
...     verbose = True
... )
>>> score['r2'].shape
torch.Size([10, 4, 5, 64, 400])