mvpy.model_selection package#

Submodules#

mvpy.model_selection.hierarchical module#

class mvpy.model_selection.hierarchical.Hierarchical(validator: Validator, n_jobs: int | None = None, verbose: int | bool = False, on_underspecified: str = 'raise')[source]#

Bases: object

Implements a hierarchical scoring procedure over all feature permutations in \(X\) describing \(y\).

When modeling outcomes \(y\), a common question to ask is what specific combination of predictors in \(X\) explains the observed data best. One way to tackle this question is to iteratively cross-validate the scoring of predictions \(\hat{y}\) from each possible feature combination in \(X\). For example, if we have three features in \(X\), we would model \(y\) as a function of feature combinations \(\left[(0), (1), (2), (0, 1), (0, 2), (1, 2), (0, 1, 2)\right]\) such that we can now compare how well each individual predictor and combination of predictors explain the data.
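Conceptually, the tested sets are simply all non-empty subsets of the available features. As a minimal standalone sketch (standard library only), the enumeration above can be reproduced like this:

>>> from itertools import chain, combinations
>>> features = range(3)
>>> list(chain.from_iterable(combinations(features, k) for k in range(1, 4)))
[(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]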

This class implements precisely this hierarchical modeling procedure, but allows creation of groups of predictors. For example, we might have several predictors in \(X\) that, together, form some kind of baseline. We might then specify:

\[\begin{split}G = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]

to signal that there are three total groups of predictors that we want to permute together, the first one comprising the first three predictors and the following two including one novel predictor each. Hierarchical would now compute \(2^3 - 1\) combinations instead of the full \(2^5 - 1\) combinations. As described before, this yields the feature combinations \(\left[(0), (1), (2), (0, 1), (0, 2), (1, 2), (0, 1, 2)\right]\) where feature \((0,)\) groups predictors \(\{0, 1, 2\}\).

Observe, however, that the permutations now include those where the baseline predictors are not part of the model. For example, \((1,)\) would evaluate to the mask \([0, 0, 0, 1, 0]\). If we want to enforce that all models include the baseline, we should make its predictors part of every other group:

\[\begin{split}G = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \end{bmatrix}\end{split}\]

The backend will automatically remove duplicates, leaving us with only the contrasts that are of interest to us, \(\left[(0,), (0, 1), (0, 2), (0, 1, 2)\right]\), or, expressed as boolean masks:

\[\begin{split}M = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ \end{bmatrix}\end{split}\]

In code, you can confirm the desired grouping in group_, the resulting feature combinations in set_, and the masks that were applied in mask_.
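The mask construction and deduplication can also be sketched outside the library. The following is a minimal illustration of the idea, not the actual backend: OR together the group rows of each combination into a predictor mask, then collapse combinations that yield identical masks (here keeping the last, most explicit combination, which reproduces the contrasts above):

>>> import numpy as np
>>> from itertools import chain, combinations
>>> G = np.array([
...     [1, 1, 1, 0, 0],
...     [1, 1, 1, 1, 0],
...     [1, 1, 1, 0, 1]
... ], dtype = bool)
>>> sets = chain.from_iterable(combinations(range(3), k) for k in range(1, 4))
>>> unique = {}
>>> for s in sets:
...     mask = G[list(s)].any(axis = 0)   # union of the groups in this combination
...     unique[tuple(mask)] = s           # identical masks collapse onto one entry
>>> list(unique.values())
[(0,), (0, 1), (0, 2), (0, 1, 2)]
>>> np.array(list(unique.keys()), dtype = int)
array([[1, 1, 1, 0, 0],
       [1, 1, 1, 1, 0],
       [1, 1, 1, 0, 1],
       [1, 1, 1, 1, 1]])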

Warning

This performs \(k\left(2^p - 1\right)\) individual model fits where \(k\) is the total number of cross-validation steps and \(p\) is the number of unique groups of predictors. For large \(p\), this becomes exponentially more expensive to solve. If you are interested in the unique contribution of each feature rather than separate estimates for all combinations, consider using Shapley instead.
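To put this growth in perspective, here is a quick count of the required fits for \(k = 5\) cross-validation steps:

>>> k = 5
>>> [(p, k * (2 ** p - 1)) for p in (3, 6, 10, 15)]
[(3, 35), (6, 315), (10, 5115), (15, 163835)]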

Warning

The default behaviour of this class is to check whether all predictors in \(X\) appear in the group specification groups at least once. If this is not the case, the class will raise an exception. If you would instead like to only warn about or silently ignore such cases, supply the corresponding on_underspecified value.

Warning

When specifying n_jobs here, be careful not to specify any number of jobs in the model or underlying validator. Otherwise, individual jobs will each try to initialise more low-level jobs, severely hurting performance.

Parameters:
validator : mvpy.crossvalidation.Validator

The validator object that should be used in this procedure.

n_jobs : Optional[int], default=None

How many jobs should be used to parallelise the hierarchical fitting procedure?

verbose : int | bool, default=False

Should progress be reported verbosely?

on_underspecified : {'raise', 'warn', 'ignore'}, default='raise'

How to escalate if an underspecified grouping is detected, i.e. if not all available predictors are used.

Attributes:
validator : mvpy.crossvalidation.Validator

The validator object that should be used in this procedure.

n_jobs : Optional[int], default=None

How many jobs should be used to parallelise the hierarchical fitting procedure?

verbose : int | bool, default=False

Should progress be reported verbosely?

on_underspecified : {'raise', 'warn', 'ignore'}, default='raise'

How to escalate if an underspecified grouping is detected, i.e. if not all available predictors are used.

validator_ : List[mvpy.crossvalidation.Validator]

A list of all fitted validators.

score_ : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

The hierarchical scores of shape (n_sets, n_cv[, ...]) or a dictionary containing each individual Metric.

mask_ : np.ndarray | torch.Tensor

A matrix where each row corresponds to one boolean mask used to fit one validator.

group_ : np.ndarray | torch.Tensor

A matrix where each row corresponds to the boolean mask for one group.

group_id_ : np.ndarray | torch.Tensor

A vector containing the group identifiers used in set_.

set_ : List[Tuple[int]]

A list including all group combinations that were tested.

See also

mvpy.model_selection.shapley_score, mvpy.model_selection.Shapley

An alternative scoring method computing unique contributions of each feature rather than the full permutation.

mvpy.model_selection.hierarchical_score

A shorthand for fitting this class.

mvpy.crossvalidation.Validator

The cross-validation object required by Hierarchical.

Notes

Currently this does not automatically select the best model for you. Instead, it will return all scores, leaving further decisions up to you. This is because, for most applications, the scores of all permutations are actually of interest and may need to be reported.
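If you do want a single winning set, ranking the scores yourself is straightforward. A minimal sketch, assuming a fitted instance and the multi-metric setup from the example below, where score_['r2'] has shape (n_sets, n_cv[, ...]):

>>> score = hierarchical.score_['r2']
>>> mean_per_set = score.mean(dim = tuple(range(1, score.ndim)))
>>> best_set = hierarchical.set_[int(mean_per_set.argmax())]

Here row i of score_ is assumed to correspond to entry i of set_, which you can verify against mask_.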

Warning

If multiple values are supplied for metric, this class will produce a dictionary of {Metric.name: score, ...} rather than a stacked array. This is to provide consistency across cases where metrics may or may not differ in their output shapes.

Examples

>>> import torch
>>> from mvpy import metrics
>>> from mvpy.dataset import make_meeg_continuous
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import TimeDelayed
>>> from mvpy.crossvalidation import Validator
>>> from mvpy.model_selection import Hierarchical
>>> from sklearn.pipeline import make_pipeline
>>> # select a device to run on
>>> device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
>>> # create dataset
>>> fs = 200
>>> X, y = make_meeg_continuous(fs = fs, n_features = 5)
>>> # setup pipeline for estimation of multivariate temporal response functions
>>> trf = make_pipeline(
...     Scaler().to_torch(),
...     TimeDelayed(
...         -1.0, 0.0, fs,
...         alphas = torch.logspace(-5, 5, 10, device = device)
...     )
... )
>>> # setup validator
>>> validator = Validator(
...     trf,
...     metric = (metrics.r2, metrics.pearsonr),
... )
>>> # setup groups of predictors
>>> groups = torch.tensor(
...     [
...         [1, 1, 1, 0, 0],
...         [1, 1, 1, 1, 0],
...         [1, 1, 1, 0, 1]
...     ],
...     dtype = torch.long,
...     device = device
... )
>>> # score predictors hierarchically
>>> hierarchical = Hierarchical(validator, verbose = True).fit(
...     X, y,
...     groups = groups
... )
>>> hierarchical.score_['r2'].shape
torch.Size([4, 5, 64, 400])
fit(X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None) → Hierarchical[source]#

Fit all models in a hierarchical manner.

Parameters:
X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

y : np.ndarray | torch.Tensor

The output data of arbitrary shape.

groups : Optional[List | np.ndarray | torch.Tensor], default=None

Matrix describing all groups of interest of shape (n_groups, n_predictors). If None, this will default to the identity matrix of shape (n_predictors, n_predictors).

dim : Optional[int], default=None

The dimension in \(X\) that describes the predictors. If None, this will assume -1 for 2D data and -2 otherwise (see the sketch below).

Returns:
hierarchical : mvpy.model_selection.Hierarchical

The fitted hierarchical model selector.
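For intuition, the dim default and the masking it implies can be sketched in plain numpy; the shapes below are hypothetical and chosen only for illustration:

>>> import numpy as np
>>> X2 = np.zeros((1000, 5))       # 2D data: predictors assumed on the last axis (dim = -1)
>>> X3 = np.zeros((64, 5, 1000))   # >2D data: predictors assumed on the second-to-last axis (dim = -2)
>>> mask = np.array([True, True, True, False, False])
>>> np.compress(mask, X3, axis = -2).shape
(64, 3, 1000)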

mvpy.model_selection.hierarchical.check_dims_and_groups_(X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None, on_underspecified: str = 'raise') → Tuple[int, ndarray | Tensor, List, ndarray | Tensor][source]#

Check dimension and groups and create corresponding sets and masks.

Parameters:
X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

y : np.ndarray | torch.Tensor

The output data of arbitrary shape.

groups : Optional[List | np.ndarray | torch.Tensor], default=None

Matrix describing all groups of interest of shape (n_groups, n_predictors). If None, this will default to the identity matrix of shape (n_predictors, n_predictors).

dim : Optional[int], default=None

The dimension in \(X\) that describes the predictors. If None, this will assume -1 for 2D data and -2 otherwise.

on_underspecified : {'raise', 'warn', 'ignore'}, default='raise'

How to escalate if an underspecified grouping is detected, i.e. if not all available predictors are used.

Returns:
dim : int

The dimension along which groups exist.

group_ids : np.ndarray | torch.Tensor

Vector containing group identifiers.

sets : List[Tuple[int]]

List containing tuples of group identifiers, together forming all combinations.

masks : np.ndarray | torch.Tensor

The masks corresponding to each feature combination in sets.

mvpy.model_selection.hierarchical.fit_validator_(validator: Validator, X: ndarray | Tensor, y: ndarray | Tensor, mask: ndarray | Tensor, dim: int, i: int) → Dict[source]#

Fit an individual validator object.

Parameters:
validator : mvpy.crossvalidation.Validator

The validator object to be fit. Note that this is automatically cloned here.

X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

y : np.ndarray | torch.Tensor

The output data of arbitrary shape.

mask : np.ndarray | torch.Tensor

The mask to apply to the input data.

dim : int

The dimension along which to apply the mask.

i : int

The identifier to return.

Returns:
results : Dict

A dictionary containing the results:

i : int

The identifier for this validator.

validator : mvpy.crossvalidation.Validator

The fitted validator.

score : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

The score of the validator.


mvpy.model_selection.hierarchical_score module#

mvpy.model_selection.hierarchical_score.hierarchical_score(model: Pipeline | BaseEstimator, X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None, cv: int | Any = 5, metric: Metric | Tuple[Metric] | None = None, return_hierarchical: bool = True, n_jobs: int | None = None, n_jobs_validator: int | None = None, verbose: int | bool = False, verbose_validator: int | bool = False) → ndarray | Tensor | Dict | Tuple[Hierarchical, ndarray, Tensor, Dict][source]#

Implements a shorthand for hierarchical scoring over all feature permutations in \(X\) describing \(y\).

This function acts as a shorthand for Hierarchical: it automatically creates and fits all permutations of the predictors specified in \(X\) following a hierarchical procedure. It returns either only the output scores or, if return_hierarchical is True, a tuple of both the fitted Hierarchical object and the scores.

For more information, please consult Hierarchical.

Warning

This performs \(k\left(2^p - 1\right)\) individual model fits where \(k\) is the total number of cross-validation steps and \(p\) is the number of unique groups of predictors. For large \(p\), this becomes exponentially more expensive to solve. If you are interested in the unique contribution of each feature rather than separate estimates for all combinations, consider using shapley_score instead.

Parameters:
model : sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator

The model to fit and score. Can be either a pipeline or estimator object.

X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

y : np.ndarray | torch.Tensor

The output data of arbitrary shape.

groups : Optional[List | np.ndarray | torch.Tensor], default=None

Matrix describing all groups of interest of shape (n_groups, n_predictors). If None, this will default to the identity matrix of shape (n_predictors, n_predictors).

dim : Optional[int], default=None

The dimension in \(X\) that describes the predictors. If None, this will assume -1 for 2D data and -2 otherwise.

cv : int | Any, default=5

The cross-validation procedure to follow. Either an object exposing a split() method, such as KFold, or an integer specifying the number of folds to use in KFold.

metric : Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None

The metric to use for scoring. If None, this will default to the score() method exposed by model.

return_hierarchical : bool, default=True

Should the underlying Hierarchical object be returned?

n_jobs : Optional[int], default=None

How many jobs should be used to parallelise the hierarchical fitting procedure?

n_jobs_validator : Optional[int], default=None

How many jobs should be used to parallelise the cross-validation procedure?

verbose : int | bool, default=False

Should progress be reported verbosely?

verbose_validator : int | bool, default=False

Should progress in individual Validator objects be reported verbosely?

Returns:
hierarchical : Optional[mvpy.model_selection.Hierarchical]

If return_hierarchical is True, the underlying Hierarchical object.

score : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

All hierarchical scores of shape (n_sets, n_cv[, ...]) or a dictionary containing each individual Metric.

See also

mvpy.model_selection.shapley_score, mvpy.model_selection.Shapley

An alternative scoring method computing unique contributions of each feature rather than the full permutation.

mvpy.model_selection.Hierarchical

The underlying hierarchical scoring object.

mvpy.crossvalidation.Validator

The cross-validation objects used in Hierarchical.

Notes

Currently this does not automatically select the best model for you. Instead, it will return all scores, leaving further decisions up to you. This is because, for most applications, the scores of all permutations are actually of interest and may need to be reported.

Warning

If multiple values are supplied for metric, this function will output a dictionary of {Metric.name: score, ...} rather than a stacked array. This is to provide consistency across cases where metrics may or may not differ in their output shapes.

Warning

When specifying n_jobs here, be careful not to specify any number of jobs in the model or through n_jobs_validator. Otherwise, individual jobs will each try to initialise more low-level jobs, severely hurting performance.

Examples

>>> import torch
>>> from mvpy import metrics
>>> from mvpy.dataset import make_meeg_continuous
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import TimeDelayed
>>> from mvpy.model_selection import hierarchical_score
>>> from sklearn.pipeline import make_pipeline
>>> # select a device to run on
>>> device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
>>> # create dataset
>>> fs = 200
>>> X, y = make_meeg_continuous(fs = fs, n_features = 5)
>>> # setup pipeline for estimation of multivariate temporal response functions
>>> trf = make_pipeline(
...     Scaler().to_torch(),
...     TimeDelayed(
...         -1.0, 0.0, fs,
...         alphas = torch.logspace(-5, 5, 10, device = device)
...     )
... )
>>> # setup groups of predictors
>>> groups = torch.tensor(
...     [
...         [1, 1, 1, 0, 0],
...         [1, 1, 1, 1, 0],
...         [1, 1, 1, 0, 1]
...     ],
...     dtype = torch.long,
...     device = device
... )
>>> # score predictors hierarchically
>>> hierarchical, score = hierarchical_score(
...     trf, X, y,
...     groups = groups,
...     metric = (metrics.r2, metrics.pearsonr),
...     verbose = True
... )
>>> score['r2'].shape
torch.Size([4, 5, 64, 400])


mvpy.model_selection.shapley module#

class mvpy.model_selection.shapley.Shapley(validator: Validator, n_permutations: int = 10, n_jobs: int | None = None, verbose: int | bool = False, on_underspecified: str = 'raise')[source]#

Bases: object

Implements a Shapley value scoring procedure over all feature permutations in \(X\) describing \(y\).

When modeling outcomes \(y\), a common question to ask is to what degree individual predictors in \(X\) contribute to \(y\). To do this, we group predictors according to a specification groups that may, for example, be:

\[\begin{split}G = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]

Or, in other words, here we have three groups of predictors. To compute Shapley values, we always assume that group zero (in this case, the first three predictors) is some baseline relative to which we measure the contribution of the other predictors. If no such baseline exists, it should simply be an intercept predictor. Next, we perform n_permutations iterations in which we fit and score the baseline model, then loop over a random permutation of the remaining predictor groups, adding them one by one and measuring how each affects the outcome score relative to the last fitted model.

By repeating this procedure many times, we obtain Shapley values for each predictor group that represent its fair contribution to the outcome scores, invariant to the order in which groups are included. The first score will always correspond to the full baseline performance, whereas the others are relative improvements over baseline.
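As a hedged sketch of this loop (not the library's actual backend), suppose fit_and_score stands in for one cross-validated fit and scoring over the masked predictors of the groups in active:

>>> import numpy as np
>>> def shapley_sketch(fit_and_score, n_groups, n_permutations, seed = 0):
...     rng = np.random.default_rng(seed)
...     scores = np.zeros((n_permutations, n_groups))
...     for p in range(n_permutations):
...         active = [0]                                # group 0 is the baseline
...         last = scores[p, 0] = fit_and_score(active) # absolute baseline score
...         for g in rng.permutation(np.arange(1, n_groups)):
...             active.append(int(g))
...             current = fit_and_score(active)
...             scores[p, g] = current - last           # marginal contribution
...             last = current
...     return scores.mean(axis = 0)                    # average over permutations
>>> # toy usage with a hypothetical additive scorer
>>> shapley_sketch(lambda a: 0.5 + 0.1 * (1 in a) + 0.25 * (2 in a), n_groups = 3, n_permutations = 50)
array([0.5 , 0.1 , 0.25])

Because the toy scorer is additive, every permutation yields the same marginal contributions and the averages are exact.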

Warning

This performs \(n k p\) model fits where \(n\) is the number of permutations, \(k\) is the total number of cross-validation steps and \(p\) is the number of unique groups of predictors. For large \(p\) or \(n\), this becomes expensive to solve.
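Unlike Hierarchical, the cost here is linear rather than exponential in \(p\). A quick comparison of fit counts for \(n = 10\) permutations and \(k = 5\) cross-validation steps, against Hierarchical's \(k\left(2^p - 1\right)\):

>>> n, k = 10, 5
>>> [(p, n * k * p, k * (2 ** p - 1)) for p in (3, 6, 10)]
[(3, 150, 35), (6, 300, 315), (10, 500, 5115)]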

Warning

The default behaviour of this class is to check whether all predictors in \(X\) appear in the group specification groups at least once. If this is not the case, the class will raise an exception. If you would instead like to only warn about or silently ignore such cases, supply the corresponding on_underspecified value.

Warning

When specifying n_jobs here, be careful not to specify any number of jobs in the model or underlying validator. Otherwise, individual jobs will each try to initialise more low-level jobs, severely hurting performance.

Parameters:
validator : mvpy.crossvalidation.Validator

The validator object that should be used in this procedure.

n_permutations : int, default=10

How many permutations should we run? A higher number of permutations yields better estimates. Generally, the more predictor groups there are, the more permutations should be used.

n_jobs : Optional[int], default=None

How many jobs should be used to parallelise the Shapley fitting procedure?

verbose : int | bool, default=False

Should progress be reported verbosely?

on_underspecified : {'raise', 'warn', 'ignore'}, default='raise'

How to escalate if an underspecified grouping is detected, i.e. if not all available predictors are used.

Attributes:
validator : mvpy.crossvalidation.Validator

The validator object that should be used in this procedure.

n_permutations : int, default=10

How many permutations should we run? A higher number of permutations yields better estimates. Generally, the more predictor groups there are, the more permutations should be used.

n_jobs : Optional[int], default=None

How many jobs should be used to parallelise the Shapley fitting procedure?

verbose : int | bool, default=False

Should progress be reported verbosely?

on_underspecified : {'raise', 'warn', 'ignore'}, default='raise'

How to escalate if an underspecified grouping is detected, i.e. if not all available predictors are used.

validator_ : List[List[mvpy.crossvalidation.Validator]]

A list containing, per permutation, another list containing Validators for each model group, ordered by group identity.

score_ : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

The Shapley scores of shape (n_permutations, n_sets, n_cv[, ...]) or a dictionary containing each individual Metric.

order_ : np.ndarray | torch.Tensor

A matrix containing the order in which groups were added to the baseline group, of shape (n_permutations, n_groups - 1).

See also

mvpy.model_selection.hierarchical_score, mvpy.model_selection.Hierarchical

An alternative scoring method computing the full permutation over features.

mvpy.model_selection.shapley_score

A shorthand for fitting this class.

mvpy.crossvalidation.Validator

The cross-validation object required by Shapley.

Notes

All entries of score_ are relative to the baseline group, except, of course, for the baseline group itself.

Warning

If multiple values are supplied for metric, this class will produce a dictionary of {Metric.name: score, ...} rather than a stacked array. This is to provide consistency across cases where metrics may or may not differ in their output shapes.

Examples

>>> import torch
>>> from mvpy import metrics
>>> from mvpy.dataset import make_meeg_continuous
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import TimeDelayed
>>> from mvpy.crossvalidation import Validator
>>> from mvpy.model_selection import Shapley
>>> from sklearn.pipeline import make_pipeline
>>> # select a device to run on
>>> device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
>>> # create dataset
>>> fs = 200
>>> X, y = make_meeg_continuous(fs = fs, n_features = 5)
>>> # setup pipeline for estimation of multivariate temporal response functions
>>> trf = make_pipeline(
...     Scaler().to_torch(),
...     TimeDelayed(
...         -1.0, 0.0, fs,
...         alphas = torch.logspace(-5, 5, 10, device = device)
...     )
... )
>>> # setup validator
>>> validator = Validator(
...     trf,
...     metric = (metrics.r2, metrics.pearsonr),
... )
>>> # setup groups
>>> groups = torch.tensor(
...     [
...         [1, 1, 1, 0, 0],
...         [0, 0, 0, 1, 0],
...         [0, 0, 0, 0, 1]
...     ],
...     dtype = torch.long,
...     device = device
... )
>>> # score individual predictors using Shapley
>>> shapley = Shapley(validator, n_permutations = 3, verbose = True).fit(
...     X, y,
...     groups = groups
... )
>>> shapley.score_['r2'].shape
torch.Size([3, 3, 5, 64, 400])
fit(X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None) → Shapley[source]#

Fit the models to obtain Shapley values.

Parameters:
X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

y : np.ndarray | torch.Tensor

The output data of arbitrary shape.

groups : Optional[List | np.ndarray | torch.Tensor], default=None

Matrix describing all groups of interest of shape (n_groups, n_predictors). If None, this will default to the identity matrix of shape (n_predictors, n_predictors).

dim : Optional[int], default=None

The dimension in \(X\) that describes the predictors. If None, this will assume -1 for 2D data and -2 otherwise.

Returns:
shapley : mvpy.model_selection.Shapley

The fitted Shapley model selector.

mvpy.model_selection.shapley.fit_permutation_(validator: Validator, X: ndarray | Tensor, y: ndarray | Tensor, groups: ndarray | Tensor, dim: int, i: int) → Dict[source]#
mvpy.model_selection.shapley.fit_validator_(validator: Validator, X: ndarray | Tensor, y: ndarray | Tensor, mask: ndarray | Tensor, dim: int) → Validator[source]#

Fit an individual validator object.

Parameters:
validator : mvpy.crossvalidation.Validator

The validator object to be fit. Note that this is automatically cloned here.

X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

y : np.ndarray | torch.Tensor

The output data of arbitrary shape.

mask : np.ndarray | torch.Tensor

The mask to apply to the input data.

dim : int

The dimension along which to apply the mask.

Returns:
validator : mvpy.crossvalidation.Validator

The fitted validator object.


mvpy.model_selection.shapley_score module#

mvpy.model_selection.shapley_score.shapley_score(model: Pipeline | BaseEstimator, X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None, cv: int | Any = 5, metric: Metric | Tuple[Metric] | None = None, return_shapley: bool = True, n_permutations: int = 10, n_jobs: int | None = None, n_jobs_validator: int | None = None, verbose: int | bool = False, verbose_validator: int | bool = False) → ndarray | Tensor | Dict | Tuple[Shapley, ndarray, Tensor, Dict][source]#

Implements a shorthand for Shapley scoring over all feature permutations in \(X\) describing \(y\).

This function acts as a shorthand for Shapley: it automatically creates and fits all groups of predictors specified in \(X\) following a Shapley procedure. It returns either only the output scores or, if return_shapley is True, a tuple of both the fitted Shapley object and the scores.

For more information, please consult Shapley.

Warning

This performs \(n k p\) model fits where \(n\) is the number of permutations, \(k\) is the total number of cross-validation steps and \(p\) is the number of unique groups of predictors. For large \(p\) or \(n\), this becomes expensive to solve.

Parameters:
model : sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator

The model to fit and score. Can be either a pipeline or estimator object.

X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

y : np.ndarray | torch.Tensor

The output data of arbitrary shape.

groups : Optional[List | np.ndarray | torch.Tensor], default=None

Matrix describing all groups of interest of shape (n_groups, n_predictors). If None, this will default to the identity matrix of shape (n_predictors, n_predictors).

dim : Optional[int], default=None

The dimension in \(X\) that describes the predictors. If None, this will assume -1 for 2D data and -2 otherwise.

cv : int | Any, default=5

The cross-validation procedure to follow. Either an object exposing a split() method, such as KFold, or an integer specifying the number of folds to use in KFold.

metric : Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None

The metric to use for scoring. If None, this will default to the score() method exposed by model.

return_shapley : bool, default=True

Should the underlying Shapley object be returned?

n_permutations : int, default=10

How many permutations should we run? A higher number of permutations yields better estimates. Generally, the more predictor groups there are, the more permutations should be used.

n_jobs : Optional[int], default=None

How many jobs should be used to parallelise the Shapley fitting procedure?

n_jobs_validator : Optional[int], default=None

How many jobs should be used to parallelise the cross-validation procedure?

verbose : int | bool, default=False

Should progress be reported verbosely?

verbose_validator : int | bool, default=False

Should progress in individual Validator objects be reported verbosely?

Returns:
shapley : Optional[mvpy.model_selection.Shapley]

If return_shapley is True, the underlying Shapley object.

score : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

All Shapley scores of shape (n_permutations, n_sets, n_cv[, ...]) or a dictionary containing each individual Metric.

See also

mvpy.model_selection.hierarchical_score, mvpy.model_selection.Hierarchical

An alternative scoring method computing the full permutation of feature combinations.

mvpy.model_selection.Shapley

The underlying shapley scoring object.

mvpy.crossvalidation.Validator

The cross-validation objects used in Shapley.

Notes

All entries of score are relative to the baseline group, except, of course, for the baseline group itself.

Warning

If multiple values are supplied for metric, this function will produce a dictionary of {Metric.name: score, ...} rather than a stacked array. This is to provide consistency across cases where metrics may or may not differ in their output shapes.

Warning

When specifying n_jobs here, be careful not to specify any number of jobs in the model or through n_jobs_validator. Otherwise, individual jobs will each try to initialise more low-level jobs, severely hurting performance.

Examples

>>> import torch
>>> from mvpy import metrics
>>> from mvpy.dataset import make_meeg_continuous
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import TimeDelayed
>>> from mvpy.model_selection import shapley_score
>>> from sklearn.pipeline import make_pipeline
>>> # select a device to run on
>>> device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
>>> # create dataset
>>> fs = 200
>>> X, y = make_meeg_continuous(fs = fs, n_features = 5)
>>> # setup pipeline for estimation of multivariate temporal response functions
>>> trf = make_pipeline(
...     Scaler().to_torch(),
...     TimeDelayed(
...         -1.0, 0.0, fs,
...         alphas = torch.logspace(-5, 5, 10, device = device)
...     )
... )
>>> # setup groups of predictors
>>> groups = torch.tensor(
...     [
...         [1, 1, 1, 0, 0],
...         [1, 1, 1, 1, 0],
...         [1, 1, 1, 0, 1]
...     ],
...     dtype = torch.long,
...     device = device
... )
>>> # score predictors using Shapley
>>> shapley, score = shapley_score(
...     trf, X, y,
...     groups = groups,
...     metric = (metrics.r2, metrics.pearsonr),
...     verbose = True
... )
>>> score['r2'].shape
torch.Size([10, 3, 5, 64, 400])


Module contents#