hierarchical_score

mvpy.model_selection.hierarchical_score(model: Pipeline | BaseEstimator, X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None, cv: int | Any = 5, metric: Metric | Tuple[Metric] | None = None, return_hierarchical: bool = True, n_jobs: int | None = None, n_jobs_validator: int | None = None, verbose: int | bool = False, verbose_validator: int | bool = False) → ndarray | Tensor | Dict | Tuple[Hierarchical, ndarray, Tensor, Dict]

Implements a shorthand for hierarchical scoring over all feature permutations in \(X\) describing \(y\).

This function acts as a shorthand for Hierarchical: it automatically creates and fits all permutations of the predictors specified in \(X\), following a hierarchical procedure. It returns either only the output scores or, if return_hierarchical is True, a tuple containing the fitted Hierarchical object and the scores.

For more information, please consult Hierarchical.

Warning

This performs \(k\left(2^p - 1\right)\) individual model fits where \(k\) is the total number of cross-validation steps and \(p\) is the number of unique groups of predictors. For large \(p\), this becomes exponentially more expensive to solve. If you are interested in the unique contribution of each feature rather than separate estimates for all combinations, consider using shapley_score instead.
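To get a feel for how quickly this cost grows, the count given by the formula above can be tabulated. The following is a minimal sketch in plain Python; the helper name `n_model_fits` is hypothetical and not part of mvpy.

```python
def n_model_fits(k: int, p: int) -> int:
    """Number of model fits k * (2**p - 1) for k cross-validation
    steps and p unique groups of predictors (formula from the warning above)."""
    if k < 1 or p < 1:
        raise ValueError("k and p must be positive integers")
    return k * (2 ** p - 1)


# with the default of 5 folds, the cost roughly doubles with every added group
for p in (2, 5, 10, 20):
    print(f"p = {p:2d}: {n_model_fits(5, p)} fits")
```

With five folds, two groups already require 15 fits and ten groups require 5115, which is why shapley_score may be preferable for large \(p\).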

Parameters:
model : sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator

The model to fit and score. Can be either a pipeline or estimator object.

X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

y : np.ndarray | torch.Tensor

The output data of arbitrary shape.

groups : Optional[List | np.ndarray | torch.Tensor], default=None

Matrix describing all groups of interest of shape (n_groups, n_predictors). If None, this will default to the identity matrix (n_predictors, n_predictors).

dim : Optional[int], default=None

The dimension in \(X\) that describes the predictors. If None, this will assume -1 for 2D data and -2 otherwise.

cv : int | Any, default=5

The cross-validation procedure to follow. Either an object exposing a split() method, such as mvpy.crossvalidation.KFold, or an integer specifying the number of folds to use in KFold.

metric : Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None

The metric to use for scoring. If None, this will default to the score() method exposed by model.

return_hierarchical : bool, default=True

Should the underlying Hierarchical object be returned?

n_jobs : Optional[int], default=None

How many jobs should be used to parallelise the hierarchical fitting procedure?

n_jobs_validator : Optional[int], default=None

How many jobs should be used to parallelise the cross-validation procedure?

verbose : int | bool, default=False

Should progress be reported verbosely?

verbose_validator : int | bool, default=False

Should progress in individual Validator objects be reported verbosely?
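The documented defaults for dim and groups can be illustrated with plain NumPy. This is only a sketch of the behaviour described in the parameter list, not mvpy's actual implementation; the helper names `resolve_dim` and `default_groups` and the array layouts are assumptions made for illustration.

```python
import numpy as np


def resolve_dim(X: np.ndarray, dim=None) -> int:
    """Mimic the documented default: -1 for 2D data, -2 otherwise."""
    if dim is not None:
        return dim
    return -1 if X.ndim == 2 else -2


def default_groups(X: np.ndarray, dim=None) -> np.ndarray:
    """Identity matrix (n_predictors, n_predictors): one group per predictor."""
    n_predictors = X.shape[resolve_dim(X, dim)]
    return np.eye(n_predictors, dtype=int)


X2d = np.zeros((100, 5))      # assumed layout: (n_samples, n_predictors)
X3d = np.zeros((10, 5, 400))  # assumed layout: (n_trials, n_predictors, n_timepoints)

print(resolve_dim(X2d), resolve_dim(X3d))
print(default_groups(X3d).shape)

# custom groups: each row flags the predictors that belong to one group
groups = np.array([
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
])
# the second axis of groups must match the number of predictors in X
assert groups.shape[1] == X3d.shape[resolve_dim(X3d)]
```

Passing an explicit groups matrix like the one above reduces \(p\) from the number of predictors to the number of rows, which directly lowers the number of model fits.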

Returns:
hierarchical : Optional[mvpy.model_selection.Hierarchical]

If return_hierarchical is True, the underlying Hierarchical object.

score : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

All hierarchical scores of shape (n_sets, n_cv[, ...]) or a dictionary containing the scores for each individual Metric.

See also

mvpy.model_selection.shapley_score, mvpy.model_selection.Shapley

An alternative scoring method computing unique contributions of each feature rather than the full permutation.

mvpy.model_selection.Hierarchical

The underlying hierarchical scoring object.

mvpy.crossvalidation.Validator

The cross-validation objects used in Hierarchical.

Notes

Currently this does not automatically select the best model for you. Instead, it will return all scores, leaving further decisions up to you. This is because, for most applications, the scores of all permutations are actually of interest and may need to be reported.
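If a single preferred set of predictors is needed nonetheless, one simple option is to average the scores over the cross-validation axis and compare the sets. The sketch below uses randomly generated placeholder values in place of real output and assumes that scores have already been reduced to shape (n_sets, n_cv) and that higher values are better; neither assumption comes from mvpy itself.

```python
import numpy as np

# placeholder scores of shape (n_sets, n_cv); NOT real output of hierarchical_score
rng = np.random.default_rng(0)
n_sets, n_cv = 4, 5
score = rng.uniform(0.0, 1.0, size=(n_sets, n_cv))

# summarise each set of predictors across cross-validation folds
mean_score = score.mean(axis=1)
sem_score = score.std(axis=1, ddof=1) / np.sqrt(n_cv)

# pick the set with the highest mean score (assumes higher is better)
best_set = int(np.argmax(mean_score))

for i, (m, s) in enumerate(zip(mean_score, sem_score)):
    marker = " <- best" if i == best_set else ""
    print(f"set {i}: {m:.3f} +/- {s:.3f}{marker}")
```

Reporting the per-set means and their spread alongside the chosen set keeps the full comparison visible, in line with the note above.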

Warning

If multiple values are supplied for metric, this function will output a dictionary of {Metric.name: score, ...} rather than a stacked array. This is to provide consistency across cases where metrics may or may not differ in their output shapes.
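Downstream code that should work with both return types can normalise the result into a dictionary first. The following is a minimal sketch; `as_metric_dict` and the fallback key "score" are hypothetical and not part of mvpy.

```python
def as_metric_dict(score, name="score"):
    """Return scores as {metric_name: values}, wrapping a bare array if needed."""
    if isinstance(score, dict):
        # multiple metrics were supplied: already keyed by Metric.name
        return dict(score)
    # single metric (or model.score()): wrap under a fallback key
    return {name: score}


# works for either kind of return value; lists stand in for arrays here
single = as_metric_dict([[0.1, 0.2], [0.3, 0.4]])
multiple = as_metric_dict({"r2": [[0.1, 0.2]], "pearsonr": [[0.3, 0.4]]})

for metric_name, values in multiple.items():
    print(metric_name, values)
```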

Warning

When specifying n_jobs here, avoid also setting a number of jobs in the model or through n_jobs_validator. Otherwise, each individual job will try to initialise further low-level jobs of its own, severely hurting performance.

Examples

>>> import torch
>>> from mvpy import metrics
>>> from mvpy.dataset import make_meeg_continuous
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import TimeDelayed
>>> from mvpy.model_selection import hierarchical_score
>>> from sklearn.pipeline import make_pipeline
>>> # device was undefined in the original example; 'cpu' is assumed here
>>> device = 'cpu'
>>> # create dataset
>>> fs = 200
>>> X, y = make_meeg_continuous(fs = fs, n_features = 5)
>>> # setup pipeline for estimation of multivariate temporal response functions
>>> trf = make_pipeline(
...     Scaler().to_torch(),
...     TimeDelayed(
...         -1.0, 0.0, fs,
...         alphas = torch.logspace(-5, 5, 10, device = device)
...     )
... )
>>> # setup groups of predictors
>>> groups = torch.tensor(
...     [
...         [1, 1, 1, 0, 0],
...         [1, 1, 1, 1, 0],
...         [1, 1, 1, 0, 1]
...     ],
...     dtype = torch.long,
...     device = device
... )
>>> # score predictors hierarchically
>>> hierarchical, score = hierarchical_score(
...     trf, X, y,
...     groups = groups,
...     metric = (metrics.r2, metrics.pearsonr),
...     verbose = True
... )
>>> # multiple metrics were supplied, so score is a dictionary keyed by metric name
>>> score['r2'].shape
torch.Size([4, 5, 64, 400])