Hierarchical#
- class mvpy.model_selection.Hierarchical(validator: Validator, n_jobs: int | None = None, verbose: int | bool = False, on_underspecified: str = 'raise')[source]#
Implements a hierarchical scoring procedure over all feature combinations in \(X\) describing \(y\).
When modeling outcomes \(y\), a common question to ask is what specific combination of predictors in \(X\) explains the observed data best. One way to tackle this question is to iteratively cross-validate the scoring of predictions \(\hat{y}\) from each possible feature combination in \(X\). For example, if we have three features in \(X\), we would model \(y\) as a function of feature combinations \(\left[(0), (1), (2), (0, 1), (0, 2), (1, 2), (0, 1, 2)\right]\) such that we can now compare how well each individual predictor and combination of predictors explain the data.
This class implements precisely this hierarchical modeling procedure, but allows creation of groups of predictors. For example, we might have several predictors in \(X\) that, together, form some kind of baseline. We might then specify:
\[\begin{split}G = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]
to signal that there are three total groups of predictors that we want to permute together, the first one including predictors 1-3, and the following two including one novel predictor each.
Hierarchical would now compute \(2^3 - 1 = 7\) combinations instead of the full \(2^5 - 1 = 31\) combinations. As described before, this yields the feature combinations \(\left[(0), (1), (2), (0, 1), (0, 2), (1, 2), (0, 1, 2)\right]\), where feature \((0,)\) now groups predictors \(\{0, 1, 2\}\). Observe, however, that these combinations include models in which the baseline predictors are absent, for example \((1,)\), which would evaluate to the mask \([0, 0, 0, 1, 0]\). If we want to enforce that all models include the baseline, we should make its predictors part of every other group:
\[\begin{split}G = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \end{bmatrix}\end{split}\]
The backend will automatically remove duplicates, leaving us with only the contrasts that are of interest to us, \(\left[(0,), (0, 1), (0, 2), (0, 1, 2)\right]\), or, expressed as boolean masks:
\[\begin{split}M = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ \end{bmatrix}\end{split}\]
In code, you can confirm the desired grouping in group_, the resulting feature combinations in set_, and the masks that were applied in mask_.
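To see where these masks come from, here is a minimal standalone sketch (independent of the class internals, which may differ) that enumerates all non-empty subsets of the groups in \(G\) and collapses duplicate masks:

>>> import itertools
>>> import torch
>>> # the grouping from above: baseline predictors {0, 1, 2} in every group
>>> G = torch.tensor([[1, 1, 1, 0, 0],
>>>                   [1, 1, 1, 1, 0],
>>>                   [1, 1, 1, 0, 1]], dtype = torch.bool)
>>> # OR together the rows of every non-empty subset of groups
>>> masks = set()
>>> for k in range(1, G.shape[0] + 1):
>>>     for subset in itertools.combinations(range(G.shape[0]), k):
>>>         masks.add(tuple(G[list(subset)].any(dim = 0).tolist()))
>>> # duplicates collapse automatically, leaving the four rows of M
>>> len(masks)
4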
Warning

This performs \(k\left(2^p - 1\right)\) individual model fits where \(k\) is the total number of cross-validation steps and \(p\) is the number of unique groups of predictors. For large \(p\), this becomes exponentially more expensive to solve. If you are interested in the unique contribution of each feature rather than separate estimates for all combinations, consider using Shapley instead.
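As a quick sanity check on this cost, with, for example, \(k = 5\) cross-validation steps and \(p = 4\) groups:

>>> k, p = 5, 4
>>> k * (2 ** p - 1)  # total number of individual model fits
75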
Warning

The default behaviour of this class is to check whether all predictors in \(X\) appear in the group specification groups at least once. If this is not the case, the class will raise an exception. If you would like to change this behaviour to either ignore these cases or only warn about them, you may supply the corresponding on_underspecified value.
Warning

When specifying n_jobs here, be careful not to specify any number of jobs in the model or the underlying validator. Otherwise, individual jobs will each try to initialise more low-level jobs, severely hurting performance.
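For instance, a minimal sketch of where the parallelism should live, reusing the trf pipeline and metrics from the Examples below:

>>> # keep the estimator and validator serial...
>>> validator = Validator(trf, metric = metrics.r2)
>>> # ...and parallelise only at the top level
>>> hierarchical = Hierarchical(validator, n_jobs = 4)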
- Parameters:
- validator: mvpy.crossvalidation.Validator
The validator object that should be used in this procedure.
- n_jobs: Optional[int], default=None
How many jobs should be used to parallelise the hierarchical fitting procedure?
- verbose: int | bool, default=False
Should progress be reported verbosely?
- on_underspecified: {'raise', 'warn', 'ignore'}, default='raise'
If we detect an underspecified grouping (i.e., not all available predictors are used), to what level should we escalate?
- Attributes:
- validator: mvpy.crossvalidation.Validator
The validator object that should be used in this procedure.
- n_jobs: Optional[int], default=None
How many jobs should be used to parallelise the hierarchical fitting procedure?
- verbose: int | bool, default=False
Should progress be reported verbosely?
- on_underspecified: {'raise', 'warn', 'ignore'}, default='raise'
If we detect an underspecified grouping (i.e., not all available predictors are used), to what level should we escalate?
- validator_: List[mvpy.crossvalidation.Validator]
A list of all fitted validators.
- score_: np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]
The hierarchical scores of shape (n_sets, n_cv[, ...]) or a dictionary containing each individual Metric.
- mask_: np.ndarray | torch.Tensor
A matrix where each row corresponds to one boolean mask used to fit one validator.
- group_: np.ndarray | torch.Tensor
A matrix where each row corresponds to the boolean mask for one group.
- group_id_: np.ndarray | torch.Tensor
A vector containing the group identifiers used in set_.
- set_: List[Tuple[int]]
A list including all group combinations that were tested.
See also
mvpy.model_selection.shapley_score, mvpy.model_selection.Shapley
An alternative scoring method computing unique contributions of each feature rather than the full permutation.
mvpy.model_selection.hierarchical_score
A shorthand for fitting this class.
mvpy.crossvalidation.Validator
The cross-validation object required by Hierarchical.
Notes
Currently, this does not automatically select the best model for you. Instead, it will return all scores, leaving further decisions up to you. This is because, for most applications, the scores of all combinations are actually of interest and may need to be reported.
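If you do want a single winner, a hypothetical sketch, continuing the Examples below and simply averaging r2 over all remaining dimensions, might look like this:

>>> # average r2 over folds, channels and time, then pick the best set
>>> best = hierarchical.score_['r2'].mean(dim = (1, 2, 3)).argmax()
>>> hierarchical.set_[int(best)]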
Warning

If multiple values are supplied for metric, this class will produce a dictionary of {Metric.name: score, ...} rather than a stacked array. This is to provide consistency across cases where metrics may or may not differ in their output shapes.

Examples
>>> import torch
>>> from mvpy import metrics
>>> from mvpy.dataset import make_meeg_continuous
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import TimeDelayed
>>> from mvpy.crossvalidation import Validator
>>> from mvpy.model_selection import Hierarchical
>>> from sklearn.pipeline import make_pipeline
>>> # select a device for torch tensors
>>> device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
>>> # create dataset
>>> fs = 200
>>> X, y = make_meeg_continuous(fs = fs, n_features = 5)
>>> # setup pipeline for estimation of multivariate temporal response functions
>>> trf = make_pipeline(
>>>     Scaler().to_torch(),
>>>     TimeDelayed(
>>>         -1.0, 0.0, fs,
>>>         alphas = torch.logspace(-5, 5, 10, device = device)
>>>     )
>>> )
>>> # setup validator
>>> validator = Validator(
>>>     trf,
>>>     metric = (metrics.r2, metrics.pearsonr),
>>> )
>>> # setup groups of predictors
>>> groups = torch.tensor(
>>>     [
>>>         [1, 1, 1, 0, 0],
>>>         [1, 1, 1, 1, 0],
>>>         [1, 1, 1, 0, 1]
>>>     ],
>>>     dtype = torch.long,
>>>     device = device
>>> )
>>> # score predictors hierarchically
>>> hierarchical = Hierarchical(validator, verbose = True).fit(
>>>     X, y,
>>>     groups = groups
>>> )
>>> hierarchical.score_['r2'].shape
torch.Size([4, 5, 64, 400])
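As a hypothetical follow-up to the example above (consult set_ for the exact ordering of rows in score_), one might compute each extended model's gain in r2 over the baseline-only model:

>>> # set_ maps each row of score_ to its group combination,
>>> # e.g. [(0,), (0, 1), (0, 2), (0, 1, 2)] for the grouping above
>>> r2 = hierarchical.score_['r2']
>>> # average over cross-validation folds, then subtract the baseline row
>>> gain = r2[1:].mean(1) - r2[0].mean(0)
>>> gain.shape
torch.Size([3, 64, 400])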
- fit(X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None) Hierarchical[source]#
Fit all models in a hierarchical manner.
- Parameters:
- X: np.ndarray | torch.Tensor
The input data of arbitrary shape.
- y: np.ndarray | torch.Tensor
The output data of arbitrary shape.
- groups: Optional[List | np.ndarray | torch.Tensor], default=None
Matrix describing all groups of interest of shape (n_groups, n_predictors). If None, this will default to the identity matrix of shape (n_predictors, n_predictors).
- dim: Optional[int], default=None
The dimension in \(X\) that describes the predictors. If None, this will assume -1 for 2D data and -2 otherwise.
- Returns:
- hierarchical: mvpy.model_selection.Hierarchical
The fitted hierarchical model selector.
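For instance, a hedged sketch of the default grouping, reusing X, y and validator from the Examples above: omitting groups places each of the 5 predictors in its own group, so all non-empty feature combinations are evaluated.

>>> # with groups=None and 5 predictors, 2**5 - 1 = 31 sets are fit
>>> hierarchical = Hierarchical(validator).fit(X, y)
>>> len(hierarchical.set_)
31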