Hierarchical
- class mvpy.model_selection.Hierarchical(validator: Validator, n_jobs: int | None = None, verbose: int | bool = False, on_underspecified: str = 'raise')
Implements a hierarchical scoring procedure over all combinations of the features in \(X\) used to describe \(y\).
When modeling outcomes \(y\), a common question is which combination of predictors in \(X\) best explains the observed data. One way to tackle this question is to cross-validate predictions \(\hat{y}\) from every possible combination of features in \(X\) and compare their scores. For example, with three features in \(X\), we would model \(y\) as a function of each of the feature combinations \(\left[(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)\right]\), which lets us compare how well each individual predictor and each combination of predictors explains the data.
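The enumeration order used above (all single features, then all pairs, and so on) can be reproduced with Python's standard library. This is a minimal sketch; feature_combinations is a hypothetical helper for illustration and is not part of mvpy.

```python
from itertools import combinations


def feature_combinations(n_features: int) -> list[tuple[int, ...]]:
    """Enumerate every non-empty combination of feature indices, smallest first."""
    return [
        combo
        for size in range(1, n_features + 1)
        for combo in combinations(range(n_features), size)
    ]


print(feature_combinations(3))
# [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
```

For \(p\) features this always yields \(2^p - 1\) combinations, which is where the cost discussed in the warnings below comes from.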
This class implements precisely this hierarchical modeling procedure, but additionally allows predictors to be grouped. For example, several predictors in \(X\) might together form a baseline. We might then specify:
\[\begin{split}G = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]
to signal that there are three groups of predictors that we want to include or exclude together: the first group contains the first three predictors (the baseline), and the other two groups contain one additional predictor each.
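As a sketch of how the group matrix translates into predictor masks, the snippet below enumerates all combinations of the three groups in \(G\) and forms each mask as the element-wise OR of the selected group rows. The OR rule is an assumption that agrees with the masks described in this section; the variable names are illustrative and do not mirror mvpy internals. It also evaluates the \(k\left(2^p - 1\right)\) fit count from the warning further down for a hypothetical \(k = 5\) folds.

```python
from itertools import combinations

import numpy as np

# Group matrix G from above: rows are groups, columns are predictors.
G = np.array([
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
], dtype=bool)

n_groups, n_predictors = G.shape

# Every non-empty combination of groups (not of individual predictors).
sets = [
    combo
    for size in range(1, n_groups + 1)
    for combo in combinations(range(n_groups), size)
]

# ASSUMPTION: a combination's predictor mask is the logical OR of its group rows.
masks = np.array([G[list(combo)].any(axis=0) for combo in sets])

print(len(sets), 2 ** n_predictors - 1)     # 7 31
print(masks[sets.index((1,))].astype(int))  # [0 0 0 1 0], baseline excluded

# Upper bound on model fits for a hypothetical k = 5 cross-validation folds.
k = 5
print(k * (2 ** n_groups - 1))              # 35
```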
Hierarchical would now compute \(2^3 - 1 = 7\) combinations instead of the full \(2^5 - 1 = 31\) combinations. As described before, this yields the combinations \(\left[(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)\right]\), where group \((0,)\) now stands for predictors \(\{0, 1, 2\}\).
Observe, however, that these combinations include models in which the baseline predictors are absent. For example, \((1,)\) evaluates to the mask \([0, 0, 0, 1, 0]\). If we want to enforce that all models include the baseline, we should make the baseline part of every other group:
\[\begin{split}G = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \end{bmatrix}\end{split}\]
The backend will automatically remove duplicate masks, leaving only the contrasts of interest, \(\left[(0,), (0, 1), (0, 2), (0, 1, 2)\right]\), or, expressed as boolean masks:
\[\begin{split}M = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}\end{split}\]
In code, you can confirm the grouping in group_, the resulting group combinations in set_, and the masks that were applied in mask_.
Warning
This performs up to \(k\left(2^p - 1\right)\) individual model fits, where \(k\) is the total number of cross-validation steps and \(p\) is the number of unique groups of predictors. The cost therefore grows exponentially with \(p\). If you are interested in the unique contribution of each feature rather than separate estimates for all combinations, consider using Shapley instead.
Warning
By default, this class checks whether every predictor in \(X\) appears at least once in the group specification groups. If this is not the case, the class will raise an exception. If you would rather only warn about, or ignore, such cases, supply the corresponding on_underspecified value ('warn' or 'ignore').
Warning
When specifying n_jobs here, do not also set a number of jobs in the model or the underlying validator. Otherwise, each parallel job will try to spawn its own low-level jobs, which severely hurts performance.
- Parameters:
- validator : mvpy.crossvalidation.Validator
The validator object that should be used in this procedure.
- n_jobs : Optional[int], default=None
How many jobs should be used to parallelise the hierarchical fitting procedure?
- verbose : int | bool, default=False
Should progress be reported verbosely?
- on_underspecified : {'raise', 'warn', 'ignore'}, default='raise'
If an underspecified grouping is detected (i.e., not all available predictors are used), how should this be escalated?
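The underspecification check can be sketched as a column-wise test on the group matrix: a predictor is unused if its column is zero in every group. check_groups is a hypothetical helper, and raising ValueError (and emitting a plain UserWarning for 'warn') are assumptions; mvpy may use different exception and warning types.

```python
import warnings

import numpy as np


def check_groups(groups, on_underspecified: str = "raise") -> np.ndarray:
    """Return indices of predictors that no group uses, escalating as requested.

    Hypothetical helper mirroring the documented on_underspecified options.
    """
    if on_underspecified not in ("raise", "warn", "ignore"):
        raise ValueError(f"Unknown on_underspecified value: {on_underspecified!r}")

    # A predictor is unused if its column is zero in every group row.
    unused = np.flatnonzero(~np.asarray(groups, dtype=bool).any(axis=0))

    if unused.size > 0:
        message = f"Predictors {unused.tolist()} do not appear in any group."
        if on_underspecified == "raise":
            raise ValueError(message)  # ASSUMPTION: exception type
        if on_underspecified == "warn":
            warnings.warn(message)     # ASSUMPTION: default UserWarning
    return unused


groups = np.array([[1, 1, 0, 0], [0, 0, 1, 0]])  # predictor 3 is never used
print(check_groups(groups, on_underspecified="ignore"))  # [3]
```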
- Attributes:
- validator : mvpy.crossvalidation.Validator
The validator object that should be used in this procedure.
- n_jobs : Optional[int], default=None
How many jobs should be used to parallelise the hierarchical fitting procedure?
- verbose : int | bool, default=False
Should progress be reported verbosely?
- on_underspecified : {'raise', 'warn', 'ignore'}, default='raise'
If an underspecified grouping is detected (i.e., not all available predictors are used), how should this be escalated?
- validator_ : List[mvpy.crossvalidation.Validator]
A list of all fitted validators, one per mask.
- score_ : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]
The hierarchical scores of shape (n_sets, n_cv[, ...]), or a dictionary containing the scores for each individual Metric.
- mask_ : np.ndarray | torch.Tensor
A matrix where each row corresponds to one boolean mask used to fit one validator.
- group_ : np.ndarray | torch.Tensor
A matrix where each row corresponds to the boolean mask for one group.
- group_id_ : np.ndarray | torch.Tensor
A vector containing the group identifiers used in the tested sets.
- set_ : List[Tuple[int]]
A list of all group combinations that were tested.
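For the baseline example above, the following sketch shows how group_, set_ and mask_ relate. It assumes masks are formed as the OR of group rows and that, among combinations producing the same mask, the last (largest) one is kept; this tie-breaking rule is chosen here only because it reproduces the documented result, and mvpy's actual rule may differ. The arrays are computed locally and merely named after the attributes; they are not read from a fitted Hierarchical object.

```python
from itertools import combinations

import numpy as np

# Baseline-augmented group matrix (plays the role of group_).
G = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
], dtype=bool)

all_sets = [
    combo
    for size in range(1, G.shape[0] + 1)
    for combo in combinations(range(G.shape[0]), size)
]

# Keys keep first-seen order; values are overwritten, so the last
# combination producing a given mask wins (ASSUMED tie-breaking rule).
unique: dict[tuple[bool, ...], tuple[int, ...]] = {}
for combo in all_sets:
    mask = tuple(G[list(combo)].any(axis=0).tolist())
    unique[mask] = combo

set_ = list(unique.values())
mask_ = np.array(list(unique.keys()), dtype=int)

print(set_)   # [(0,), (0, 1), (0, 2), (0, 1, 2)]
print(mask_)  # rows match M above
```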
See also
mvpy.model_selection.shapley_score, mvpy.model_selection.Shapley
An alternative scoring method that computes the unique contribution of each feature rather than scoring all combinations.
mvpy.model_selection.hierarchical_score
A shorthand for fitting this class.
mvpy.crossvalidation.Validator
The cross-validation object required by Hierarchical.
Notes
Currently, this class does not automatically select the best model for you. Instead, it returns all scores and leaves further decisions up to you, because in most applications the scores of all combinations are of interest and may need to be reported.
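Because model selection is left to the user, one simple heuristic is to average each set's scores over folds (and any trailing dimensions) and pick the maximum. The sketch below uses a random toy array in place of score_; with multiple metrics, a single entry such as score_['r2'] would be used instead, and torch tensors can be converted with .cpu().numpy(). It assumes higher scores are better, which holds for \(R^2\) and Pearson's \(r\) but not for error metrics, and it ignores variability across folds.

```python
import numpy as np

# Toy stand-in for score_ with shape (n_sets, n_cv, n_channels).
# Random values for illustration only, not real results.
rng = np.random.default_rng(0)
score = rng.normal(size=(4, 5, 64))
set_ = [(0,), (0, 1), (0, 2), (0, 1, 2)]

# Average over folds and all trailing dimensions: one value per set.
mean_score = score.reshape(score.shape[0], -1).mean(axis=1)

# ASSUMPTION: higher is better for the chosen metric.
best = int(np.argmax(mean_score))
print(set_[best], mean_score[best])
```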
Warning
If multiple values are supplied for metric, this class will produce a dictionary of {Metric.name: score, ...} rather than a stacked array. This keeps the output consistent across cases where metrics may or may not differ in their output shapes.
Examples
>>> import torch
>>> from mvpy import metrics
>>> from mvpy.dataset import make_meeg_continuous
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import TimeDelayed
>>> from mvpy.crossvalidation import Validator
>>> from mvpy.model_selection import Hierarchical
>>> from sklearn.pipeline import make_pipeline
>>> # choose the device for all tensors (must match the device of X and y)
>>> device = 'cpu'
>>> # create dataset
>>> fs = 200
>>> X, y = make_meeg_continuous(fs = fs, n_features = 5)
>>> # setup pipeline for estimation of multivariate temporal response functions
>>> trf = make_pipeline(
...     Scaler().to_torch(),
...     TimeDelayed(
...         -1.0, 0.0, fs,
...         alphas = torch.logspace(-5, 5, 10, device = device)
...     )
... )
>>> # setup validator
>>> validator = Validator(
...     trf,
...     metric = (metrics.r2, metrics.pearsonr),
... )
>>> # setup groups of predictors
>>> groups = torch.tensor(
...     [
...         [1, 1, 1, 0, 0],
...         [1, 1, 1, 1, 0],
...         [1, 1, 1, 0, 1]
...     ],
...     dtype = torch.long,
...     device = device
... )
>>> # score predictors hierarchically
>>> hierarchical = Hierarchical(validator, verbose = True).fit(
...     X, y,
...     groups = groups
... )
>>> hierarchical.score_['r2'].shape
torch.Size([4, 5, 64, 400])
- fit(X: ndarray | Tensor, y: ndarray | Tensor, groups: List | ndarray | Tensor | None = None, dim: int | None = None) → Hierarchical
Fit all models in a hierarchical manner.
- Parameters:
- X : np.ndarray | torch.Tensor
The input data of arbitrary shape.
- y : np.ndarray | torch.Tensor
The output data of arbitrary shape.
- groups : Optional[List | np.ndarray | torch.Tensor], default=None
Matrix describing all groups of interest, of shape (n_groups, n_predictors). If None, this defaults to the identity matrix of shape (n_predictors, n_predictors), i.e., every predictor forms its own group.
- dim : Optional[int], default=None
The dimension in \(X\) that describes the predictors. If None, this assumes -1 for 2D data and -2 otherwise.
- Returns:
- hierarchical : mvpy.model_selection.Hierarchical
The fitted hierarchical model selector.
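The documented defaults of fit for groups and dim can be mirrored in a few lines. resolve_fit_defaults is a hypothetical helper, and the axis layouts of the example inputs (samples by predictors for 2D data, trials by predictors by time for 3D data) are assumptions chosen to match the -1 / -2 rule. With identity groups and five predictors, all \(2^5 - 1 = 31\) combinations would be evaluated.

```python
import numpy as np


def resolve_fit_defaults(X: np.ndarray, groups=None, dim=None):
    """Mirror the documented defaults of fit (hypothetical helper)."""
    # dim: -1 for 2D data, -2 otherwise.
    if dim is None:
        dim = -1 if X.ndim == 2 else -2
    n_predictors = X.shape[dim]
    # groups: identity matrix, i.e. every predictor is its own group.
    if groups is None:
        groups = np.eye(n_predictors, dtype=int)
    return np.asarray(groups), dim


X2 = np.zeros((100, 5))      # ASSUMED layout: (n_samples, n_predictors)
X3 = np.zeros((20, 5, 400))  # ASSUMED layout: (n_trials, n_predictors, n_times)

g2, d2 = resolve_fit_defaults(X2)
g3, d3 = resolve_fit_defaults(X3)
print(d2, g2.shape)  # -1 (5, 5)
print(d3, g3.shape)  # -2 (5, 5)
```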