Validator#
- class mvpy.crossvalidation.Validator(model: BaseEstimator | Pipeline, cv: int | Any = 5, metric: Metric | Tuple[Metric] | None = None, n_jobs: int | None = None, verbose: int | bool = False)[source]#
Implements automated cross-validation and scoring over estimators or pipelines.
This allows for easy cross-validated evaluation of models or pipelines without having to write the cross-validation code explicitly (a common source of mistakes), while still providing access to the underlying fitted pipelines. Beyond model evaluation, this also enables, for example, scoring on new unseen data or inspection of model parameters.
- Parameters:
- model : sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator
The model to fit and score. Can be either a pipeline or estimator object.
- cv : int | Any, default=5
The cross-validation procedure to follow. Either an object exposing a split() method, such as KFold, or an integer specifying the number of folds to use in KFold.
- metric : Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None
The metric to use for scoring. If None, this will default to the score() method exposed by model.
- n_jobs : Optional[int], default=None
How many jobs should be used to parallelise the cross-validation procedure?
- verbose : int | bool, default=False
Should progress be reported verbosely?
- Attributes:
- model : sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator
The model to fit and score. Can be either a pipeline or estimator object.
- cv : int | Any, default=5
The cross-validation procedure to follow. Either an object exposing a split() method, such as KFold, or an integer specifying the number of folds to use in KFold.
- metric : Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None
The metric to use for scoring. If None, this will default to the score() method exposed by model.
- n_jobs : Optional[int], default=None
How many jobs should be used to parallelise the cross-validation procedure?
- verbose : int | bool, default=False
Should progress be reported verbosely?
- cv_ : Any
The instantiated cross-validation object exposing split().
- cv_n_ : int
The number of cross-validation steps used by cv_.
- model_ : List[sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator]
The models fit during cross-validation.
- score_ : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]
The scores of all models on test data of an arbitrary output shape.
- test_ : List[np.ndarray | torch.Tensor]
A list of test indices used for scoring.
See also
mvpy.crossvalidation.cross_val_score : A shorthand for fitting a Validator.
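A sketch of that shorthand is shown below; the exact signature is an assumption here, presuming cross_val_score accepts the same model, cv and metric arguments as Validator and returns the per-fold scores directly:
>>> import torch
>>> from mvpy.estimators import ReceptiveField
>>> from mvpy.crossvalidation import cross_val_score
>>> X = torch.normal(0, 1, (100, 1, 50))
>>> y = torch.normal(0, 1, (100, 1, 50))
>>> # hypothetical call, mirroring Validator(model = ..., cv = ..., metric = ...)
>>> scores = cross_val_score(ReceptiveField(-2, 2, 1, alpha = 1e-5), X, y, cv = 5)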
Notes
When trying to access individual functions or attributes within estimators of a pipeline, make sure to indicate the pipeline step in from_step when calling either call() or collect().
Warning
If multiple values are supplied for metric, this function will output a dictionary of {Metric.name: score, ...} rather than a stacked array. This is to provide consistency across cases where metrics may or may not differ in their output shapes.
Warning
When specifying n_jobs here, be careful not to specify any number of jobs in the model. Otherwise, this will lead to a situation where individual jobs each try to initialise more low-level jobs, severely hurting performance.
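For instance, a safe configuration parallelises only at the Validator level and leaves the model itself serial (a minimal sketch, using the ReceptiveField estimator from the examples below):
>>> from mvpy.estimators import ReceptiveField
>>> from mvpy.crossvalidation import Validator
>>> trf = ReceptiveField(-2, 2, 1, alpha = 1e-5)
>>> # parallelise across folds here; do not also set n_jobs inside the model
>>> validator = Validator(model = trf, n_jobs = 4)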
Examples
In the simplest case where we have one estimator, we can do:
>>> import torch
>>> from mvpy.estimators import ReceptiveField
>>> from mvpy.crossvalidation import Validator
>>> ß = torch.tensor([1., 2., 3., 2., 1.])
>>> X = torch.normal(0, 1, (100, 1, 50))
>>> y = torch.nn.functional.conv1d(X, ß[None,None,:], padding = 'same')
>>> y = y + torch.normal(0, 1, y.shape)
>>> trf = ReceptiveField(-2, 2, 1, alpha = 1e-5)
>>> validator = Validator(model = trf).fit(X, y)
>>> validator.score_.shape
torch.Size([5, 1, 50])
>>> validator.collect('coef_').shape
torch.Size([5, 1, 1, 5])
>>> X_new = torch.normal(0, 1, (100, 1, 50))
>>> y_new = torch.nn.functional.conv1d(X_new, ß[None,None,:], padding = 'same')
>>> y_new = y_new + torch.normal(0, 1, y_new.shape)
>>> validator.score(X_new, y_new).shape
torch.Size([5, 1, 50])
If we have a pipeline, we may want to do:
>>> import torch
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import ReceptiveField
>>> from mvpy.crossvalidation import Validator
>>> from sklearn.pipeline import make_pipeline
>>> ß = torch.tensor([1., 2., 3., 2., 1.])
>>> X = torch.normal(0, 1, (100, 1, 50))
>>> y = torch.nn.functional.conv1d(X, ß[None,None,:], padding = 'same')
>>> y = y + torch.normal(0, 1, y.shape)
>>> trf = make_pipeline(
...     Scaler().to_torch(),
...     ReceptiveField(-2, 2, 1, alpha = 1e-5)
... )
>>> validator = Validator(model = trf).fit(X, y)
>>> validator.score_.shape
torch.Size([5, 1, 50])
>>> validator.collect('coef_', from_step = -1).shape
torch.Size([5, 1, 1, 5])
>>> X_new = torch.normal(0, 1, (100, 1, 50))
>>> y_new = torch.nn.functional.conv1d(X_new, ß[None,None,:], padding = 'same')
>>> y_new = y_new + torch.normal(0, 1, y_new.shape)
>>> validator.score(X_new, y_new).shape
torch.Size([5, 1, 50])
In the more complicated case of also using Sliding, for example, we can do:
>>> import torch
>>> from mvpy.dataset import make_meeg_categorical
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import Sliding, RidgeClassifier
>>> from mvpy.crossvalidation import Validator
>>> from mvpy import metrics
>>> from sklearn.pipeline import make_pipeline
>>> y, X = make_meeg_categorical()
>>> clf = make_pipeline(
...     Scaler().to_torch(),
...     Sliding(
...         RidgeClassifier(
...             torch.logspace(-5, 5, 10)
...         ),
...         dims = (-1,),
...         verbose = True,
...         n_jobs = None
...     )
... )
>>> validator = Validator(
...     model = clf,
...     metric = (metrics.accuracy, metrics.roc_auc),
...     verbose = True
... ).fit(X, y)
>>> validator.score_['roc_auc'].shape
torch.Size([5, 1, 400])
>>> validator.call('collect', 'pattern_', from_step = -1).shape
torch.Size([5, 400, 64, 2])
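Since the fitted pipelines and their test indices are stored in model_ and test_, individual folds can also be inspected directly. A minimal sketch, reusing the validator and data from the last example:
>>> # predictions of the first fold's fitted pipeline on its own held-out trials
>>> fold_model, fold_idx = validator.model_[0], validator.test_[0]
>>> y_h = fold_model.predict(X[fold_idx])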
- call(method: str, *args: Any, from_step: int | None = None, **kwargs: Any) ndarray | Tensor[source]#
Call method for all fitted estimators, pipelines or specific estimators within a pipeline.
- Parameters:
- method : str
The method to call.
- *args : Any
Additional arguments to pass to the method.
- from_step : Optional[int], default=None
If not None and model is a pipeline, which estimator within that pipeline should the method be called from?
- **kwargs : Any
Additional keyword arguments to pass to the method.
- Returns:
- out : np.ndarray | torch.Tensor
Stacked outputs from the method call of shape (cv_n_[, ...]).
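For example, the per-fold predictions underlying predict() could also be obtained through call() directly (a minimal sketch, reusing the validator and X_new from the examples above):
>>> # calls predict() on every fitted model and stacks the per-fold outputs
>>> y_h = validator.call('predict', X_new)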
- clone() Validator[source]#
Obtain a clone of this validator.
- Returns:
- validator : mvpy.crossvalidation.Validator
The cloned object.
- collect(attr: str, from_step: int | None = None) List | ndarray | Tensor[source]#
Collect attr from all fitted estimators, pipelines or specific estimators within a pipeline.
- Parameters:
- attr : str
The attribute to collect.
- from_step : Optional[int], default=None
If not None and model is a pipeline, which estimator within that pipeline should the attribute be collected from?
- Returns:
- out : np.ndarray | torch.Tensor
Stacked attributes of shape (cv_n_[, ...]).
- decision_function(X: ndarray | Tensor) ndarray | Tensor[source]#
Call decision_function in all models.
- Parameters:
- X : np.ndarray | torch.Tensor
The input data of arbitrary shape.
- Returns:
- df : np.ndarray | torch.Tensor
The decision values of shape (cv_n_[, ...]).
See also
mvpy.crossvalidation.Validator.call : The underlying call function.
- fit(X: ndarray | Tensor, y: ndarray | Tensor | None = None) Validator[source]#
Fit and score the validator.
- Parameters:
- X : np.ndarray | torch.Tensor
Input data of arbitrary shape.
- y : Optional[np.ndarray | torch.Tensor], default=None
Output data of arbitrary shape.
- Returns:
- validator : mvpy.crossvalidation.Validator
The fitted validator.
- predict(X: ndarray | Tensor) ndarray | Tensor[source]#
Call predict in all models.
- Parameters:
- X : np.ndarray | torch.Tensor
The input data of arbitrary shape.
- Returns:
- y_h : np.ndarray | torch.Tensor
The predicted output data of shape (cv_n_[, ...]).
See also
mvpy.crossvalidation.Validator.call : The underlying call function.
- predict_proba(X: ndarray | Tensor) ndarray | Tensor[source]#
Call predict_proba in all models.
- Parameters:
- X : np.ndarray | torch.Tensor
The input data of arbitrary shape.
- Returns:
- p : np.ndarray | torch.Tensor
The predicted probabilities of shape (cv_n_[, ...]).
See also
mvpy.crossvalidation.Validator.call : The underlying call function.
- score(X: ndarray | Tensor, y: ndarray | Tensor) ndarray | Tensor | Dict[str, ndarray] | Dict[str, Tensor][source]#
Score new data in all models according to metric.
- Parameters:
- X : np.ndarray | torch.Tensor
The input data of arbitrary shape.
- y : np.ndarray | torch.Tensor
The output data of arbitrary shape.
- Returns:
- score : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]
The scores of all models on the new data, where individual entries are of shape (cv_n_[, ...]).
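A minimal sketch of the dictionary output when several metrics are supplied, reusing the classifier validator from the last example above; X_new and y_new stand in for a hypothetical held-out split of the same shape as X and y:
>>> # with metric = (metrics.accuracy, metrics.roc_auc), scores come back keyed by Metric.name
>>> scores = validator.score(X_new, y_new)
>>> 'roc_auc' in scores
True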
- transform(X: ndarray | Tensor) ndarray | Tensor[source]#
Call transform in all models.
- Parameters:
- X : np.ndarray | torch.Tensor
The input data of arbitrary shape.
- Returns:
- Z : np.ndarray | torch.Tensor
The transformed data of shape (cv_n_[, ...]).
See also
mvpy.crossvalidation.Validator.call : The underlying call function.