Validator#

class mvpy.crossvalidation.Validator(model: BaseEstimator | Pipeline, cv: int | Any = 5, metric: Metric | Tuple[Metric] | None = None, n_jobs: int | None = None, verbose: int | bool = False)[source]#

Implements automated cross-validation and scoring over estimators or pipelines.

This allows for easy cross-validated evaluation of models or pipelines, without having to write the cross-validation code by hand (a common source of mistakes), while still retaining access to the underlying fitted pipelines. This enables not only model evaluation but also, for example, scoring on new unseen data or inspection of fitted model parameters.

Parameters:
model : sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator

The model to fit and score. Can be either a pipeline or estimator object.

cv : int | Any, default=5

The cross-validation procedure to follow. Either an object exposing a split() method, such as KFold, or an integer specifying the number of folds to use in KFold.

metric : Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None

The metric to use for scoring. If None, this will default to the score() method exposed by model.

n_jobs : Optional[int], default=None

The number of jobs used to parallelise the cross-validation procedure.

verbose : int | bool, default=False

Whether to report progress verbosely.

Attributes:
model : sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator

The model to fit and score. Can be either a pipeline or estimator object.

cv : int | Any, default=5

The cross-validation procedure to follow. Either an object exposing a split() method, such as KFold, or an integer specifying the number of folds to use in KFold.

metric : Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None

The metric to use for scoring. If None, this will default to the score() method exposed by model.

n_jobs : Optional[int], default=None

The number of jobs used to parallelise the cross-validation procedure.

verbose : int | bool, default=False

Whether to report progress verbosely.

cv_ : Any

The instantiated cross-validation object exposing split().

cv_n_ : int

The number of cross-validation steps used by cv_.

model_ : List[sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator]

The models fit during cross-validation.

score_ : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

The scores of all models on test data of an arbitrary output shape.

test_ : List[np.ndarray | torch.Tensor]

A list of test indices used for scoring.

See also

mvpy.crossvalidation.cross_val_score

A shorthand for fitting a Validator.

Notes

When trying to access individual functions or attributes within estimators of a pipeline, make sure to indicate the pipeline step in from_step when calling either call() or collect().

Warning

If multiple values are supplied for metric, the validator will output a dictionary of {Metric.name: score, ...} rather than a stacked array. This provides consistency across cases where metrics may differ in their output shapes.
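A minimal sketch of why such scores cannot be stacked, using hypothetical per-fold score arrays (the metric names and shapes here are illustrative, loosely matching the examples below, not mvpy's internals):

```python
import numpy as np

# Hypothetical per-fold scores from two metrics whose outputs differ
# in shape: a scalar per fold versus a per-timepoint score per fold.
scores = {
    'accuracy': np.zeros((5,)),        # shape (cv_n_,)
    'roc_auc': np.zeros((5, 1, 400)),  # shape (cv_n_, 1, 400)
}

# These cannot be stacked into a single array, so a dictionary keyed
# by Metric.name preserves each metric's native output shape.
for name, s in scores.items():
    print(name, s.shape)
```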

Warning

When specifying n_jobs here, be careful not to also specify a number of jobs in the model. Otherwise, individual jobs will each try to initialise further low-level jobs, severely hurting performance.
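For reference, an integer cv follows the usual KFold convention of splitting the data into that many folds. A minimal sketch of this convention using sklearn's KFold (resolve_cv is an illustrative name, not mvpy's internal implementation):

```python
import numpy as np
from sklearn.model_selection import KFold

def resolve_cv(cv):
    """Resolve cv into an object exposing split() (illustrative only)."""
    if isinstance(cv, int):
        return KFold(n_splits=cv)
    return cv  # assumed to already expose split()

X = np.zeros((10, 3))
splitter = resolve_cv(5)
folds = list(splitter.split(X))
print(len(folds))  # 5: one (train, test) index pair per fold
```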

Examples

In the simplest case where we have one estimator, we can do:

>>> import torch
>>> from mvpy.estimators import ReceptiveField
>>> from mvpy.crossvalidation import Validator
>>> ß = torch.tensor([1., 2., 3., 2., 1.])
>>> X = torch.normal(0, 1, (100, 1, 50))
>>> y = torch.nn.functional.conv1d(X, ß[None,None,:], padding = 'same')
>>> y = y + torch.normal(0, 1, y.shape)
>>> trf = ReceptiveField(-2, 2, 1, alpha = 1e-5)
>>> validator = Validator(model = trf).fit(X, y)
>>> validator.score_.shape
torch.Size([5, 1, 50])
>>> validator.collect('coef_').shape
torch.Size([5, 1, 1, 5])
>>> X_new = torch.normal(0, 1, (100, 1, 50))
>>> y_new = torch.nn.functional.conv1d(X_new, ß[None,None,:], padding = 'same')
>>> y_new = y_new + torch.normal(0, 1, y_new.shape)
>>> validator.score(X_new, y_new).shape
torch.Size([5, 1, 50])

If we have a pipeline, we may want to do:

>>> import torch
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import ReceptiveField
>>> from mvpy.crossvalidation import Validator
>>> from sklearn.pipeline import make_pipeline
>>> ß = torch.tensor([1., 2., 3., 2., 1.])
>>> X = torch.normal(0, 1, (100, 1, 50))
>>> y = torch.nn.functional.conv1d(X, ß[None,None,:], padding = 'same')
>>> y = y + torch.normal(0, 1, y.shape)
>>> trf = make_pipeline(
...     Scaler().to_torch(),
...     ReceptiveField(-2, 2, 1, alpha = 1e-5)
... )
>>> validator = Validator(model = trf).fit(X, y)
>>> validator.score_.shape
torch.Size([5, 1, 50])
>>> validator.collect('coef_', from_step = -1).shape
torch.Size([5, 1, 1, 5])
>>> X_new = torch.normal(0, 1, (100, 1, 50))
>>> y_new = torch.nn.functional.conv1d(X_new, ß[None,None,:], padding = 'same')
>>> y_new = y_new + torch.normal(0, 1, y_new.shape)
>>> validator.score(X_new, y_new).shape
torch.Size([5, 1, 50])

Whereas, in the more complicated case of also using Sliding, for example, we can do:

>>> import torch
>>> from mvpy.dataset import make_meeg_categorical
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import Sliding, RidgeClassifier
>>> from mvpy.crossvalidation import Validator
>>> from mvpy import metrics
>>> from sklearn.pipeline import make_pipeline
>>> y, X = make_meeg_categorical()
>>> clf = make_pipeline(
...     Scaler().to_torch(),
...     Sliding(
...         RidgeClassifier(
...             torch.logspace(-5, 5, 10)
...         ),
...         dims = (-1,),
...         verbose = True,
...         n_jobs = None
...     )
... )
>>> validator = Validator(
...     model = clf,
...     metric = (metrics.accuracy, metrics.roc_auc),
...     verbose = True
... ).fit(X, y)
>>> validator.score_['roc_auc'].shape
torch.Size([5, 1, 400])
>>> validator.call('collect', 'pattern_', from_step = -1).shape
torch.Size([5, 400, 64, 2])
call(method: str, *args: Any, from_step: int | None = None, **kwargs: Any) ndarray | Tensor[source]#

Call method on all fitted estimators or pipelines, or on a specific estimator within each pipeline.

Parameters:
method : str

The method to call.

*args : Any

Additional arguments to pass to the method.

from_step : Optional[int], default=None

If not None and model is a pipeline, the index of the estimator within that pipeline from which to call the method.

**kwargs : Any

Additional keyword arguments to pass to the method.

Returns:
out : np.ndarray | torch.Tensor

Stacked outputs from method call of shape (cv_n_[, ...]).
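Conceptually, this amounts to invoking the method on each per-fold model and stacking the results along a leading fold axis. A plain-Python sketch of that behaviour (call_all is an illustrative name, not mvpy's implementation, and sklearn estimators stand in for the fitted models):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def call_all(models, method, *args, **kwargs):
    """Call method on each fitted model; stack outputs along a fold axis."""
    outs = [getattr(m, method)(*args, **kwargs) for m in models]
    return np.stack(outs, axis=0)  # shape (cv_n_, ...)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(20, 3)), rng.normal(size=20)

# Stand-ins for the per-fold fitted models held by the validator.
models = [LinearRegression().fit(X[i:], y[i:]) for i in range(5)]

y_hat = call_all(models, 'predict', X)
print(y_hat.shape)  # (5, 20): one prediction set per fold model
```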

clone() Validator[source]#

Obtain a clone of this validator.

Returns:
validator : mvpy.crossvalidation.Validator

The cloned object.

collect(attr: str, from_step: int | None = None) List | ndarray | Tensor[source]#

Collect attr from all fitted estimators or pipelines, or from a specific estimator within each pipeline.

Parameters:
attr : str

The attribute to collect.

from_step : Optional[int], default=None

If not None and model is a pipeline, the index of the estimator within that pipeline from which to collect the attribute.

Returns:
out : np.ndarray | torch.Tensor

Stacked attributes of shape (cv_n_[, ...]).
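Analogously to call(), this gathers an attribute from each per-fold model and stacks the results along a leading fold axis. A plain-Python sketch (collect_all is an illustrative name, not mvpy's implementation, with sklearn estimators as stand-ins):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def collect_all(models, attr):
    """Gather attr from each fitted model; stack along a fold axis."""
    return np.stack([getattr(m, attr) for m in models], axis=0)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(20, 3)), rng.normal(size=20)

# Stand-ins for the per-fold fitted models held by the validator.
models = [LinearRegression().fit(X, y) for _ in range(5)]

coefs = collect_all(models, 'coef_')
print(coefs.shape)  # (5, 3): one coefficient vector per fold model
```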

decision_function(X: ndarray | Tensor) ndarray | Tensor[source]#

Call decision_function in all models.

Parameters:
X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

Returns:
df : np.ndarray | torch.Tensor

The decision values of shape (cv_n_[, ...]).

See also

mvpy.crossvalidation.Validator.call

The underlying call function.

fit(X: ndarray | Tensor, y: ndarray | Tensor | None = None) Validator[source]#

Fit and score the validator.

Parameters:
X : np.ndarray | torch.Tensor

Input data of arbitrary shape.

y : Optional[np.ndarray | torch.Tensor], default=None

Output data of arbitrary shape.

Returns:
validator : mvpy.crossvalidation.Validator

The fitted validator.

predict(X: ndarray | Tensor) ndarray | Tensor[source]#

Call predict in all models.

Parameters:
X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

Returns:
y_h : np.ndarray | torch.Tensor

The predicted output data of shape (cv_n_[, ...]).

See also

mvpy.crossvalidation.Validator.call

The underlying call function.

predict_proba(X: ndarray | Tensor) ndarray | Tensor[source]#

Call predict_proba in all models.

Parameters:
X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

Returns:
p : np.ndarray | torch.Tensor

The predicted probabilities of shape (cv_n_[, ...]).

See also

mvpy.crossvalidation.Validator.call

The underlying call function.

score(X: ndarray | Tensor, y: ndarray | Tensor) ndarray | Tensor | Dict[str, ndarray] | Dict[str, Tensor][source]#

Score new data in all models according to metric.

Parameters:
X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

y : np.ndarray | torch.Tensor

The output data of arbitrary shape.

Returns:
score : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

The scores of all models in new data where individual entries are now of shape (cv_n_[, ...]).

transform(X: ndarray | Tensor) ndarray | Tensor[source]#

Call transform in all models.

Parameters:
X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

Returns:
Z : np.ndarray | torch.Tensor

The transformed data of shape (cv_n_[, ...]).

See also

mvpy.crossvalidation.Validator.call

The underlying call function.