mvpy.crossvalidation package#

Submodules#

mvpy.crossvalidation.cross_val_score module#

A collection of classes and functions to automatically perform cross-validation.

mvpy.crossvalidation.cross_val_score.cross_val_score(model: Pipeline | BaseEstimator, X: ndarray | Tensor, y: ndarray | Tensor | None = None, cv: int | Any | None = 5, metric: Metric | Tuple[Metric] | None = None, return_validator: bool = True, n_jobs: int | None = None, verbose: int | bool = False) → ndarray | Tensor | Dict | Tuple[Validator, ndarray | Tensor | Dict][source]#

Implements a shorthand for automated cross-validation scoring over estimators or pipelines.

This function is a shorthand for Validator: it automatically creates and fits the validator, returning either the output scores alone or, if return_validator is True, a tuple of the fitted validator object and the scores.

For more information, please see Validator.

Parameters:
model : sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator

The model to fit and score. Can be either a pipeline or estimator object.

X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

y : Optional[np.ndarray | torch.Tensor], default=None

The outcome data of arbitrary shape.

cv : Optional[int | Any], default=5

The cross-validation procedure to follow. Either an object exposing a split() method, such as KFold, or an integer specifying the number of folds to use in KFold.

metric : Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None

The metric to use for scoring. If None, this will default to the score() method exposed by model.

return_validator : bool, default=True

Should the underlying validator object be returned?

n_jobs : Optional[int], default=None

How many jobs should be used to parallelise the cross-validation procedure?

verbose : int | bool, default=False

Should progress be reported verbosely?

Returns:
validator : Optional[mvpy.crossvalidation.Validator]

If return_validator is True, the underlying validator object.

score : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

The scores from cross-validation of arbitrary output shape.

Warning

If multiple values are supplied for metric, this function will output a dictionary of {Metric.name: score, ...} rather than a stacked array. This is to provide consistency across cases where metrics may or may not differ in their output shapes.
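
For example, a minimal sketch of the multi-metric case, building on the regression example below. Note that metrics.pearsonr and metrics.r2 are assumed names here; substitute whichever Metric objects your installation of mvpy.metrics actually exposes:

>>> from mvpy import metrics
>>> # hypothetical metric pair; any two Metric objects behave the same way
>>> validator, scores = cross_val_score(trf, X, y, metric = (metrics.pearsonr, metrics.r2))
>>> isinstance(scores, dict)  # keyed by Metric.name
True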

Warning

When specifying n_jobs here, be careful not to specify any number of jobs in the model. Otherwise, this will lead to a situation where individual jobs each try to initialise more low-level jobs, severely hurting performance.
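
A safe pattern, sketched below for the regression example from this page, is to parallelise at the cross-validation level only and leave any model-level n_jobs unset:

>>> # parallelise across folds; trf itself carries no n_jobs setting
>>> validator, scores = cross_val_score(trf, X, y, n_jobs = 5)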

See also

mvpy.crossvalidation.Validator

The underlying Validator class.

Examples

>>> import torch
>>> from mvpy.estimators import ReceptiveField
>>> from mvpy.crossvalidation import cross_val_score
>>> ß = torch.tensor([1., 2., 3., 2., 1.])
>>> X = torch.normal(0, 1, (100, 1, 50))
>>> y = torch.nn.functional.conv1d(X, ß[None,None,:], padding = 'same')
>>> y = y + torch.normal(0, 1, y.shape)
>>> trf = ReceptiveField(-2, 2, 1, alpha = 1e-5)
>>> validator, scores = cross_val_score(trf, X, y)
>>> scores.mean()
tensor(0.9432)


mvpy.crossvalidation.kfold module#

A collection of classes for k-fold cross-validation.

class mvpy.crossvalidation.kfold.KFold(n_splits: int = 5, shuffle: bool = False, random_state: int | np.random.Generator | torch.Generator | None = None)[source]#

Bases: object

Implements a k-folds cross-validator.

In principle, this class is redundant with sklearn.model_selection.KFold. However, for the torch backend, this class is useful because it automatically creates indices on the desired device.

Parameters:
n_splits : int, default=5

Number of splits to use.

shuffle : bool, default=False

Should we shuffle indices before splitting?

random_state : Optional[Union[int, np.random.Generator, torch.Generator]], default=None

Random state to use for shuffling (either integer seed or numpy/torch generator), if any.

Attributes:
n_splits : int, default=5

Number of splits to use.

shuffle : bool, default=False

Should we shuffle indices before splitting?

random_state : Optional[Union[int, np.random.Generator, torch.Generator]], default=None

Random state to use for shuffling (either integer seed or numpy/torch generator), if any.

rng_ : Union[np.random.Generator, torch.Generator]

Random generator derived from random_state.

Notes

For reproducibility when using shuffling, you can set random_state to an integer.

Note also that, when using shuffling, please make sure to instantiate the class and convert it to your desired backend immediately. Otherwise, each call to split will instantiate a new backend object with the same random seed. See the examples for a demonstration.

Examples

If we are not using shuffling, we can simply do:

>>> import torch
>>> from mvpy.crossvalidation import KFold
>>> X = torch.arange(10)
>>> kf = KFold()
>>> for f_i, (train, test) in enumerate(kf.split(X)):
>>>     print(f'Fold{f_i}: train={train}    test={test}')
Fold0: train=tensor([2, 3, 4, 5, 6, 7, 8, 9])       test=tensor([0, 1])
Fold1: train=tensor([0, 1, 4, 5, 6, 7, 8, 9])       test=tensor([2, 3])
Fold2: train=tensor([0, 1, 2, 3, 6, 7, 8, 9])       test=tensor([4, 5])
Fold3: train=tensor([0, 1, 2, 3, 4, 5, 8, 9])       test=tensor([6, 7])
Fold4: train=tensor([0, 1, 2, 3, 4, 5, 6, 7])       test=tensor([8, 9])

However, let’s assume we want to use shuffling. We might be inclined to do:

>>> import torch
>>> from mvpy.crossvalidation import KFold
>>> X = torch.arange(6)
>>> kf = KFold(n_splits = 2, shuffle = True, random_state = 42)
>>> print(f'Run 1:')
>>> for f_i, (train, test) in enumerate(kf.split(X)):
>>>     print(f'Fold{f_i}: train={train}    test={test}')
>>> print(f'Run 2:')
>>> for f_i, (train, test) in enumerate(kf.split(X)):
>>>     print(f'Fold{f_i}: train={train}    test={test}')
Run 1:
Fold0: train=tensor([4, 1, 5])      test=tensor([0, 3, 2])
Fold1: train=tensor([0, 3, 2])      test=tensor([4, 1, 5])
Run 2:
Fold0: train=tensor([4, 1, 5])      test=tensor([0, 3, 2])
Fold1: train=tensor([0, 3, 2])      test=tensor([4, 1, 5])

Note that we pass random_state here to make this reproducible on your end. As you can see, the randomisation is identical across runs. This occurs because, up until the call to split, MVPy cannot consistently infer the desired data type. The backend class is therefore instantiated only upon calling split, where types become explicit; this means that each call to split re-instantiates the class with the same seed. We can easily work around this in two ways:

>>> import torch
>>> from mvpy.crossvalidation import KFold
>>> X = torch.arange(6)
>>> kf = KFold(n_splits = 2, shuffle = True, random_state = 42).to_torch()
>>> print(f'Run 1:')
>>> for f_i, (train, test) in enumerate(kf.split(X)):
>>>     print(f'Fold{f_i}: train={train}    test={test}')
>>> print(f'Run 2:')
>>> for f_i, (train, test) in enumerate(kf.split(X)):
>>>     print(f'Fold{f_i}: train={train}    test={test}')
Run 1:
Fold0: train=tensor([4, 1, 5])      test=tensor([0, 3, 2])
Fold1: train=tensor([0, 3, 2])      test=tensor([4, 1, 5])
Run 2:
Fold0: train=tensor([4, 0, 3])      test=tensor([5, 1, 2])
Fold1: train=tensor([5, 1, 2])      test=tensor([4, 0, 3])

Here, we explicitly instantiate a torch operator that is not reinstantiated across runs, which works perfectly. We could, however, also use an external generator to achieve the same result:

>>> import torch
>>> from mvpy.crossvalidation import KFold
>>> X = torch.arange(6)
>>> rng = torch.Generator()
>>> rng.manual_seed(42)
>>> kf = KFold(n_splits = 2, shuffle = True, random_state = rng)
>>> print('Run 1:')
>>> for f_i, (train, test) in enumerate(kf.split(X)):
>>>     print(f'Fold{f_i}: train={train}    test={test}')
>>> print('Run 2:')
>>> for f_i, (train, test) in enumerate(kf.split(X)):
>>>     print(f'Fold{f_i}: train={train}    test={test}')
Run 1:
Fold0: train=tensor([4, 1, 5])      test=tensor([0, 3, 2])
Fold1: train=tensor([0, 3, 2])      test=tensor([4, 1, 5])
Run 2:
Fold0: train=tensor([4, 0, 3])      test=tensor([5, 1, 2])
Fold1: train=tensor([5, 1, 2])      test=tensor([4, 0, 3])
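
Finally, because indices are created on the backend of X, splitting a tensor that lives on the GPU should also yield train/test indices on that device. A minimal sketch, assuming a CUDA device is available:

>>> import torch
>>> from mvpy.crossvalidation import KFold
>>> X = torch.arange(10, device = 'cuda')
>>> kf = KFold().to_torch()
>>> train, test = next(kf.split(X))
>>> train.device.type
'cuda'
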
split(X: ndarray | Tensor, y: ndarray | Tensor | None = None) → Generator[tuple[ndarray, ndarray], None, None] | Generator[tuple[Tensor, Tensor], None, None][source]#

Split the dataset into iterable (train, test).

Parameters:
X : Union[np.ndarray, torch.Tensor]

Input data of shape (n_samples, …).

y : Optional[Union[np.ndarray, torch.Tensor]], default=None

Target data of shape (n_samples, …). Unused; accepted for API consistency.

Returns:
kf : Union[collections.abc.Generator[tuple[np.ndarray, np.ndarray], None, None], collections.abc.Generator[tuple[torch.Tensor, torch.Tensor], None, None]]

Iterable generator of (train, test) pairs.

to_numpy() → _KFold_numpy[source]#

Convert class to numpy backend.

Returns:
kf : _KFold_numpy

The k-fold cross-validator in numpy.

to_torch() → _KFold_torch[source]#

Convert class to torch backend.

Returns:
kf : _KFold_torch

The k-fold cross-validator in torch.


mvpy.crossvalidation.repeatedkfold module#

A collection of classes for repeated k-fold cross-validation.

class mvpy.crossvalidation.repeatedkfold.RepeatedKFold(n_splits: int = 5, n_repeats: int = 1, random_state: int | np.random.Generator | torch.Generator | None = None)[source]#

Bases: object

Implements a repeated k-folds cross-validator.

In principle, this class is redundant with sklearn.model_selection.RepeatedKFold. However, for the torch backend, this class is useful because it automatically creates indices on the desired device.

Parameters:
n_splits : int, default=5

Number of splits to use.

n_repeats : int, default=1

Number of repeats to use.

random_state : Optional[Union[int, np.random.Generator, torch.Generator]], default=None

Random state to use for shuffling (either integer seed or numpy/torch generator), if any.

Attributes:
n_splits : int, default=5

Number of splits to use.

n_repeats : int, default=1

Number of repeats to use.

random_state : Optional[Union[int, np.random.Generator, torch.Generator]], default=None

Random state to use for shuffling (either integer seed or numpy/torch generator), if any.

Notes

For reproducibility when using shuffling, you can set random_state to an integer.

Note also that, when using shuffling, please make sure to instantiate the class and convert it to your desired backend immediately. Otherwise, each call to split will instantiate a new backend object with the same random seed.

Examples

>>> import torch
>>> from mvpy.crossvalidation import RepeatedKFold
>>> X = torch.arange(6)
>>> kf = RepeatedKFold(n_splits = 2, n_repeats = 2).to_torch()
>>> for f_i, (train, test) in enumerate(kf.split(X)):
>>>     print(f'Fold{f_i}: train={train}    test={test}')
Fold0: train=tensor([4, 1, 2])      test=tensor([0, 5, 3])
Fold1: train=tensor([0, 5, 3])      test=tensor([4, 1, 2])
Fold2: train=tensor([2, 1, 3])      test=tensor([5, 0, 4])
Fold3: train=tensor([5, 0, 4])      test=tensor([2, 1, 3])
split(X: ndarray | Tensor, y: ndarray | Tensor | None = None) → Generator[tuple[ndarray, ndarray], None, None] | Generator[tuple[Tensor, Tensor], None, None][source]#

Repeatedly split the dataset into iterable (train, test).

Parameters:
X : Union[np.ndarray, torch.Tensor]

Input data of shape (n_samples, …).

y : Optional[Union[np.ndarray, torch.Tensor]], default=None

Target data of shape (n_samples, …). Unused; accepted for API consistency.

Returns:
kf : Union[collections.abc.Generator[tuple[np.ndarray, np.ndarray], None, None], collections.abc.Generator[tuple[torch.Tensor, torch.Tensor], None, None]]

Iterable generator of (train, test) pairs.

to_numpy() → _RepeatedKFold_numpy[source]#

Convert class to numpy backend.

Returns:
kf : _RepeatedKFold_numpy

The repeated k-fold cross-validator in numpy.

to_torch() → _RepeatedKFold_torch[source]#

Convert class to torch backend.

Returns:
kf : _RepeatedKFold_torch

The repeated k-fold cross-validator in torch.


mvpy.crossvalidation.repeatedstratifiedkfold module#

A collection of classes for repeated stratified k-fold cross-validation.

class mvpy.crossvalidation.repeatedstratifiedkfold.RepeatedStratifiedKFold(n_splits: int = 5, n_repeats: int = 1, random_state: int | np.random.Generator | torch.Generator | None = None)[source]#

Bases: object

Implements a repeated stratified k-folds cross-validator.

Parameters:
n_splits : int, default=5

Number of splits to use.

n_repeats : int, default=1

Number of repeats to use.

random_state : Optional[Union[int, np.random.Generator, torch.Generator]], default=None

Random state to use for shuffling (either integer seed or numpy/torch generator), if any.

Attributes:
n_splits : int, default=5

Number of splits to use.

n_repeats : int, default=1

Number of repeats to use.

random_state : Optional[Union[int, np.random.Generator, torch.Generator]], default=None

Random state to use for shuffling (either integer seed or numpy/torch generator), if any.

Notes

For reproducibility when using shuffling, you can set random_state to an integer.

Note also that, when using shuffling, please make sure to instantiate the class and convert it to your desired backend immediately. Otherwise, each call to split will instantiate a new backend object with the same random seed.

Examples

>>> import torch
>>> from mvpy.crossvalidation import RepeatedStratifiedKFold
>>> X = torch.randn(75, 5)
>>> y = torch.tensor([0] * 40 + [1] * 25 + [2] * 10)
>>> kf = RepeatedStratifiedKFold(n_splits = 2, n_repeats = 2).to_torch()
>>> for f_i, (train, test) in enumerate(kf.split(X, y)):
>>>     train_idx, train_cnt = torch.unique(y[train], return_counts = True)
>>>     _, test_cnt = torch.unique(y[test], return_counts = True)
>>>     print(f'Fold {f_i}: classes={train_idx}     N(train)={train_cnt}    N(test)={test_cnt}')
Fold 0: classes=tensor([0, 1, 2])   N(train)=tensor([20, 12,  5])   N(test)=tensor([20, 13,  5])
Fold 1: classes=tensor([0, 1, 2])   N(train)=tensor([20, 13,  5])   N(test)=tensor([20, 12,  5])
Fold 2: classes=tensor([0, 1, 2])   N(train)=tensor([20, 12,  5])   N(test)=tensor([20, 13,  5])
Fold 3: classes=tensor([0, 1, 2])   N(train)=tensor([20, 13,  5])   N(test)=tensor([20, 12,  5])
split(X: ndarray | Tensor, y: ndarray | Tensor | None = None) → Generator[tuple[ndarray, ndarray], None, None] | Generator[tuple[Tensor, Tensor], None, None][source]#

Repeatedly split the dataset into stratified iterable (train, test).

Parameters:
X : Union[np.ndarray, torch.Tensor]

Input data of shape (n_samples, …).

y : Optional[Union[np.ndarray, torch.Tensor]], default=None

Target data of shape (n_samples, …) used to stratify the splits.

Returns:
kf : Union[collections.abc.Generator[tuple[np.ndarray, np.ndarray], None, None], collections.abc.Generator[tuple[torch.Tensor, torch.Tensor], None, None]]

Iterable generator of (train, test) pairs.

to_numpy() → _RepeatedStratifiedKFold_numpy[source]#

Convert class to numpy backend.

Returns:
kf : _RepeatedStratifiedKFold_numpy

The repeated stratified k-fold cross-validator in numpy.

to_torch() → _RepeatedStratifiedKFold_torch[source]#

Convert class to torch backend.

Returns:
kf : _RepeatedStratifiedKFold_torch

The repeated stratified k-fold cross-validator in torch.


mvpy.crossvalidation.stratifiedkfold module#

A collection of classes for stratified k-fold cross-validation.

class mvpy.crossvalidation.stratifiedkfold.StratifiedKFold(n_splits: int = 5, shuffle: bool = False, random_state: int | np.random.Generator | torch.Generator | None = None)[source]#

Bases: object

Implements a stratified k-folds cross-validator.

Unlike sklearn, this class will also stratify across multiple features, i.e. for targets of shape (n_samples[, …], n_features[, n_timepoints]).

Parameters:
n_splits : int, default=5

Number of splits to use.

shuffle : bool, default=False

Should we shuffle indices before splitting?

random_state : Optional[Union[int, np.random.Generator, torch.Generator]], default=None

Random state to use for shuffling (either integer seed or numpy/torch generator), if any.

Attributes:
n_splits : int, default=5

Number of splits to use.

shuffle : bool, default=False

Should we shuffle indices before splitting?

random_state : Optional[Union[int, np.random.Generator, torch.Generator]], default=None

Random state to use for shuffling (either integer seed or numpy/torch generator), if any.

rng_ : Union[np.random.Generator, torch.Generator]

Random generator derived from random_state.

Notes

For reproducibility when using shuffling, you can set random_state to an integer.

Note also that, when using shuffling, please make sure to instantiate the class and convert it to your desired backend immediately. Otherwise, each call to split will instantiate a new backend object with the same random seed. See the examples for a demonstration.

Examples

First, let’s assume we have just one feature:

>>> import torch
>>> from mvpy.crossvalidation import StratifiedKFold
>>> X = torch.randn(75, 5)
>>> y = torch.tensor([0] * 40 + [1] * 25 + [2] * 10)
>>> kf = StratifiedKFold()
>>> for f_i, (train, test) in enumerate(kf.split(X, y)):
>>>     train_idx, train_cnt = torch.unique(y[train], return_counts = True)
>>>     _, test_cnt = torch.unique(y[test], return_counts = True)
>>>     print(f'Fold {f_i}: classes={train_idx}     N(train)={train_cnt}    N(test)={test_cnt}')
Fold 0: classes=tensor([0, 1, 2])   N(train)=tensor([32, 20,  8])   N(test)=tensor([8, 5, 2])
Fold 1: classes=tensor([0, 1, 2])   N(train)=tensor([32, 20,  8])   N(test)=tensor([8, 5, 2])
Fold 2: classes=tensor([0, 1, 2])   N(train)=tensor([32, 20,  8])   N(test)=tensor([8, 5, 2])
Fold 3: classes=tensor([0, 1, 2])   N(train)=tensor([32, 20,  8])   N(test)=tensor([8, 5, 2])
Fold 4: classes=tensor([0, 1, 2])   N(train)=tensor([32, 20,  8])   N(test)=tensor([8, 5, 2])

Second, let’s assume we have multiple features and we want to shuffle indices. Note that this will also work if features have overlapping class names, but for clarity here we use different offsets:

>>> import torch
>>> from mvpy.crossvalidation import StratifiedKFold
>>> X = torch.randn(75, 5)
>>> y0 = torch.tensor([0] * 40 + [1] * 25 + [2] * 10)[:,None]
>>> y1 = torch.tensor([3] * 15 + [4] * 45 + [5] * 15)[:,None]
>>> y = torch.stack((y0, y1), dim = 1)
>>> kf = StratifiedKFold(shuffle = True).to_torch()
>>> for f_i, (train, test) in enumerate(kf.split(X, y)):
>>>     train_idx, train_cnt = torch.unique(y[train], return_counts = True)
>>>     _, test_cnt = torch.unique(y[test], return_counts = True)
>>>     print(f'Fold {f_i}: classes={train_idx}     N(train)={train_cnt}    N(test)={test_cnt}')
Fold 0: classes=tensor([0, 1, 2, 3, 4, 5])  N(train)=tensor([32, 20,  8, 12, 36, 12])       N(test)=tensor([8, 5, 2, 3, 9, 3])
Fold 1: classes=tensor([0, 1, 2, 3, 4, 5])  N(train)=tensor([32, 20,  8, 12, 36, 12])       N(test)=tensor([8, 5, 2, 3, 9, 3])
Fold 2: classes=tensor([0, 1, 2, 3, 4, 5])  N(train)=tensor([32, 20,  8, 12, 36, 12])       N(test)=tensor([8, 5, 2, 3, 9, 3])
Fold 3: classes=tensor([0, 1, 2, 3, 4, 5])  N(train)=tensor([32, 20,  8, 12, 36, 12])       N(test)=tensor([8, 5, 2, 3, 9, 3])
Fold 4: classes=tensor([0, 1, 2, 3, 4, 5])  N(train)=tensor([32, 20,  8, 12, 36, 12])       N(test)=tensor([8, 5, 2, 3, 9, 3])
split(X: ndarray | Tensor, y: ndarray | Tensor | None = None) → Generator[tuple[ndarray, ndarray], None, None] | Generator[tuple[Tensor, Tensor], None, None][source]#

Split the dataset into stratified iterable (train, test).

Parameters:
X : Union[np.ndarray, torch.Tensor]

Input data of shape (n_samples, …).

y : Optional[Union[np.ndarray, torch.Tensor]], default=None

Target data of shape (n_samples, …) used to stratify the splits.

Returns:
kf : Union[collections.abc.Generator[tuple[np.ndarray, np.ndarray], None, None], collections.abc.Generator[tuple[torch.Tensor, torch.Tensor], None, None]]

Iterable generator of (train, test) pairs.

to_numpy() → _StratifiedKFold_numpy[source]#

Convert class to numpy backend.

Returns:
kf : _StratifiedKFold_numpy

The stratified k-fold cross-validator in numpy.

to_torch() → _StratifiedKFold_torch[source]#

Convert class to torch backend.

Returns:
kf : _StratifiedKFold_torch

The stratified k-fold cross-validator in torch.


mvpy.crossvalidation.validator module#

A collection of classes and functions to automatically perform cross-validation.

class mvpy.crossvalidation.validator.Validator(model: BaseEstimator | Pipeline, cv: int | Any = 5, metric: Metric | Tuple[Metric] | None = None, n_jobs: int | None = None, verbose: int | bool = False)[source]#

Bases: BaseEstimator

Implements automated cross-validation and scoring over estimators or pipelines.

This allows for easy cross-validated evaluation of models or pipelines, without having to write this code explicitly (a common source of mistakes), while still having access to the underlying fitted pipelines. This enables not only model evaluation, but also, for example, scoring on new unseen data or inspection of model parameters.

Parameters:
model : sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator

The model to fit and score. Can be either a pipeline or estimator object.

cv : int | Any, default=5

The cross-validation procedure to follow. Either an object exposing a split() method, such as KFold, or an integer specifying the number of folds to use in KFold.

metric : Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None

The metric to use for scoring. If None, this will default to the score() method exposed by model.

n_jobs : Optional[int], default=None

How many jobs should be used to parallelise the cross-validation procedure?

verbose : int | bool, default=False

Should progress be reported verbosely?

Attributes:
model : sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator

The model to fit and score. Can be either a pipeline or estimator object.

cv : int | Any, default=5

The cross-validation procedure to follow. Either an object exposing a split() method, such as KFold, or an integer specifying the number of folds to use in KFold.

metric : Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None

The metric to use for scoring. If None, this will default to the score() method exposed by model.

n_jobs : Optional[int], default=None

How many jobs should be used to parallelise the cross-validation procedure?

verbose : int | bool, default=False

Should progress be reported verbosely?

cv_ : Any

The instantiated cross-validation object exposing split().

cv_n_ : int

The number of cross-validation steps used by cv_.

model_ : List[sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator]

The models fit during cross-validation.

score_ : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

The scores of all models on test data of arbitrary output shape.

test_ : List[np.ndarray | torch.Tensor]

A list of test indices used for scoring.

See also

mvpy.crossvalidation.cross_val_score

A shorthand for fitting a Validator.

Notes

When trying to access individual functions or attributes within estimators of a pipeline, make sure to indicate the pipeline step in from_step when calling either call() or collect().
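
For example, a short sketch assuming a validator fitted over a two-step pipeline, as in the examples below:

>>> # collect the final estimator's coefficients from every fold
>>> validator.collect('coef_', from_step = -1)
>>> # or call a method on the final pipeline step in every fold
>>> validator.call('collect', 'pattern_', from_step = -1)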

Warning

If multiple values are supplied for metric, this function will output a dictionary of {Metric.name: score, ...} rather than a stacked array. This is to provide consistency across cases where metrics may or may not differ in their output shapes.

Warning

When specifying n_jobs here, be careful not to specify any number of jobs in the model. Otherwise, this will lead to a situation where individual jobs each try to initialise more low-level jobs, severely hurting performance.

Examples

In the simplest case where we have one estimator, we can do:

>>> import torch
>>> from mvpy.estimators import ReceptiveField
>>> from mvpy.crossvalidation import Validator
>>> ß = torch.tensor([1., 2., 3., 2., 1.])
>>> X = torch.normal(0, 1, (100, 1, 50))
>>> y = torch.nn.functional.conv1d(X, ß[None,None,:], padding = 'same')
>>> y = y + torch.normal(0, 1, y.shape)
>>> trf = ReceptiveField(-2, 2, 1, alpha = 1e-5)
>>> validator = Validator(model = trf).fit(X, y)
>>> validator.score_.shape
torch.Size([5, 1, 50])
>>> validator.collect('coef_').shape
torch.Size([5, 1, 1, 5])
>>> X_new = torch.normal(0, 1, (100, 1, 50))
>>> y_new = torch.nn.functional.conv1d(X_new, ß[None,None,:], padding = 'same')
>>> y_new = y_new + torch.normal(0, 1, y_new.shape)
>>> validator.score(X_new, y_new).shape
torch.Size([5, 1, 50])

If we have a pipeline, we may want to do:

>>> import torch
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import ReceptiveField
>>> from mvpy.crossvalidation import Validator
>>> from sklearn.pipeline import make_pipeline
>>> ß = torch.tensor([1., 2., 3., 2., 1.])
>>> X = torch.normal(0, 1, (100, 1, 50))
>>> y = torch.nn.functional.conv1d(X, ß[None,None,:], padding = 'same')
>>> y = y + torch.normal(0, 1, y.shape)
>>> trf = make_pipeline(
>>>     Scaler().to_torch(),
>>>     ReceptiveField(-2, 2, 1, alpha = 1e-5)
>>> )
>>> validator = Validator(model = trf).fit(X, y)
>>> validator.score_.shape
torch.Size([5, 1, 50])
>>> validator.collect('coef_', from_step = -1).shape
torch.Size([5, 1, 1, 5])
>>> X_new = torch.normal(0, 1, (100, 1, 50))
>>> y_new = torch.nn.functional.conv1d(X_new, ß[None,None,:], padding = 'same')
>>> y_new = y_new + torch.normal(0, 1, y_new.shape)
>>> validator.score(X_new, y_new).shape
torch.Size([5, 1, 50])

In the more complicated case of also using Sliding, for example, we can do:

>>> import torch
>>> from mvpy.dataset import make_meeg_categorical
>>> from mvpy.preprocessing import Scaler
>>> from mvpy.estimators import Sliding, RidgeClassifier
>>> from mvpy.crossvalidation import Validator
>>> from mvpy import metrics
>>> from sklearn.pipeline import make_pipeline
>>> y, X = make_meeg_categorical()
>>> clf = make_pipeline(
>>>     Scaler().to_torch(),
>>>     Sliding(
>>>         RidgeClassifier(
>>>             torch.logspace(-5, 5, 10)
>>>         ),
>>>         dims = (-1,),
>>>         verbose = True,
>>>         n_jobs = None
>>>     )
>>> )
>>> validator = Validator(
>>>     model = clf, 
>>>     metric = (metrics.accuracy, metrics.roc_auc),
>>>     verbose = True
>>> ).fit(X, y)
>>> validator.score_['roc_auc'].shape
torch.Size([5, 1, 400])
>>> validator.call('collect', 'pattern_', from_step = -1).shape
torch.Size([5, 400, 64, 2])
call(method: str, *args: Any, from_step: int | None = None, **kwargs: Any) → ndarray | Tensor[source]#

Call a method on all fitted estimators or pipelines, or on a specific estimator within each pipeline.

Parameters:
method : str

The method to call.

*args : Any

Additional arguments to pass to the method.

from_step : Optional[int], default=None

If not None and model is a pipeline, which estimator within that pipeline should the method be called from?

**kwargs : Any

Additional keyword arguments to pass to the method.

Returns:
out : np.ndarray | torch.Tensor

Stacked outputs from the method call of shape (cv_n_[, ...]).
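
For instance, calling predict on every fitted fold model stacks the per-fold predictions along a new leading dimension. A sketch, with X_new as in the class examples above:

>>> y_h = validator.call('predict', X_new)
>>> y_h.shape[0] == validator.cv_n_
True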

clone() → Validator[source]#

Obtain a clone of this validator.

Returns:
validator : mvpy.crossvalidation.Validator

The cloned object.

collect(attr: str, from_step: int | None = None) → List | ndarray | Tensor[source]#

Collect attr from all fitted estimators or pipelines, or from a specific estimator within each pipeline.

Parameters:
attr : str

The attribute to collect.

from_step : Optional[int], default=None

If not None and model is a pipeline, which estimator within that pipeline should the attribute be collected from?

Returns:
out : np.ndarray | torch.Tensor

Stacked attributes of shape (cv_n_[, ...]).

decision_function(X: ndarray | Tensor) → ndarray | Tensor[source]#

Call decision_function in all models.

Parameters:
X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

Returns:
df : np.ndarray | torch.Tensor

The decision values of shape (cv_n_[, ...]).

See also

mvpy.crossvalidation.Validator.call

The underlying call function.

fit(X: ndarray | Tensor, y: ndarray | Tensor | None = None) → Validator[source]#

Fit and score the validator.

Parameters:
X : np.ndarray | torch.Tensor

Input data of arbitrary shape.

y : Optional[np.ndarray | torch.Tensor], default=None

Output data of arbitrary shape.

Returns:
validator : mvpy.crossvalidation.Validator

The fitted validator.

predict(X: ndarray | Tensor) → ndarray | Tensor[source]#

Call predict in all models.

Parameters:
X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

Returns:
y_h : np.ndarray | torch.Tensor

The predicted output data of shape (cv_n_[, ...]).

See also

mvpy.crossvalidation.Validator.call

The underlying call function.

predict_proba(X: ndarray | Tensor) → ndarray | Tensor[source]#

Call predict_proba in all models.

Parameters:
X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

Returns:
p : np.ndarray | torch.Tensor

The predicted probabilities of shape (cv_n_[, ...]).

See also

mvpy.crossvalidation.Validator.call

The underlying call function.

score(X: ndarray | Tensor, y: ndarray | Tensor) → ndarray | Tensor | Dict[str, ndarray] | Dict[str, Tensor][source]#

Score new data in all models according to metric.

Parameters:
X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

y : np.ndarray | torch.Tensor

The output data of arbitrary shape.

Returns:
score : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

The scores of all models on the new data, where individual entries are of shape (cv_n_[, ...]).

transform(X: ndarray | Tensor) → ndarray | Tensor[source]#

Call transform in all models.

Parameters:
X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

Returns:
Z : np.ndarray | torch.Tensor

The transformed data of shape (cv_n_[, ...]).

See also

mvpy.crossvalidation.Validator.call

The underlying call function.

mvpy.crossvalidation.validator.fit_model_(model: BaseEstimator | Pipeline, train: ndarray | Tensor, test: ndarray | Tensor, X: ndarray | Tensor, y: ndarray | Tensor | None = None, metric: Metric | Tuple[Metric] | None = None) → Dict[source]#

Implements a single model fitting and scoring procedure.

Parameters:
model : sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator

The model to fit. Can be either a pipeline or estimator object.

train : np.ndarray | torch.Tensor

The training data indices.

test : np.ndarray | torch.Tensor

The testing data indices.

X : np.ndarray | torch.Tensor

The input data of arbitrary shape.

y : Optional[np.ndarray | torch.Tensor], default=None

The outcome data of arbitrary shape.

metric : Optional[mvpy.metrics.Metric | Tuple[mvpy.metrics.Metric]], default=None

The metric to use for scoring. If None, this will default to the score() method exposed by model.

Returns:
out : Dict

Output dictionary containing:

model : sklearn.pipeline.Pipeline | sklearn.base.BaseEstimator

The fitted model or pipeline.

score_ : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

The scores from cross-validation.

test : np.ndarray | torch.Tensor

The test indices.
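
This helper is the unit of work that Validator parallelises across folds. A rough sketch of a single invocation, assuming an sklearn-clonable estimator and the trf, X and y from the examples above:

>>> from sklearn.base import clone
>>> from mvpy.crossvalidation import KFold
>>> from mvpy.crossvalidation.validator import fit_model_
>>> train, test = next(KFold().to_torch().split(X))
>>> out = fit_model_(clone(trf), train, test, X, y)
>>> sorted(out.keys())
['model', 'score_', 'test']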


Module contents#

A collection of classes for k-fold cross-validation.

A collection of classes for repeated k-fold cross-validation.

A collection of classes for repeated stratified k-fold cross-validation.

A collection of classes for stratified k-fold cross-validation.

A collection of classes and functions to automatically perform cross-validation.