StratifiedKFold#

class mvpy.crossvalidation.StratifiedKFold(n_splits: int = 5, shuffle: bool = False, random_state: int | np.random.Generator | torch.Generator | None = None)[source]#

Implements a stratified k-fold cross-validator.

Unlike sklearn, this will also stratify across features of shape (n_samples[, …], n_features[, n_timepoints]).

Parameters:
n_splits : int, default=5

Number of splits to use.

shuffle : bool, default=False

Should we shuffle indices before splitting?

random_state : Optional[Union[int, np.random.Generator, torch.Generator]], default=None

Random state to use for shuffling (either integer seed or numpy/torch generator), if any.

Attributes:
n_splits : int, default=5

Number of splits to use.

shuffle : bool, default=False

Should we shuffle indices before splitting?

random_state : Optional[Union[int, np.random.Generator, torch.Generator]], default=None

Random state to use for shuffling (either integer seed or numpy/torch generator), if any.

rng_ : Union[np.random.Generator, torch.Generator]

Random generator derived from random_state.

Notes

For reproducibility when using shuffling, you can set random_state to an integer seed.

Note also that, when using shuffling, you should convert to the desired backend immediately after instantiation (via to_numpy() or to_torch()). Otherwise, each call to split will instantiate a new backend object with the same random seed, so repeated calls produce identical shuffles. See the examples for a demonstration.
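
For instance, a minimal sketch of the intended pattern (the torch backend, seed 42, tensor shapes, and variable names are arbitrary choices for illustration):

>>> import torch
>>> from mvpy.crossvalidation import StratifiedKFold
>>> X = torch.randn(30, 4)
>>> y = torch.repeat_interleave(torch.tensor([0, 1, 2]), 10)
>>> # convert to the desired backend once, immediately after instantiation
>>> kf = StratifiedKFold(shuffle = True, random_state = 42).to_torch()
>>> first = [test for _, test in kf.split(X, y)]
>>> second = [test for _, test in kf.split(X, y)]  # per the note above, this reshuffles rather than repeating `first`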

Examples

First, let’s assume we have just one feature:

>>> import torch
>>> from mvpy.crossvalidation import StratifiedKFold
>>> X = torch.randn(75, 5)
>>> y = torch.tensor([0] * 40 + [1] * 25 + [2] * 10)
>>> kf = StratifiedKFold()
>>> for f_i, (train, test) in enumerate(kf.split(X, y)):
...     train_idx, train_cnt = torch.unique(y[train], return_counts = True)
...     _, test_cnt = torch.unique(y[test], return_counts = True)
...     print(f'Fold {f_i}: classes={train_idx}     N(train)={train_cnt}    N(test)={test_cnt}')
Fold 0: classes=tensor([0, 1, 2])   N(train)=tensor([32, 20,  8])   N(test)=tensor([8, 5, 2])
Fold 1: classes=tensor([0, 1, 2])   N(train)=tensor([32, 20,  8])   N(test)=tensor([8, 5, 2])
Fold 2: classes=tensor([0, 1, 2])   N(train)=tensor([32, 20,  8])   N(test)=tensor([8, 5, 2])
Fold 3: classes=tensor([0, 1, 2])   N(train)=tensor([32, 20,  8])   N(test)=tensor([8, 5, 2])
Fold 4: classes=tensor([0, 1, 2])   N(train)=tensor([32, 20,  8])   N(test)=tensor([8, 5, 2])

Second, let’s assume we have multiple features and we want to shuffle indices. Note that this will also work if features have overlapping class names, but for clarity here we use different offsets:

>>> import torch
>>> from mvpy.crossvalidation import StratifiedKFold
>>> X = torch.randn(75, 5)
>>> y0 = torch.tensor([0] * 40 + [1] * 25 + [2] * 10)[:,None]
>>> y1 = torch.tensor([3] * 15 + [4] * 45 + [5] * 15)[:,None]
>>> y = torch.stack((y0, y1), dim = 1)
>>> kf = StratifiedKFold(shuffle = True).to_torch()
>>> for f_i, (train, test) in enumerate(kf.split(X, y)):
...     train_idx, train_cnt = torch.unique(y[train], return_counts = True)
...     _, test_cnt = torch.unique(y[test], return_counts = True)
...     print(f'Fold {f_i}: classes={train_idx}     N(train)={train_cnt}    N(test)={test_cnt}')
Fold 0: classes=tensor([0, 1, 2, 3, 4, 5])  N(train)=tensor([32, 20,  8, 12, 36, 12])       N(test)=tensor([8, 5, 2, 3, 9, 3])
Fold 1: classes=tensor([0, 1, 2, 3, 4, 5])  N(train)=tensor([32, 20,  8, 12, 36, 12])       N(test)=tensor([8, 5, 2, 3, 9, 3])
Fold 2: classes=tensor([0, 1, 2, 3, 4, 5])  N(train)=tensor([32, 20,  8, 12, 36, 12])       N(test)=tensor([8, 5, 2, 3, 9, 3])
Fold 3: classes=tensor([0, 1, 2, 3, 4, 5])  N(train)=tensor([32, 20,  8, 12, 36, 12])       N(test)=tensor([8, 5, 2, 3, 9, 3])
Fold 4: classes=tensor([0, 1, 2, 3, 4, 5])  N(train)=tensor([32, 20,  8, 12, 36, 12])       N(test)=tensor([8, 5, 2, 3, 9, 3])
split(X: ndarray | Tensor, y: ndarray | Tensor | None = None) → Generator[tuple[ndarray, ndarray], None, None] | Generator[tuple[Tensor, Tensor], None, None][source]#

Split the dataset into an iterable of stratified (train, test) pairs.

Parameters:
X : Union[np.ndarray, torch.Tensor]

Input data of shape (n_samples, …)

y : Optional[Union[np.ndarray, torch.Tensor]], default=None

Target data of shape (n_samples, …) over which the folds are stratified.

Returns:
kf : Union[collections.abc.Generator[tuple[np.ndarray, np.ndarray], None, None], collections.abc.Generator[tuple[torch.Tensor, torch.Tensor], None, None]]

Generator yielding (train, test) index pairs.
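
A minimal numpy-backend sketch of consuming this generator (shapes, class sizes, and n_splits are arbitrary):

>>> import numpy as np
>>> from mvpy.crossvalidation import StratifiedKFold
>>> X = np.random.randn(20, 3)
>>> y = np.array([0] * 12 + [1] * 8)
>>> kf = StratifiedKFold(n_splits = 4).to_numpy()
>>> for train, test in kf.split(X, y):
...     X_train, y_train = X[train], y[train]
...     X_test, y_test = X[test], y[test]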

to_numpy() → _StratifiedKFold_numpy[source]#

Convert class to numpy backend.

Returns:
kf : _StratifiedKFold_numpy

The k-fold cross-validator in numpy.
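
For example (a sketch; the shuffle and seed arguments are illustrative):

>>> from mvpy.crossvalidation import StratifiedKFold
>>> # pin the cross-validator to the numpy backend right away
>>> kf = StratifiedKFold(shuffle = True, random_state = 0).to_numpy()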

to_torch() → _StratifiedKFold_torch[source]#

Convert class to torch backend.

Returns:
kf : _StratifiedKFold_torch

The k-fold cross-validator in torch.
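
For example (a sketch mirroring the second example above; arguments are illustrative):

>>> from mvpy.crossvalidation import StratifiedKFold
>>> # pin the cross-validator to the torch backend right away
>>> kf = StratifiedKFold(shuffle = True, random_state = 0).to_torch()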