RepeatedStratifiedKFold
- class mvpy.crossvalidation.RepeatedStratifiedKFold(n_splits: int = 5, n_repeats: int = 1, random_state: int | numpy.random.Generator | torch.Generator | None = None)
Implements a repeated stratified k-folds cross-validator.
- Parameters:
- n_splits : int, default=5
Number of splits to use.
- n_repeats : int, default=1
Number of repeats to use.
- random_state : Optional[Union[int, np.random._generator.Generator, torch._C.Generator]], default=None
Random state to use for shuffling (either an integer seed or a numpy/torch generator), if any.
- Attributes:
- n_splits : int, default=5
Number of splits to use.
- n_repeats : int, default=1
Number of repeats to use.
- random_state : Optional[Union[int, np.random._generator.Generator, torch._C.Generator]], default=None
Random state to use for shuffling (either an integer seed or a numpy/torch generator), if any.
Notes
For reproducibility when using shuffling, set random_state to an integer.
When using shuffling, also instantiate the cross-validator and convert it to the desired backend immediately (for example with to_torch(), as in the example below). Otherwise, each call to split will instantiate a new object with the same random seed, so successive calls may repeat the same splits.
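The seeding behaviour described in this note can be illustrated with a minimal, generic sketch. It uses only Python's standard random module and does not reproduce mvpy's internals; the helper shuffled_indices is a hypothetical name introduced for illustration. It shows why re-creating a generator from the same seed on every call repeats the permutation, whereas creating it once and reusing it does not reset the state.

```python
import random

def shuffled_indices(n, rng):
    """Return a shuffled copy of range(n) using the given generator (hypothetical helper)."""
    idx = list(range(n))
    rng.shuffle(idx)
    return idx

seed = 0

# A fresh generator built from the same seed on every call:
# both calls start from the same state and give the same permutation.
first = shuffled_indices(10, random.Random(seed))
second = shuffled_indices(10, random.Random(seed))

# A single generator created once and reused:
# the first call matches the runs above, and the second call
# continues from the advanced state instead of restarting.
rng = random.Random(seed)
third = shuffled_indices(10, rng)
fourth = shuffled_indices(10, rng)

print(first == second)  # True
print(third == first)   # True
```

Under this reading, converting the cross-validator to a backend once and then reusing that object plays the role of the single reused generator.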
Examples
>>> import torch
>>> from mvpy.crossvalidation import RepeatedStratifiedKFold
>>> X = torch.randn(75, 5)
>>> y = torch.tensor([0] * 40 + [1] * 25 + [2] * 10)
>>> kf = RepeatedStratifiedKFold(n_splits = 2, n_repeats = 2).to_torch()
>>> for f_i, (train, test) in enumerate(kf.split(X, y)):
...     train_idx, train_cnt = torch.unique(y[train], return_counts = True)
...     _, test_cnt = torch.unique(y[test], return_counts = True)
...     print(f'Fold {f_i}: classes={train_idx} N(train)={train_cnt} N(test)={test_cnt}')
Fold 0: classes=tensor([0, 1, 2]) N(train)=tensor([20, 12, 5]) N(test)=tensor([20, 13, 5])
Fold 1: classes=tensor([0, 1, 2]) N(train)=tensor([20, 13, 5]) N(test)=tensor([20, 12, 5])
Fold 2: classes=tensor([0, 1, 2]) N(train)=tensor([20, 12, 5]) N(test)=tensor([20, 13, 5])
Fold 3: classes=tensor([0, 1, 2]) N(train)=tensor([20, 13, 5]) N(test)=tensor([20, 12, 5])
- split(X: ndarray | Tensor, y: ndarray | Tensor | None = None) → Generator[tuple[ndarray, ndarray], None, None] | Generator[tuple[Tensor, Tensor], None, None]
Repeatedly split the dataset into stratified (train, test) pairs, returned as an iterable.
- Parameters:
- X : Union[np.ndarray, torch.Tensor]
Input data of shape (n_samples, …)
- y : Optional[Union[np.ndarray, torch.Tensor]], default=None
Target data of shape (n_samples, …). In the example above, the folds are stratified by the class labels passed as y, so supply it when stratification is required.
- Returns:
- kf : Union[collections.abc.Generator[tuple[np.ndarray, np.ndarray], None, None], collections.abc.Generator[tuple[torch.Tensor, torch.Tensor], None, None]]
Iterable generator of (train, test) pairs.
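To make the idea behind such a generator concrete, here is a rough pure-Python sketch of repeated stratified k-fold splitting. It is not mvpy's implementation: the function name repeated_stratified_kfold, its seed argument, and the round-robin assignment of samples to folds are all assumptions made for illustration. Each repeat shuffles the members of every class and deals them across the folds, so every fold keeps roughly the class proportions of y.

```python
import random
from collections import defaultdict

def repeated_stratified_kfold(y, n_splits=5, n_repeats=1, seed=None):
    """Yield (train, test) index lists; a hypothetical sketch, not mvpy's code."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)
            # Deal the shuffled members of this class across folds in turn.
            for j, i in enumerate(members):
                folds[j % n_splits].append(i)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train, test

# Same class layout as the example above: 40, 25 and 10 samples.
y = [0] * 40 + [1] * 25 + [2] * 10
splits = list(repeated_stratified_kfold(y, n_splits=2, n_repeats=2, seed=0))
print(len(splits))  # 4
```

Because the generator is created once inside the function, the repeats draw different shuffles from a single seeded stream while the whole sequence stays reproducible for a fixed seed.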