mvpy.preprocessing package#
Submodules#
mvpy.preprocessing.clamp module#
A collection of estimators for clamping data.
- class mvpy.preprocessing.clamp.Clamp(lower: float | None = None, upper: float | None = None, method: str = 'iqr', k: float | None = None, eps: float = 1e-09, dims: list | tuple | int | None = None)[source]#
Bases: BaseEstimator
Implements a clamp to handle extreme values.
Generally, this will clamp data \(X\) to lower and upper bounds defined by lower and upper whenever they are exceeded.
This can be useful for dealing with outliers: for example, in M-/EEG data that was minimally preprocessed, this may be used to curb EOG artifacts easily without removing time points or trials.
By default, both lower and upper will be None. This constitutes a special case where the bounds will then be fit directly to the data. There are three different ways of fitting bounds, controlled by method (a manual sketch follows the examples below):

- iqr: This will compute the inter-quartile range \([0.25, 0.75]\) and clamp data where \(X \notin [\textrm{median}(X) - kL, \textrm{median}(X) + kU]\).
- quantile: This will clamp data outside of the quantiles given by \([k, 1 - k]\).
- mad: This will clamp data at \(\textrm{median}(X) \pm k\,\textrm{MAD}\), where MAD is the median absolute deviation.
If only one of the two bounds is None instead, the unspecified bound will be interpreted as meaning that no clamping is desired in that direction.
- Parameters:
- lower : Optional[float], default=None
Lower bound for clamping. If None, no lower bound is applied.
- upper : Optional[float], default=None
Upper bound for clamping. If None, no upper bound is applied.
- method : {'iqr', 'quantile', 'mad'}, default='iqr'
If both lower and upper are None, which method to use for fitting bounds.
- k : Optional[float], default=None
For method 'iqr', scale the \([0.25, 0.75]\) quantiles by \(k\) (default=1.5). For method 'quantile', clamp tails outside \([k, 1 - k]\) (default=0.05). For method 'mad', scale the median absolute deviation by \(k\) (default=3.0). Otherwise unused.
- eps : float, default=1e-9
When checking span correctness, epsilon to apply as jitter.
- dims : int, list or tuple of ints, default=None
The dimensions over which to scale (None for first dimension).
- Attributes:
- lower : Optional[float], default=None
Lower bound for clamping. If None, no lower bound is applied.
- upper : Optional[float], default=None
Upper bound for clamping. If None, no upper bound is applied.
- method : {'iqr', 'quantile', 'mad'}, default='iqr'
If both lower and upper are None, which method to use for fitting bounds.
- k : Optional[float], default=None
For method 'iqr', scale the \([0.25, 0.75]\) quantiles by \(k\) (default=1.5). For method 'quantile', clamp tails outside \([k, 1 - k]\) (default=0.05). For method 'mad', scale the median absolute deviation by \(k\) (default=3.0). Otherwise unused.
- eps : float, default=1e-9
When checking span correctness, epsilon to apply as jitter.
- dims : int, list or tuple of ints, default=None
The dimensions over which to scale (None for first dimension).
- lower_ : float | np.ndarray | torch.Tensor, default=None
Lower bound for clamping, either prespecified or fitted.
- upper_ : float | np.ndarray | torch.Tensor, default=None
Upper bound for clamping, either prespecified or fitted.
- dims_ : tuple[int], default=None
Tuple specifying the dimensions to scale over.
See also
mvpy.preprocessing.Scaler, mvpy.preprocessing.RobustScaler
Complementary scalers.
Examples
>>> import torch
>>> from mvpy.preprocessing import Clamp
>>> X = torch.normal(0, 1, (1000, 5))
>>> X[500,0] = 1e3
>>> X.max(0).values
tensor([10.0000, 3.9375, 3.2070, 3.0591, 3.0165])
>>> Z = Clamp().fit_transform(X)
>>> Z.max(0).values
tensor([2.6926, 2.7263, 2.6343, 2.6616, 2.5378])
>>> Z = Clamp(upper = 5.0).fit_transform(X)
>>> Z.max(0).values
tensor([5.0000, 3.9375, 3.2070, 3.0591, 3.0165])
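For intuition, the fitted bounds correspond roughly to the following manual computation. This is a sketch for illustration only, not the library's exact internals; in particular, the iqr branch assumes \(L\) and \(U\) denote the quartiles' distances from the median.

import torch

def fit_bounds(X: torch.Tensor, method: str = 'iqr', k: float | None = None):
    # Hypothetical re-implementation of the bound-fitting logic along dim 0.
    med = X.median(dim = 0).values

    if method == 'iqr':
        k = 1.5 if k is None else k
        q = torch.quantile(X, torch.tensor([0.25, 0.75]), dim = 0)
        L, U = med - q[0], q[1] - med        # distances from median to quartiles
        return med - k * L, med + k * U
    if method == 'quantile':
        k = 0.05 if k is None else k
        q = torch.quantile(X, torch.tensor([k, 1.0 - k]), dim = 0)
        return q[0], q[1]                    # clamp the [k, 1 - k] tails
    if method == 'mad':
        k = 3.0 if k is None else k
        mad = (X - med).abs().median(dim = 0).values
        return med - k * mad, med + k * mad  # median(X) +/- k * MAD
    raise ValueError(f'Unknown method: {method}')

X = torch.normal(0, 1, (1000, 5))
lo, hi = fit_bounds(X, method = 'mad')
Z = torch.clamp(X, lo, hi)                   # bounds broadcast over dim 0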
- fit(X: ndarray | Tensor, *args: Any) Clamp[source]#
Fit the clamp.
- Parameters:
- X : np.ndarray | torch.Tensor
The data of arbitrary shape.
- args : Any
Additional arguments.
- Returns:
- clamp : sklearn.base.BaseEstimator
The fitted clamp.
- fit_transform(X: ndarray | Tensor, *args: Any) ndarray | Tensor[source]#
Fit and transform the data in one step.
- Parameters:
- X : np.ndarray | torch.Tensor
The data of shape X.
- args : Any
Additional arguments.
- Returns:
- Z : np.ndarray | torch.Tensor
The transformed data of shape X.
- inverse_transform(X: ndarray | Tensor, *args: Any) ndarray | Tensor[source]#
Invert the transform of the data.
- Parameters:
- X : np.ndarray | torch.Tensor
The data of shape X.
- args : Any
Additional arguments.
- Returns:
- X : np.ndarray | torch.Tensor
The inverse transformed data of shape X.
Warning
Clamping cannot be inverse transformed. Consequently, this returns the clamped values in \(X\) as is.
- to_numpy() _Clamp_numpy[source]#
Select the numpy backend. Note that this cannot be called for conversion.
- Returns:
- clamp : _Clamp_numpy
The clamp using the numpy backend.
mvpy.preprocessing.labelbinariser module#
A collection of estimators for binarising label data.
- class mvpy.preprocessing.labelbinariser.LabelBinariser(neg_label: int = 0, pos_label: int = 1)[source]#
Bases: BaseEstimator
Class to create and handle multiclass and multifeature one-hot encodings.
For multiclass inputs, this produces a simple one-hot encoding of shape (n_trials, n_classes).
For multifeature inputs, this produces a vectorised one-hot encoding of shape (n_trials, n_features * n_classes), where there is one hot class per feature.
- Parameters:
- neg_label : int, default=0
Label to use for negatives.
- pos_label : int, default=1
Label to use for positives.
- Attributes:
- neg_label : int, default=0
Label to use for negatives.
- pos_label : int, default=1
Label to use for positives.
- n_features_ : int
Number of unique features in y of shape (n_samples, n_features).
- n_classes_ : List[int]
Number of unique classes per feature.
- labels_ : List[List[Any]]
List including lists of original labels in y.
- classes_ : List[List[Any]]
List including lists of class identities in y.
- N_ : int | np.ndarray | torch.Tensor
Total number of classes (across features).
- C_ : np.ndarray | torch.Tensor
Offsets for each unique feature in the one-hot matrix, of shape (n_features,).
- map_L_to_C_ : List[Dict[Any, int]]
Lists containing each label->class mapping per feature.
Notes
Note that this always creates n_classes columns in one-hot encodings, even when n_classes=2. This is because, in some situations, it can be easier to handle the data when all classes are explicitly represented in the data.
Warning
Only the numpy backend supports string labels, as torch does not offer support for string type tensors. To avoid issues arising from this, stick to numerical labels unless you are certain to run analyses using only the numpy backend.
Examples
First, let’s consider one feature that has three classes.
>>> import torch
>>> from mvpy.estimators import LabelBinariser
>>> label = LabelBinariser().to_torch()
>>> y = torch.randint(0, 3, (100,))
>>> L = label.fit_transform(y)
>>> H = label.inverse_transform(L)
>>> print(y[0:5])
tensor([0, 1, 2, 1, 2])
>>> print(L[0:5])
tensor([[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [0, 1, 0],
        [0, 0, 1]])
>>> print(H[0:5])
tensor([0, 1, 2, 1, 2])
Second, let’s look at two features that have a different number of classes each.
>>> import torch
>>> from mvpy.estimators import LabelBinariser
>>> label = LabelBinariser().to_torch()
>>> y = torch.stack((torch.randint(10, 13, (50,)), torch.randint(20, 22, (50,))), dim = 1)
>>> L = label.fit_transform(y)
>>> H = label.inverse_transform(L)
>>> print(y[0:5])
tensor([[10, 21],
        [10, 20],
        [11, 21],
        [12, 21],
        [10, 20]])
>>> print(L[0:5])
tensor([[1, 0, 0, 0, 1],
        [1, 0, 0, 1, 0],
        [0, 1, 0, 0, 1],
        [0, 0, 1, 0, 1],
        [1, 0, 0, 1, 0]])
>>> print(H[0:5])
tensor([[10, 21],
        [10, 20],
        [11, 21],
        [12, 21],
        [10, 20]])
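For intuition, the multifeature encoding can be reproduced by offsetting each feature's class indices into a shared column space, much like the C_ attribute describes. This is a hypothetical sketch for illustration, not the library's internals:

import torch

def vectorised_one_hot(y: torch.Tensor) -> torch.Tensor:
    # y: (n_trials, n_features) integer labels.
    n_trials, n_features = y.shape
    classes = [y[:, j].unique() for j in range(n_features)]  # sorted per-feature vocabularies
    n_classes = torch.tensor([len(c) for c in classes])
    # Column offsets: feature j occupies columns C[j] .. C[j] + n_classes[j] - 1.
    C = torch.cumsum(torch.cat([torch.zeros(1, dtype = torch.long), n_classes[:-1]]), dim = 0)
    L = torch.zeros(n_trials, int(n_classes.sum()), dtype = torch.long)
    for j in range(n_features):
        idx = torch.searchsorted(classes[j], y[:, j])        # label -> class index
        L[torch.arange(n_trials), C[j] + idx] = 1
    return L

y = torch.stack((torch.randint(10, 13, (50,)), torch.randint(20, 22, (50,))), dim = 1)
L = vectorised_one_hot(y)                                    # shape (50, 5)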
- clone() LabelBinariser[source]#
Obtain a clone of this class.
- Returns:
- binariser : mvpy.estimators.LabelBinariser
The clone.
- copy() LabelBinariser[source]#
Obtain a copy of this class.
- Returns:
- binariser : mvpy.estimators.LabelBinariser
The copy.
- fit(y: ndarray | Tensor, *args: Any) BaseEstimator[source]#
Fit the binariser.
- Parameters:
- y : np.ndarray | torch.Tensor
The data of shape (n_samples[, n_features]).
- args : Any
Additional arguments.
- Returns:
- binariser : sklearn.base.BaseEstimator
The fitted binariser.
- fit_transform(y: ndarray | Tensor, *args: Any) ndarray | Tensor[source]#
Fit and transform the data in one step.
- Parameters:
- y : np.ndarray | torch.Tensor
The data of shape (n_samples[, n_features]).
- args : Any
Additional arguments.
- Returns:
- L : np.ndarray | torch.Tensor
The binarised data of shape (n_samples, [n_features * ]n_classes).
- inverse_transform(y: ndarray | Tensor, *args: Any) ndarray | Tensor[source]#
Obtain labels from transformed data.
- Parameters:
- L : np.ndarray | torch.Tensor
The binarised data of shape (n_samples, [n_features * ]n_classes).
- args : Any
Additional arguments.
- Returns:
- y : np.ndarray | torch.Tensor
The labels of shape (n_samples, n_features).
- to_numpy()[source]#
Select the numpy binariser. Note that this cannot be called for conversion.
- Returns:
- binariser : mvpy.estimators._LabelBinariser_numpy
The numpy binariser.
- to_torch()[source]#
Select the torch binariser. Note that this cannot be called for conversion.
- Returns:
- binariser : mvpy.estimators._LabelBinariser_torch
The torch binariser.
- transform(y: ndarray | Tensor, *args: Any) ndarray | Tensor[source]#
Transform the data based on the fitted binariser.
- Parameters:
- y : np.ndarray | torch.Tensor
The data of shape (n_samples[, n_features]).
- args : Any
Additional arguments.
- Returns:
- L : np.ndarray | torch.Tensor
The binarised data of shape (n_samples, [n_features * ]n_classes).
mvpy.preprocessing.robustscaler module#
A collection of estimators for robustly scaling data.
- class mvpy.preprocessing.robustscaler.RobustScaler(with_centering: bool = True, with_scaling: bool = True, quantile_range: tuple[float, float] = (25.0, 75.0), dims: list | tuple | int | None = None)[source]#
Bases: BaseEstimator
Implements a robust scaler that is invariant to outliers.
By default, this scaler removes the median before scaling the data according to the interquartile range \([0.25, 0.75]\). This is useful because, unlike Scaler, RobustScaler is robust to outliers that would otherwise distort a Scaler's estimates.
Both centering and scaling are optional and can be turned on or off using with_centering and with_scaling.
- Parameters:
- with_centering : bool, default=True
If True, center the data before scaling.
- with_scaling : bool, default=True
If True, scale the data according to the quantiles.
- quantile_range : tuple[float, float], default=(25.0, 75.0)
Tuple describing the quantiles.
- dims : int, list or tuple of ints, default=None
The dimensions over which to scale (None for first dimension).
- Attributes:
- with_centering : bool, default=True
If True, center the data before scaling.
- with_scaling : bool, default=True
If True, scale the data according to the quantiles.
- quantile_range : tuple[float, float], default=(25.0, 75.0)
Tuple describing the quantiles.
- dims : int, list or tuple of ints, default=None
The dimensions over which to scale (None for first dimension).
- dims_ : tuple[int], default=None
Tuple specifying the dimensions to scale over.
- centre_ : torch.Tensor, default=None
The centre of each feature of shape X.
- scale_ : torch.Tensor, default=None
The scale of each feature of shape X.
See also
mvpy.preprocessing.Scaler
An alternative scaler that normalises data to zero mean and unit variance.
mvpy.preprocessing.Clamp
A complementary class that implements clamping data at specific values.
Examples
>>> import torch
>>> from mvpy.preprocessing import RobustScaler
>>> scaler = RobustScaler().to_torch()
>>> X = torch.normal(5, 10, (1000, 5))
>>> X[500,0] = 1e3
>>> X.std(0)
tensor([32.9122, 9.9615, 10.1481, 10.1058, 9.7468])
>>> Z = scaler.fit_transform(X)
>>> Z.std(0)
tensor([2.7348, 0.7351, 0.7464, 0.7609, 0.7154])
>>> H = scaler.inverse_transform(Z)
>>> H.std(0)
tensor([32.9122, 9.9615, 10.1481, 10.1058, 9.7468])
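For intuition, the default transform corresponds roughly to the following manual computation. This is a sketch under default settings; the library's handling of dims and edge cases may differ:

import torch

X = torch.normal(5, 10, (1000, 5))

# Centre on the median and scale by the interquartile range, per feature.
centre = X.median(dim = 0).values
q1, q3 = torch.quantile(X, torch.tensor([0.25, 0.75]), dim = 0)
scale = q3 - q1

Z = (X - centre) / scale     # forward transform
H = Z * scale + centre       # inverse transform recovers X
assert torch.allclose(H, X, atol = 1e-5)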
- clone() RobustScaler[source]#
Obtain a clone of this class.
- Returns:
- scaler : RobustScaler
The cloned robust scaler.
- copy() RobustScaler[source]#
Obtain a copy of this class.
- Returns:
- scaler : RobustScaler
The copied robust scaler.
- fit(X: ndarray | Tensor, *args: Any) RobustScaler[source]#
Fit the scaler.
- Parameters:
- X : np.ndarray | torch.Tensor
The data of arbitrary shape.
- args : Any
Additional arguments.
- Returns:
- scaler : sklearn.base.BaseEstimator
The fitted scaler.
- fit_transform(X: ndarray | Tensor, *args: Any) ndarray | Tensor[source]#
Fit and transform the data in one step.
- Parameters:
- X : np.ndarray | torch.Tensor
The data of shape X.
- args : Any
Additional arguments.
- Returns:
- Z : np.ndarray | torch.Tensor
The transformed data of shape X.
- inverse_transform(X: ndarray | Tensor, *args: Any) ndarray | Tensor[source]#
Invert the transform of the data.
- Parameters:
- X : np.ndarray | torch.Tensor
The data of shape X.
- args : Any
Additional arguments.
- Returns:
- X : np.ndarray | torch.Tensor
The inverse transformed data of shape X.
- to_numpy() _RobustScaler_numpy[source]#
Select the numpy backend. Note that this cannot be called for conversion.
- Returns:
- scaler : _RobustScaler_numpy
The robust scaler using the numpy backend.
mvpy.preprocessing.scaler module#
A collection of estimators for scaling data.
- class mvpy.preprocessing.scaler.Scaler(with_mean: bool = True, with_std: bool = True, dims: list | tuple | int | None = None)[source]#
Bases: BaseEstimator
A standard scaler akin to sklearn.preprocessing.StandardScaler. See notes for some differences.
- Parameters:
- with_mean : bool, default=True
If True, center the data before scaling.
- with_std : bool, default=True
If True, scale the data to unit variance.
- dims : int, list or tuple of ints, default=None
The dimensions over which to scale (None for first dimension).
- copy : bool, default=False
If True, the data will be copied.
- Attributes:
- shape_ : tuple
The shape of the data.
- mean_ : Union[np.ndarray, torch.Tensor]
The mean of the data.
- var_ : Union[np.ndarray, torch.Tensor]
The variance of the data.
- scale_ : Union[np.ndarray, torch.Tensor]
The scale of the data.
Notes

Data are scaled as \(\frac{x - \mu}{\sigma}\), where \(\mu\) is the mean and \(\sigma\) is the standard deviation of the data.
Examples
>>> import torch
>>> from mvpy.estimators import Scaler
>>> X = torch.normal(5, 10, (1000, 5))
>>> print(X.std(0))
tensor([ 9.7033, 10.2510, 10.2483, 10.1274, 10.2013])
>>> scaler = Scaler().fit(X)
>>> X_s = scaler.transform(X)
>>> print(X_s.std(0))
tensor([1.0000, 1.0000, 1.0000, 1.0000, 1.0000])
>>> X_i = scaler.inverse_transform(X_s)
>>> print(X_i.std(0))
tensor([ 9.7033, 10.2510, 10.2483, 10.1274, 10.2013])
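For intuition, the default transform matches manual standardisation. This is a sketch; dims, sample weights, and the exact variance estimator (biased vs. unbiased) are simplified here:

import torch

X = torch.normal(5, 10, (1000, 5))

mu, sigma = X.mean(dim = 0), X.std(dim = 0)
Z = (X - mu) / sigma        # zero mean, unit variance per feature
X_back = Z * sigma + mu     # inverse transform recovers X
assert torch.allclose(X_back, X, atol = 1e-5)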
- fit(X: ndarray | Tensor, *args: Any, sample_weight: ndarray | Tensor | None = None) Any[source]#
Fit the scaler.
- Parameters:
- X : Union[np.ndarray, torch.Tensor]
The data.
- args : Any
Additional arguments.
- sample_weight : Union[np.ndarray, torch.Tensor], default=None
The sample weights.
- fit_transform(X: ndarray | Tensor, *args: Any, sample_weight: ndarray | Tensor | None = None) ndarray | Tensor[source]#
Fit and transform the data in one step.
- Parameters:
- X : Union[np.ndarray, torch.Tensor]
The data.
- args : Any
Additional arguments.
- sample_weight : Union[np.ndarray, torch.Tensor], default=None
The sample weights.
- Returns:
- Union[np.ndarray, torch.Tensor]
The transformed data.
- inverse_transform(X: ndarray | Tensor, *args: Any) ndarray | Tensor[source]#
Invert the transform of the data.
- Parameters:
- X : Union[np.ndarray, torch.Tensor]
The data.
- args : Any
Additional arguments.
- Returns:
- Union[np.ndarray, torch.Tensor]
The inverse transformed data.
- to_numpy()[source]#
Select the numpy scaler. Note that this cannot be called for conversion.
- Returns:
- _Scaler_numpy
The numpy scaler.
Module contents#
A collection of estimators for clamping data.
A collection of estimators for binarising label data.
A collection of estimators for robustly scaling data.
A collection of estimators for scaling data.