mvpy.preprocessing package#

Submodules#

mvpy.preprocessing.clamp module#

A collection of estimators for clamping data.

class mvpy.preprocessing.clamp.Clamp(lower: float | None = None, upper: float | None = None, method: str = 'iqr', k: float | None = None, eps: float = 1e-09, dims: list | tuple | int | None = None)[source]#

Bases: BaseEstimator

Implements a clamp to handle extreme values.

Generally, this will clamp data \(X\) to lower and upper bounds defined by lower and upper whenever they are exceeded.

This can be useful for dealing with outliers: for example, in minimally preprocessed M-/EEG data, it can be used to curb EOG artifacts without removing time points or trials.

By default, both lower and upper are None. This is a special case in which the bounds are fitted directly to the data. There are three ways of fitting bounds, controlled by method:

  1. iqr: This will compute the quartiles \(Q_1\) and \(Q_3\) (the \(0.25\) and \(0.75\) quantiles) and clamp data where \(X \notin [Q_1 - k\,\mathrm{IQR},\ Q_3 + k\,\mathrm{IQR}]\), with \(\mathrm{IQR} = Q_3 - Q_1\).

  2. quantile: This will clamp data outside of the quantiles given by \([k, 1 - k]\).

  3. mad: This will clamp data at \(\textrm{median}(X) \pm k\,\textrm{MAD}\), where MAD is the median absolute deviation.

If only one of the two bounds is None, the unspecified bound is interpreted as no clamping being desired in that direction. A minimal sketch of the default iqr fitting is given below.
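As a minimal sketch, the default iqr fit amounts to computing Tukey fences along the first dimension. The snippet below is an illustration consistent with the formula above and the example further down, not the package's actual implementation; the names lower_ and upper_ merely mirror the fitted attributes documented below.

>>> import torch
>>> X = torch.normal(0, 1, (1000, 5))
>>> k = 1.5  # default scale factor for method 'iqr'
>>> q1 = torch.quantile(X, 0.25, dim=0)  # lower quartile per feature
>>> q3 = torch.quantile(X, 0.75, dim=0)  # upper quartile per feature
>>> iqr = q3 - q1
>>> lower_, upper_ = q1 - k * iqr, q3 + k * iqr  # Tukey fences
>>> Z = X.clamp(min=lower_, max=upper_)  # values outside the fences are clamped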

Parameters:
lower : Optional[float], default=None

Lower bound for clamping. If None, no lower bound is applied.

upper : Optional[float], default=None

Upper bound for clamping. If None, no upper bound is applied.

method : {‘iqr’, ‘quantile’, ‘mad’}, default=’iqr’

The method to use for fitting bounds when both lower and upper are None.

k : Optional[float], default=None

For method iqr, the factor by which to scale the interquartile range (default=1.5). For method quantile, clamp tails outside \([k, 1 - k]\) (default=0.05). For method mad, the factor by which to scale the median absolute deviation (default=3.0). Otherwise unused.

eps : float, default=1e-9

Epsilon applied as jitter when checking that the bounds span a valid range.

dims : int, list or tuple of ints, default=None

The dimensions over which bounds are fitted (None for the first dimension).

Attributes:
lower : Optional[float], default=None

Lower bound for clamping. If None, no lower bound is applied.

upper : Optional[float], default=None

Upper bound for clamping. If None, no upper bound is applied.

method : {‘iqr’, ‘quantile’, ‘mad’}, default=’iqr’

The method to use for fitting bounds when both lower and upper are None.

k : Optional[float], default=None

For method iqr, the factor by which to scale the interquartile range (default=1.5). For method quantile, clamp tails outside \([k, 1 - k]\) (default=0.05). For method mad, the factor by which to scale the median absolute deviation (default=3.0). Otherwise unused.

eps : float, default=1e-9

Epsilon applied as jitter when checking that the bounds span a valid range.

dims : int, list or tuple of ints, default=None

The dimensions over which bounds are fitted (None for the first dimension).

lower_ : float | np.ndarray | torch.Tensor, default=None

Lower bound for clamping, either prespecified or fitted.

upper_ : float | np.ndarray | torch.Tensor, default=None

Upper bound for clamping, either prespecified or fitted.

dims_ : tuple[int], default=None

Tuple specifying the dimensions over which bounds are fitted.

Examples

>>> import torch
>>> from mvpy.preprocessing import Clamp
>>> X = torch.normal(0, 1, (1000, 5))
>>> X[500,0] = 1e3
>>> X.max(0).values
tensor([1000.0000,    3.9375,    3.2070,    3.0591,    3.0165])
>>> Z = Clamp().fit_transform(X)
>>> Z.max(0).values
tensor([2.6926, 2.7263, 2.6343, 2.6616, 2.5378])
>>> Z = Clamp(upper = 5.0).fit_transform(X)
>>> Z.max(0).values
tensor([5.0000, 3.9375, 3.2070, 3.0591, 3.0165])
clone() → Clamp[source]#

Obtain a clone of this class.

Returns:
clamp : Clamp

The cloned clamp.

copy() → Clamp[source]#

Obtain a copy of this class.

Returns:
clamp : Clamp

The copied clamp.

fit(X: ndarray | Tensor, *args: Any) → Clamp[source]#

Fit the clamp.

Parameters:
X : np.ndarray | torch.Tensor

The data of arbitrary shape.

args : Any

Additional arguments.

Returns:
clamp : sklearn.base.BaseEstimator

The fitted clamp.

fit_transform(X: ndarray | Tensor, *args: Any) → ndarray | Tensor[source]#

Fit and transform the data in one step.

Parameters:
X : np.ndarray | torch.Tensor

The data of arbitrary shape.

args : Any

Additional arguments.

Returns:
Z : np.ndarray | torch.Tensor

The transformed data of the same shape as X.

inverse_transform(X: ndarray | Tensor, *args: Any) → ndarray | Tensor[source]#

Invert the transform of the data.

Parameters:
X : np.ndarray | torch.Tensor

The data of arbitrary shape.

args : Any

Additional arguments.

Returns:
X : np.ndarray | torch.Tensor

The inverse transformed data of the same shape as X.

Warning

Clamping cannot be inverse transformed. Consequently, this returns the clamped values in \(X\) as is.

to_numpy() → _Clamp_numpy[source]#

Select the numpy backend. Note that this cannot be called for conversion.

Returns:
clamp : _Clamp_numpy

The clamp using the numpy backend.

to_torch() → _Clamp_torch[source]#

Select the torch backend. Note that this cannot be called for conversion.

Returns:
clamp : _Clamp_torch

The clamp using the torch backend.

transform(X: ndarray | Tensor, *args: Any) → ndarray | Tensor[source]#

Transform the data using the clamp.

Parameters:
X : np.ndarray | torch.Tensor

The data of arbitrary shape.

args : Any

Additional arguments.

Returns:
Z : np.ndarray | torch.Tensor

The transformed data of the same shape as X.


mvpy.preprocessing.labelbinariser module#

A collection of estimators for binarising label data.

class mvpy.preprocessing.labelbinariser.LabelBinariser(neg_label: int = 0, pos_label: int = 1)[source]#

Bases: BaseEstimator

Class to create and handle multiclass and multifeature one-hot encodings.

For multiclass inputs, this produces a simple one-hot encoding of shape (n_trials, n_classes).

For multifeature inputs, this produces a vectorised one-hot encoding of shape (n_trials, n_features * n_classes), where each feature contributes its own one-hot block, as sketched below.
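To make the vectorised layout concrete, the sketch below assembles such an encoding by hand. It mirrors the C_ (per-feature column offsets) and map_L_to_C_ (label-to-class mappings) attributes documented below, but the code itself is illustrative and not the package's internals.

>>> import torch
>>> y = torch.tensor([[10, 21], [11, 20], [12, 21]])  # (n_trials, n_features)
>>> n_trials, n_features = y.shape
>>> classes = [torch.unique(y[:, f]) for f in range(n_features)]  # sorted classes per feature
>>> n_classes = [len(c) for c in classes]  # here: [3, 2]
>>> C = torch.tensor([0] + n_classes[:-1]).cumsum(0)  # column offset of each feature block
>>> L = torch.zeros(n_trials, sum(n_classes), dtype=torch.long)
>>> for f in range(n_features):
...     idx = torch.searchsorted(classes[f], y[:, f])  # label -> class index within feature f
...     L[torch.arange(n_trials), C[f] + idx] = 1  # set the hot column in this feature's block
>>> L
tensor([[1, 0, 0, 0, 1],
        [0, 1, 0, 1, 0],
        [0, 0, 1, 0, 1]])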

Parameters:
neg_label : int, default=0

Label to use for negatives.

pos_label : int, default=1

Label to use for positives.

Attributes:
neg_label : int, default=0

Label to use for negatives.

pos_label : int, default=1

Label to use for positives.

n_features_ : int

Number of unique features in y of shape (n_samples, n_features).

n_classes_ : List[int]

Number of unique classes per feature.

labels_ : List[List[Any]]

List including lists of original labels in y.

classes_ : List[List[Any]]

List including lists of class identities in y.

N_ : int | np.ndarray | torch.Tensor

Total number of classes (across features).

C_ : np.ndarray | torch.Tensor

Offsets for each unique feature in the one-hot matrix, of shape (n_features,).

map_L_to_C_ : List[Dict[Any, int]]

List containing the label-to-class mapping for each feature.

Notes

Note that this always creates n_classes columns in the one-hot encoding, even when n_classes=2. This is because, in some situations, it is easier to handle the data when all classes are explicitly represented.

Warning

Only the numpy backend supports string labels, as torch does not offer string tensors. To avoid issues arising from this, stick to numerical labels unless you are certain you will run analyses using only the numpy backend. A minimal sketch of the numpy-only case follows.
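As a sketch, string labels can be binarised as below; the labels are purely illustrative, and the import assumes LabelBinariser is re-exported from mvpy.preprocessing as in the examples that follow.

>>> import numpy as np
>>> from mvpy.preprocessing import LabelBinariser
>>> y = np.array(['cat', 'dog', 'cat', 'bird'])  # string labels require the numpy backend
>>> label = LabelBinariser().to_numpy()
>>> L = label.fit_transform(y)  # one-hot over the three unique classes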

Examples

First, let’s consider one feature that has three classes.

>>> import torch
>>> from mvpy.preprocessing import LabelBinariser
>>> label = LabelBinariser().to_torch()
>>> y = torch.randint(0, 3, (100,))
>>> L = label.fit_transform(y)
>>> H = label.inverse_transform(L)
>>> print(y[0:5])
tensor([0, 1, 2, 1, 2])
>>> print(L[0:5])
tensor([[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [0, 1, 0],
        [0, 0, 1]])
>>> print(H[0:5])
tensor([0, 1, 2, 1, 2])

Second, let’s look at two features that have a different number of classes each.

>>> import torch
>>> from mvpy.preprocessing import LabelBinariser
>>> label = LabelBinariser().to_torch()
>>> y = torch.stack((torch.randint(10, 13, (50,)), torch.randint(20, 22, (50,))), dim = 1)
>>> L = label.fit_transform(y)
>>> H = label.inverse_transform(L)
>>> print(y[0:5])
tensor([[10, 21],
        [10, 20],
        [11, 21],
        [12, 21],
        [10, 20]])
>>> print(L[0:5])
tensor([[1, 0, 0, 0, 1],
        [1, 0, 0, 1, 0],
        [0, 1, 0, 0, 1],
        [0, 0, 1, 0, 1],
        [1, 0, 0, 1, 0]])
>>> print(H[0:5])
tensor([[10, 21],
        [10, 20],
        [11, 21],
        [12, 21],
        [10, 20]])
clone() → LabelBinariser[source]#

Obtain a clone of this class.

Returns:
binariser : LabelBinariser

The clone.

copy() → LabelBinariser[source]#

Obtain a copy of this class.

Returns:
binariser : LabelBinariser

The copy.

fit(y: ndarray | Tensor, *args: Any) → BaseEstimator[source]#

Fit the binariser.

Parameters:
y : np.ndarray | torch.Tensor

The data of shape (n_samples[, n_features]).

args : Any

Additional arguments.

Returns:
binariser : sklearn.base.BaseEstimator

The fitted binariser.

fit_transform(y: ndarray | Tensor, *args: Any) → ndarray | Tensor[source]#

Fit and transform the data in one step.

Parameters:
y : np.ndarray | torch.Tensor

The data of shape (n_samples[, n_features]).

args : Any

Additional arguments.

Returns:
L : np.ndarray | torch.Tensor

The binarised data of shape (n_samples, [n_features * ]n_classes).

inverse_transform(y: ndarray | Tensor, *args: Any) → ndarray | Tensor[source]#

Obtain labels from transformed data.

Parameters:
L : np.ndarray | torch.Tensor

The binarised data of shape (n_samples, [n_features * ]n_classes).

args : Any

Additional arguments.

Returns:
y : np.ndarray | torch.Tensor

The labels of shape (n_samples, n_features).

to_numpy()[source]#

Select the numpy binariser. Note that this cannot be called for conversion.

Returns:
binariser : _LabelBinariser_numpy

The numpy binariser.

to_torch()[source]#

Select the torch binariser. Note that this cannot be called for conversion.

Returns:
binariser : _LabelBinariser_torch

The torch binariser.

transform(y: ndarray | Tensor, *args: Any) → ndarray | Tensor[source]#

Transform the data based on the fitted binariser.

Parameters:
y : np.ndarray | torch.Tensor

The data of shape (n_samples[, n_features]).

args : Any

Additional arguments.

Returns:
L : np.ndarray | torch.Tensor

The binarised data of shape (n_samples, [n_features * ]n_classes).


mvpy.preprocessing.robustscaler module#

A collection of estimators for robustly scaling data.

class mvpy.preprocessing.robustscaler.RobustScaler(with_centering: bool = True, with_scaling: bool = True, quantile_range: tuple[float, float] = (25.0, 75.0), dims: list | tuple | int | None = None)[source]#

Bases: BaseEstimator

Implements a robust scaler that is insensitive to outliers.

By default, this scaler removes the median and then scales the data by the interquartile range (the range between the \(0.25\) and \(0.75\) quantiles). Unlike Scaler, this makes RobustScaler robust to outliers that would otherwise skew the fitted statistics.

Both centering and scaling are optional and can be turned on or off using with_centering and with_scaling. A minimal sketch of the default transform is given below.
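As a minimal sketch, a median/IQR transform along the first dimension can be written as below; this illustrates the idea behind the centre_ and scale_ attributes documented further down, and is not the package's actual implementation.

>>> import torch
>>> X = torch.normal(5, 10, (1000, 5))
>>> centre_ = torch.quantile(X, 0.5, dim=0)  # per-feature median
>>> q1, q3 = torch.quantile(X, torch.tensor([0.25, 0.75]), dim=0)
>>> scale_ = q3 - q1  # interquartile range
>>> Z = (X - centre_) / scale_  # centre, then scale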

Parameters:
with_centering : bool, default=True

If True, center the data before scaling.

with_scaling : bool, default=True

If True, scale the data according to the quantiles.

quantile_range : tuple[float, float], default=(25.0, 75.0)

Tuple describing the quantiles.

dims : int, list or tuple of ints, default=None

The dimensions over which to scale (None for the first dimension).

Attributes:
with_centering : bool, default=True

If True, center the data before scaling.

with_scaling : bool, default=True

If True, scale the data according to the quantiles.

quantile_range : tuple[float, float], default=(25.0, 75.0)

Tuple describing the quantiles.

dims : int, list or tuple of ints, default=None

The dimensions over which to scale (None for the first dimension).

dims_ : tuple[int], default=None

Tuple specifying the dimensions to scale over.

centre_ : torch.Tensor, default=None

The centre of each feature of shape X.

scale_ : torch.Tensor, default=None

The scale of each feature of shape X.

See also

mvpy.preprocessing.Scaler

An alternative scaler that normalises data to zero mean and unit variance.

mvpy.preprocessing.Clamp

A complementary class that implements clamping data at specific values.

Examples

>>> import torch
>>> from mvpy.preprocessing import RobustScaler
>>> scaler = RobustScaler().to_torch()
>>> X = torch.normal(5, 10, (1000, 5))
>>> X[500,0] = 1e3
>>> X.std(0)
tensor([32.9122,  9.9615, 10.1481, 10.1058,  9.7468])
>>> Z = scaler.fit_transform(X)
>>> Z.std(0)
tensor([2.7348, 0.7351, 0.7464, 0.7609, 0.7154])
>>> H = scaler.inverse_transform(Z)
>>> H.std(0)
tensor([32.9122,  9.9615, 10.1481, 10.1058,  9.7468])
clone() → RobustScaler[source]#

Obtain a clone of this class.

Returns:
scaler : RobustScaler

The cloned robust scaler.

copy() → RobustScaler[source]#

Obtain a copy of this class.

Returns:
scaler : RobustScaler

The copied robust scaler.

fit(X: ndarray | Tensor, *args: Any) → RobustScaler[source]#

Fit the scaler.

Parameters:
X : np.ndarray | torch.Tensor

The data of arbitrary shape.

args : Any

Additional arguments.

Returns:
scaler : sklearn.base.BaseEstimator

The fitted scaler.

fit_transform(X: ndarray | Tensor, *args: Any) → ndarray | Tensor[source]#

Fit and transform the data in one step.

Parameters:
X : np.ndarray | torch.Tensor

The data of arbitrary shape.

args : Any

Additional arguments.

Returns:
Z : np.ndarray | torch.Tensor

The transformed data of the same shape as X.

inverse_transform(X: ndarray | Tensor, *args: Any) → ndarray | Tensor[source]#

Invert the transform of the data.

Parameters:
X : np.ndarray | torch.Tensor

The data of arbitrary shape.

args : Any

Additional arguments.

Returns:
X : np.ndarray | torch.Tensor

The inverse transformed data of the same shape as X.

to_numpy() → _RobustScaler_numpy[source]#

Select the numpy backend. Note that this cannot be called for conversion.

Returns:
scaler : _RobustScaler_numpy

The robust scaler using the numpy backend.

to_torch() → _RobustScaler_torch[source]#

Select the torch backend. Note that this cannot be called for conversion.

Returns:
scaler : _RobustScaler_torch

The robust scaler using the torch backend.

transform(X: ndarray | Tensor, *args: Any) → ndarray | Tensor[source]#

Transform the data using the scaler.

Parameters:
X : np.ndarray | torch.Tensor

The data of arbitrary shape.

args : Any

Additional arguments.

Returns:
Z : np.ndarray | torch.Tensor

The transformed data of the same shape as X.


mvpy.preprocessing.scaler module#

A collection of estimators for scaling data.

class mvpy.preprocessing.scaler.Scaler(with_mean: bool = True, with_std: bool = True, dims: list | tuple | int | None = None)[source]#

Bases: BaseEstimator

A standard scaler akin to sklearn.preprocessing.StandardScaler. See notes for some differences.

Parameters:
with_mean : bool, default=True

If True, center the data before scaling.

with_std : bool, default=True

If True, scale the data to unit variance.

dims : int, list or tuple of ints, default=None

The dimensions over which to scale (None for the first dimension).

copy : bool, default=False

If True, the data will be copied.

Attributes:
shape_ : tuple

The shape of the data.

mean_ : Union[np.ndarray, torch.Tensor]

The mean of the data.

var_ : Union[np.ndarray, torch.Tensor]

The variance of the data.

scale_ : Union[np.ndarray, torch.Tensor]

The scale of the data.

Notes

The transform is given by

\[z = \frac{x - \mu}{\sigma}\]

where \(\mu\) is the mean and \(\sigma\) is the standard deviation of the data.
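The formula translates directly into code. The sketch below assumes the unbiased standard deviation for illustration and shows the round trip through the inverse transform; it is not the package's actual implementation.

>>> import torch
>>> X = torch.normal(5, 10, (1000, 5))
>>> mu, sigma = X.mean(0), X.std(0)  # unbiased std assumed here
>>> Z = (X - mu) / sigma  # z = (x - mu) / sigma
>>> X_back = Z * sigma + mu  # inverse transform recovers X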

Examples

>>> import torch
>>> from mvpy.preprocessing import Scaler
>>> X = torch.normal(5, 10, (1000, 5))
>>> print(X.std(0))
tensor([ 9.7033, 10.2510, 10.2483, 10.1274, 10.2013])
>>> scaler = Scaler().fit(X)
>>> X_s = scaler.transform(X)
>>> print(X_s.std(0))
tensor([1.0000, 1.0000, 1.0000, 1.0000, 1.0000])
>>> X_i = scaler.inverse_transform(X_s)
>>> print(X_i.std(0))
tensor([ 9.7033, 10.2510, 10.2483, 10.1274, 10.2013])
clone()[source]#

Obtain a clone of this class.

Returns:
Scaler

The clone.

copy()[source]#

Obtain a copy of this class.

Returns:
Scaler

The copy.

fit(X: ndarray | Tensor, *args: Any, sample_weight: ndarray | Tensor | None = None) → Any[source]#

Fit the scaler.

Parameters:
X : Union[np.ndarray, torch.Tensor]

The data.

args : Any

Additional arguments.

sample_weight : Union[np.ndarray, torch.Tensor], default=None

The sample weights.

Returns:
scaler : sklearn.base.BaseEstimator

The fitted scaler.

fit_transform(X: ndarray | Tensor, *args: Any, sample_weight: ndarray | Tensor | None = None) → ndarray | Tensor[source]#

Fit and transform the data in one step.

Parameters:
X : Union[np.ndarray, torch.Tensor]

The data.

args : Any

Additional arguments.

sample_weight : Union[np.ndarray, torch.Tensor], default=None

The sample weights.

Returns:
Union[np.ndarray, torch.Tensor]

The transformed data.

inverse_transform(X: ndarray | Tensor, *args: Any) → ndarray | Tensor[source]#

Invert the transform of the data.

Parameters:
X : Union[np.ndarray, torch.Tensor]

The data.

args : Any

Additional arguments.

Returns:
Union[np.ndarray, torch.Tensor]

The inverse transformed data.

to_numpy()[source]#

Select the numpy scaler. Note that this cannot be called for conversion.

Returns:
_Scaler_numpy

The numpy scaler.

to_torch()[source]#

Select the torch scaler. Note that this cannot be called for conversion.

Returns:
_Scaler_torch

The torch scaler.

transform(X: ndarray | Tensor, *args: Any) → ndarray | Tensor[source]#

Transform the data using the scaler.

Parameters:
X : Union[np.ndarray, torch.Tensor]

The data.

args : Any

Additional arguments.

Returns:
Union[np.ndarray, torch.Tensor]

The transformed data.


Module contents#

A collection of estimators for clamping data.

A collection of estimators for binarising label data.

A collection of estimators for robustly scaling data.

A collection of estimators for scaling data.