mvpy.preprocessing package#
Submodules#
mvpy.preprocessing.clamp module#
A collection of estimators for clamping data.
- class mvpy.preprocessing.clamp.Clamp(lower: float | None = None, upper: float | None = None, method: str = 'iqr', k: float | None = None, eps: float = 1e-09, dims: list | tuple | int | None = None)[source]#
Bases: BaseEstimator
Implements a clamp to handle extreme values.
Generally, this will clamp data \(X\) to the lower and upper bounds defined by lower and upper whenever they are exceeded.
This can be useful for dealing with outliers: for example, in minimally preprocessed M-/EEG data, this may be used to curb EOG artifacts easily without removing time points or trials.
By default, both lower and upper will be None. This constitutes a special case where the bounds are fit directly to the data. There are three different ways of fitting bounds, controlled by method:
- iqr: This will compute the inter-quartile range \([0.25, 0.75]\) and clamp data where \(X\notin [\textrm{median}(X) - k L, \textrm{median}(X) + k U]\).
- quantile: This will clamp data outside of the quantiles given by \([k, 1 - k]\).
- mad: This will clamp data at \(\textrm{median}(X)\pm k \textrm{MAD}\), where MAD is the median absolute deviation.
If only one of the two bounds is None, the unspecified bound is interpreted as no clamping being desired in that direction.
- Parameters:
- lower : Optional[float], default=None
Lower bound for clamping. If None, no lower bound is applied.
- upper : Optional[float], default=None
Upper bound for clamping. If None, no upper bound is applied.
- method : {'iqr', 'quantile', 'mad'}, default='iqr'
The method to use for fitting bounds if both lower and upper are None.
- k : Optional[float], default=None
For method iqr, scale the \([0.25, 0.75]\) quantiles by \(k\) (with default=1.5). For method quantile, clamp tails outside \([k, 1 - k]\) (with default=0.05). For method mad, scale the median absolute deviation by \(k\) (with default=3.0). Otherwise unused.
- eps : float, default=1e-9
When checking span correctness, epsilon to apply as jitter.
- dims : int, list or tuple of ints, default=None
The dimensions over which to scale (None for first dimension).
- Attributes:
- lower : Optional[float], default=None
Lower bound for clamping. If None, no lower bound is applied.
- upper : Optional[float], default=None
Upper bound for clamping. If None, no upper bound is applied.
- method : {'iqr', 'quantile', 'mad'}, default='iqr'
The method to use for fitting bounds if both lower and upper are None.
- k : Optional[float], default=None
For method iqr, scale the \([0.25, 0.75]\) quantiles by \(k\) (with default=1.5). For method quantile, clamp tails outside \([k, 1 - k]\) (with default=0.05). For method mad, scale the median absolute deviation by \(k\) (with default=3.0). Otherwise unused.
- eps : float, default=1e-9
When checking span correctness, epsilon to apply as jitter.
- dims : int, list or tuple of ints, default=None
The dimensions over which to scale (None for first dimension).
- lower_ : float | np.ndarray | torch.Tensor, default=None
Lower bound for clamping, either prespecified or fitted.
- upper_ : float | np.ndarray | torch.Tensor, default=None
Upper bound for clamping, either prespecified or fitted.
- dims_ : tuple[int], default=None
Tuple specifying the dimensions to scale over.
See also
mvpy.preprocessing.Scaler, mvpy.preprocessing.RobustScaler
Complementary scalers.
Examples
>>> import torch
>>> from mvpy.preprocessing import Clamp
>>> X = torch.normal(0, 1, (1000, 5))
>>> X[500,0] = 1e3
>>> X.max(0).values
tensor([10.0000, 3.9375, 3.2070, 3.0591, 3.0165])
>>> Z = Clamp().fit_transform(X)
>>> Z.max(0).values
tensor([2.6926, 2.7263, 2.6343, 2.6616, 2.5378])
>>> Z = Clamp(upper = 5.0).fit_transform(X)
>>> Z.max(0).values
tensor([5.0000, 3.9375, 3.2070, 3.0591, 3.0165])
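The quantile and mad rules for fitting bounds can be illustrated with a minimal NumPy sketch. This is not mvpy's implementation: the helper names quantile_bounds and mad_bounds are hypothetical, bounds are fit along the first dimension only, and MAD is assumed to be the unscaled median of absolute deviations from the median. The iqr rule is left out because \(L\) and \(U\) are not defined in this section.

```python
import numpy as np

def quantile_bounds(X, k=0.05):
    # Hypothetical helper: bounds at the k-th and (1 - k)-th quantiles per column.
    return np.quantile(X, k, axis=0), np.quantile(X, 1.0 - k, axis=0)

def mad_bounds(X, k=3.0):
    # Hypothetical helper: bounds at median +/- k * MAD per column, where
    # MAD = median(|X - median(X)|) without any consistency scaling (assumption).
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    return med - k * mad, med + k * mad

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(1000, 5))
X[500, 0] = 1e3  # inject one extreme value

# Fit bounds to the data, then clamp every column to its own bounds.
lower, upper = mad_bounds(X)
Z = np.clip(X, lower, upper)

# The quantile rule yields bounds of the same per-column shape.
q_lower, q_upper = quantile_bounds(X)
```

After clamping, the injected extreme value sits exactly on the fitted upper bound of its column, while values inside the bounds are unchanged.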
- fit(X: ndarray | Tensor, *args: Any) Clamp [source]#
Fit the clamp.
- Parameters:
- X : np.ndarray | torch.Tensor
The data of arbitrary shape.
- args : Any
Additional arguments.
- Returns:
- clamp : sklearn.base.BaseEstimator
The fitted clamp.
- fit_transform(X: ndarray | Tensor, *args: Any) ndarray | Tensor [source]#
Fit and transform the data in one step.
- Parameters:
- X : np.ndarray | torch.Tensor
The data of shape X.
- args : Any
Additional arguments.
- Returns:
- Z : np.ndarray | torch.Tensor
The transformed data of shape X.
- inverse_transform(X: ndarray | Tensor, *args: Any) ndarray | Tensor [source]#
Invert the transform of the data.
- Parameters:
- X : np.ndarray | torch.Tensor
The data of shape X.
- args : Any
Additional arguments.
- Returns:
- X : np.ndarray | torch.Tensor
The inverse transformed data of shape X.
Warning
Clamping cannot be inverse transformed. Consequently, this returns the clamped values in \(X\) as is.
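A small sketch of why no inverse exists, using np.clip as a stand-in for the clamp (an assumption for illustration, not mvpy's code): distinct inputs beyond a bound collapse onto the same output, so the original values cannot be recovered.

```python
import numpy as np

x = np.array([-10.0, -1.0, 0.5, 2.0, 50.0])

# Clamp to fixed bounds [-2, 2].
z = np.clip(x, -2.0, 2.0)
# z is [-2., -1., 0.5, 2., 2.]: both 2.0 and 50.0 map to 2.0,
# so the mapping is many-to-one and has no inverse.
```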
- to_numpy() _Clamp_numpy [source]#
Select the numpy backend. Note that this cannot be called for conversion.
- Returns:
- clamp : _Clamp_numpy
The clamp using the numpy backend.
mvpy.preprocessing.labelbinariser module#
A collection of estimators for binarising label data.
- class mvpy.preprocessing.labelbinariser.LabelBinariser(neg_label: int = 0, pos_label: int = 1)[source]#
Bases: BaseEstimator
Class to create and handle multiclass and multifeature one-hot encodings.
For multiclass inputs, this produces a simple one-hot encoding of shape (n_trials, n_classes).
For multifeature inputs, this produces a vectorised one-hot encoding of shape (n_trials, n_features * n_classes) where there is one hot class per feature.
- Parameters:
- neg_label : int, default=0
Label to use for negatives.
- pos_label : int, default=1
Label to use for positives.
- Attributes:
- neg_label : int, default=0
Label to use for negatives.
- pos_label : int, default=1
Label to use for positives.
- n_features_ : int
Number of unique features in y of shape (n_samples, n_features).
- n_classes_ : List[int]
Number of unique classes per feature.
- labels_ : List[List[Any]]
List including lists of original labels in y.
- classes_ : List[List[Any]]
List including lists of class identities in y.
- N_ : int | np.ndarray | torch.Tensor
Total number of classes (across features).
- C_ : np.ndarray | torch.Tensor
Offsets for each unique feature in one-hot matrix of shape (n_features,).
- map_L_to_C_ : List[Dict[Any, int]]
Lists containing each label->class mapping per feature.
Notes
Note that this always creates n_classes columns in one-hot encodings, even when n_classes=2. This is because, in some situations, it can be easier to handle the data when all classes are explicitly represented in the data.
Warning
Only the numpy backend supports string labels, as torch does not offer support for string type tensors. To avoid issues arising from this, stick to numerical labels unless you are certain to run analyses using only the numpy backend.
Examples
First, let’s consider one feature that has three classes.
>>> import torch
>>> from mvpy.estimators import LabelBinariser
>>> label = LabelBinariser().to_torch()
>>> y = torch.randint(0, 3, (100,))
>>> L = label.fit_transform(y)
>>> H = label.inverse_transform(L)
>>> print(y[0:5])
tensor([0, 1, 2, 1, 2])
>>> print(L[0:5])
tensor([[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [0, 1, 0],
        [0, 0, 1]])
>>> print(H[0:5])
tensor([0, 1, 2, 1, 2])
Second, let’s look at two features that have a different number of classes each.
>>> import torch
>>> from mvpy.estimators import LabelBinariser
>>> label = LabelBinariser().to_torch()
>>> y = torch.stack((torch.randint(10, 13, (50,)), torch.randint(20, 22, (50,))), dim = 1)
>>> L = label.fit_transform(y)
>>> H = label.inverse_transform(L)
>>> print(y[0:5])
tensor([[10, 21],
        [10, 20],
        [11, 21],
        [12, 21],
        [10, 20]])
>>> print(L[0:5])
tensor([[1, 0, 0, 0, 1],
        [1, 0, 0, 1, 0],
        [0, 1, 0, 0, 1],
        [0, 0, 1, 0, 1],
        [1, 0, 0, 1, 0]])
>>> print(H[0:5])
tensor([[10, 21],
        [10, 20],
        [11, 21],
        [12, 21],
        [10, 20]])
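How per-feature offsets can produce the vectorised encoding shown above may be easier to see in a plain NumPy sketch. The functions one_hot_multifeature and decode are hypothetical and written for illustration only; they are not mvpy's implementation, and they assume that classes are ordered by sorted label value within each feature.

```python
import numpy as np

def one_hot_multifeature(y):
    """Hypothetical sketch: one hot class per feature, concatenated column-wise."""
    y = np.asarray(y)
    if y.ndim == 1:
        y = y[:, None]  # treat a 1-D label vector as a single feature

    # Map the labels of each feature to class indices 0..n_classes-1.
    labels, codes = [], []
    for f in range(y.shape[1]):
        uniq, inv = np.unique(y[:, f], return_inverse=True)
        labels.append(uniq)
        codes.append(inv.reshape(-1))

    # Offset of each feature's first column in the one-hot matrix.
    n_classes = np.array([len(u) for u in labels])
    offsets = np.concatenate(([0], np.cumsum(n_classes)[:-1]))

    L = np.zeros((y.shape[0], int(n_classes.sum())), dtype=int)
    rows = np.arange(y.shape[0])
    for f, inv in enumerate(codes):
        L[rows, offsets[f] + inv] = 1
    return L, labels, offsets

def decode(L, labels, offsets):
    """Hypothetical inverse: take the argmax within each feature's column block."""
    ends = list(offsets[1:]) + [L.shape[1]]
    cols = [labels[f][L[:, s:e].argmax(axis=1)]
            for f, (s, e) in enumerate(zip(offsets, ends))]
    return np.stack(cols, axis=1)

y = np.array([[10, 21], [10, 20], [11, 21], [12, 21], [10, 20]])
L, labels, offsets = one_hot_multifeature(y)
H = decode(L, labels, offsets)
```

For these five rows, the first three columns of L encode the labels 10, 11 and 12 of the first feature and the last two columns encode 20 and 21 of the second, matching the layout in the torch example.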
- clone() LabelBinariser [source]#
Obtain a clone of this class.
- Returns:
- binariser : mvpy.estimators.LabelBinariser
The clone.
- copy() LabelBinariser [source]#
Obtain a copy of this class.
- Returns:
- binariser : mvpy.estimators.LabelBinariser
The copy.
- fit(y: ndarray | Tensor, *args: Any) BaseEstimator [source]#
Fit the binariser.
- Parameters:
- y : np.ndarray | torch.Tensor
The data of shape (n_samples[, n_features]).
- args : Any
Additional arguments.
- Returns:
- binariser : sklearn.base.BaseEstimator
The fitted binariser.
- fit_transform(y: ndarray | Tensor, *args: Any) ndarray | Tensor [source]#
Fit and transform the data in one step.
- Parameters:
- y : np.ndarray | torch.Tensor
The data of shape (n_samples[, n_features]).
- args : Any
Additional arguments.
- Returns:
- L : np.ndarray | torch.Tensor
The binarised data of shape (n_samples, [n_features * ]n_classes).
- inverse_transform(y: ndarray | Tensor, *args: Any) ndarray | Tensor [source]#
Obtain labels from transformed data.
- Parameters:
- L : np.ndarray | torch.Tensor
The binarised data of shape (n_samples, [n_features * ]n_classes).
- args : Any
Additional arguments.
- Returns:
- y : np.ndarray | torch.Tensor
The labels of shape (n_samples, n_features).
- to_numpy()[source]#
Select the numpy binariser. Note that this cannot be called for conversion.
- Returns:
- binariser : mvpy.estimators._LabelBinariser_numpy
The numpy binariser.
- to_torch()[source]#
Select the torch binariser. Note that this cannot be called for conversion.
- Returns:
- binariser : mvpy.estimators._LabelBinariser_torch
The torch binariser.
- transform(y: ndarray | Tensor, *args: Any) ndarray | Tensor [source]#
Transform the data based on fitted binariser.
- Parameters:
- y : np.ndarray | torch.Tensor
The data of shape (n_samples[, n_features]).
- args : Any
Additional arguments.
- Returns:
- L : np.ndarray | torch.Tensor
The binarised data of shape (n_samples, [n_features * ]n_classes).
mvpy.preprocessing.robustscaler module#
A collection of estimators for robustly scaling data.
- class mvpy.preprocessing.robustscaler.RobustScaler(with_centering: bool = True, with_scaling: bool = True, quantile_range: tuple[float, float] = (25.0, 75.0), dims: list | tuple | int | None = None)[source]#
Bases: BaseEstimator
Implements a robust scaler that is invariant to outliers.
By default, this scaler removes the median before scaling the data according to the interquartile range \([0.25, 0.75]\). This is useful because, unlike Scaler, RobustScaler is robust to outliers that might affect a Scaler poorly.
Both centering and scaling are optional and can be turned on or off using with_centering and with_scaling.
- Parameters:
- with_centering : bool, default=True
If True, center the data before scaling.
- with_scaling : bool, default=True
If True, scale the data according to the quantiles.
- quantile_range : tuple[float, float], default=(25.0, 75.0)
Tuple describing the quantiles.
- dims : int, list or tuple of ints, default=None
The dimensions over which to scale (None for first dimension).
- Attributes:
- with_centering : bool, default=True
If True, center the data before scaling.
- with_scaling : bool, default=True
If True, scale the data according to the quantiles.
- quantile_range : tuple[float, float], default=(25.0, 75.0)
Tuple describing the quantiles.
- dims : int, list or tuple of ints, default=None
The dimensions over which to scale (None for first dimension).
- dims_ : tuple[int], default=None
Tuple specifying the dimensions to scale over.
- centre_ : torch.Tensor, default=None
The centre of each feature of shape X.
- scale_ : torch.Tensor, default=None
The scale of each feature of shape X.
See also
mvpy.preprocessing.Scaler
An alternative scaler that normalises data to zero mean and unit variance.
mvpy.preprocessing.Clamp
A complementary class that implements clamping data at specific values.
Examples
>>> import torch
>>> from mvpy.preprocessing import RobustScaler
>>> scaler = RobustScaler().to_torch()
>>> X = torch.normal(5, 10, (1000, 5))
>>> X[500,0] = 1e3
>>> X.std(0)
tensor([32.9122, 9.9615, 10.1481, 10.1058, 9.7468])
>>> Z = scaler.fit_transform(X)
>>> Z.std(0)
tensor([2.7348, 0.7351, 0.7464, 0.7609, 0.7154])
>>> H = scaler.inverse_transform(Z)
>>> H.std(0)
tensor([32.9122, 9.9615, 10.1481, 10.1058, 9.7468])
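The default behaviour described above (remove the median, then divide by the interquartile range) can be sketched in plain NumPy. This is an illustration under assumptions rather than mvpy's implementation: statistics are taken along the first dimension only, and quantile_range=(25.0, 75.0) is read as percentiles.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(5.0, 10.0, size=(1000, 5))
X[500, 0] = 1e3  # a single outlier barely moves the median or the quartiles

# Per-column centre (median) and scale (interquartile range).
q_lo, q_hi = np.percentile(X, [25.0, 75.0], axis=0)
centre = np.median(X, axis=0)
scale = q_hi - q_lo

Z = (X - centre) / scale   # forward transform
H = Z * scale + centre     # inverse transform recovers X
```

After the transform, every column has a median of zero and an interquartile range of one, regardless of the outlier in the first column.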
- clone() RobustScaler [source]#
Obtain a clone of this class.
- Returns:
- scaler : RobustScaler
The cloned robust scaler.
- copy() RobustScaler [source]#
Obtain a copy of this class.
- Returns:
- scaler : RobustScaler
The copied robust scaler.
- fit(X: ndarray | Tensor, *args: Any) RobustScaler [source]#
Fit the scaler.
- Parameters:
- X : np.ndarray | torch.Tensor
The data of arbitrary shape.
- args : Any
Additional arguments.
- Returns:
- scaler : sklearn.base.BaseEstimator
The fitted scaler.
- fit_transform(X: ndarray | Tensor, *args: Any) ndarray | Tensor [source]#
Fit and transform the data in one step.
- Parameters:
- X : np.ndarray | torch.Tensor
The data of shape X.
- args : Any
Additional arguments.
- Returns:
- Z : np.ndarray | torch.Tensor
The transformed data of shape X.
- inverse_transform(X: ndarray | Tensor, *args: Any) ndarray | Tensor [source]#
Invert the transform of the data.
- Parameters:
- X : np.ndarray | torch.Tensor
The data of shape X.
- args : Any
Additional arguments.
- Returns:
- X : np.ndarray | torch.Tensor
The inverse transformed data of shape X.
- to_numpy() _RobustScaler_numpy [source]#
Select the numpy backend. Note that this cannot be called for conversion.
- Returns:
- scaler : _RobustScaler_numpy
The robust scaler using the numpy backend.
mvpy.preprocessing.scaler module#
A collection of estimators for scaling data.
- class mvpy.preprocessing.scaler.Scaler(with_mean: bool = True, with_std: bool = True, dims: list | tuple | int | None = None)[source]#
Bases: BaseEstimator
A standard scaler akin to sklearn.preprocessing.StandardScaler. See notes for some differences.
- Parameters:
- with_mean : bool, default=True
If True, center the data before scaling.
- with_std : bool, default=True
If True, scale the data to unit variance.
- dims : int, list or tuple of ints, default=None
The dimensions over which to scale (None for first dimension).
- copy : bool, default=False
If True, the data will be copied.
- Attributes:
- shape_ : tuple
The shape of the data.
- mean_ : Union[np.ndarray, torch.Tensor]
The mean of the data.
- var_ : Union[np.ndarray, torch.Tensor]
The variance of the data.
- scale_ : Union[np.ndarray, torch.Tensor]
The scale of the data.
\[\frac{x - \mu}{\sigma}\]
where \(\mu\) is the mean and \(\sigma\) is the standard deviation of the data.
Examples
>>> import torch
>>> from mvpy.estimators import Scaler
>>> X = torch.normal(5, 10, (1000, 5))
>>> print(X.std(0))
tensor([ 9.7033, 10.2510, 10.2483, 10.1274, 10.2013])
>>> scaler = Scaler().fit(X)
>>> X_s = scaler.transform(X)
>>> print(X_s.std(0))
tensor([1.0000, 1.0000, 1.0000, 1.0000, 1.0000])
>>> X_i = scaler.inverse_transform(X_s)
>>> print(X_i.std(0))
tensor([ 9.7033, 10.2510, 10.2483, 10.1274, 10.2013])
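For reference, the standardisation \(\frac{x - \mu}{\sigma}\) can be written out directly in NumPy. This is a minimal sketch, not mvpy's implementation; it assumes statistics along the first dimension and the population standard deviation (ddof=0), which may differ from the estimator the library uses.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(5.0, 10.0, size=(1000, 5))

# Per-column mean and standard deviation (ddof=0 is an assumption).
mu = X.mean(axis=0)
sigma = X.std(axis=0)

X_s = (X - mu) / sigma   # zero mean, unit variance per column
X_i = X_s * sigma + mu   # inverse transform recovers X
```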
- fit(X: ndarray | Tensor, *args: Any, sample_weight: ndarray | Tensor | None = None) Any [source]#
Fit the scaler.
- Parameters:
- X : Union[np.ndarray, torch.Tensor]
The data.
- args : Any
Additional arguments.
- sample_weight : Union[np.ndarray, torch.Tensor], default=None
The sample weights.
- fit_transform(X: ndarray | Tensor, *args: Any, sample_weight: ndarray | Tensor | None = None) ndarray | Tensor [source]#
Fit and transform the data in one step.
- Parameters:
- X : Union[np.ndarray, torch.Tensor]
The data.
- args : Any
Additional arguments.
- sample_weight : Union[np.ndarray, torch.Tensor], default=None
The sample weights.
- Returns:
- Union[np.ndarray, torch.Tensor]
The transformed data.
- inverse_transform(X: ndarray | Tensor, *args: Any) ndarray | Tensor [source]#
Invert the transform of the data.
- Parameters:
- X : Union[np.ndarray, torch.Tensor]
The data.
- args : Any
Additional arguments.
- Returns:
- Union[np.ndarray, torch.Tensor]
The inverse transformed data.
- to_numpy()[source]#
Select the numpy scaler. Note that this cannot be called for conversion.
- Returns:
- _Scaler_numpy
The numpy scaler.
Module contents#
A collection of estimators for clamping data.
A collection of estimators for binarising label data.
A collection of estimators for robustly scaling data.
A collection of estimators for scaling data.