LabelBinariser#

class mvpy.preprocessing.LabelBinariser(neg_label: int = 0, pos_label: int = 1)[source]#

Class to create and handle multiclass and multifeature one-hot encodings.

For multiclass inputs, this produces a simple one-hot encoding of shape (n_samples, n_classes).

For multifeature inputs, this produces a vectorised one-hot encoding of shape (n_samples, n_features * n_classes), with exactly one hot class per feature. When features differ in their number of classes, the width is the total number of classes across features (see N_).

Parameters:
neg_label : int, default=0

Label to use for negatives.

pos_label : int, default=1

Label to use for positives.
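
To illustrate what the two parameters control, here is a minimal NumPy sketch. It assumes that neg_label and pos_label simply replace the 0 and 1 entries of the one-hot matrix; the helper one_hot_with_labels is hypothetical and is not part of mvpy.

```python
import numpy as np

# Hypothetical helper (not part of mvpy): one-hot encode integer class
# indices and fill the matrix with custom negative/positive labels.
def one_hot_with_labels(classes, n_classes, neg_label=0, pos_label=1):
    # Boolean mask of shape (n_samples, n_classes), True at the hot column.
    hits = np.arange(n_classes)[None, :] == np.asarray(classes)[:, None]
    # Assumption: hot entries take pos_label, all others take neg_label.
    return np.where(hits, pos_label, neg_label)

encoded = one_hot_with_labels([0, 2, 1], n_classes=3, neg_label=-1, pos_label=1)
print(encoded)
```

With neg_label=-1, every row contains a single 1 and is -1 elsewhere, which can be convenient for classifiers that expect signed targets.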

Attributes:
neg_label : int, default=0

Label to use for negatives.

pos_label : int, default=1

Label to use for positives.

n_features_ : int

Number of features in y, where y is of shape (n_samples, n_features).

n_classes_ : List[int]

Number of unique classes per feature.

labels_ : List[List[Any]]

List containing, for each feature, the list of original labels in y.

classes_ : List[List[Any]]

List containing, for each feature, the list of class identities in y.

N_ : int | np.ndarray | torch.Tensor

Total number of classes (across features).

C_ : np.ndarray | torch.Tensor

Column offset of each feature in the one-hot matrix, of shape (n_features,).

map_L_to_C_ : List[Dict[Any, int]]

List containing the label->class mapping for each feature.
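
The following NumPy sketch shows one way the total class count, the per-feature offsets, and the label->class mappings could fit together. It is an illustration under assumptions, not the library's implementation: labels are assumed to be sorted per feature, and the local names (labels, label_to_class, offsets, total) are stand-ins for the fitted attributes.

```python
import numpy as np

# Two features: the first has three classes, the second has two.
y = np.array([[10, 21], [10, 20], [11, 21], [12, 21]])

# Assumed layout: sorted labels per feature and a label->class map per feature.
labels = [np.unique(y[:, j]).tolist() for j in range(y.shape[1])]
label_to_class = [{label: k for k, label in enumerate(ls)} for ls in labels]
n_classes = [len(ls) for ls in labels]

# Column offset of each feature block, and the total width of the encoding.
offsets = np.concatenate(([0], np.cumsum(n_classes)[:-1]))
total = int(np.sum(n_classes))

# Each sample sets one column per feature: feature offset plus class index.
encoded = np.zeros((y.shape[0], total), dtype=int)
for j in range(y.shape[1]):
    cols = offsets[j] + np.array([label_to_class[j][v] for v in y[:, j].tolist()])
    encoded[np.arange(y.shape[0]), cols] = 1

print(offsets, total)
print(encoded)
```

Here the offsets are [0, 3] and the encoding is five columns wide, matching the layout of the two-feature example below.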

Notes

Note that this always creates n_classes columns in the one-hot encoding, even when n_classes=2, rather than collapsing two classes into a single column. This is because, in some situations, it is easier to handle the data when all classes are explicitly represented.
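
As a small illustration of the two-class case, the NumPy sketch below builds an explicit two-column encoding. It only mimics the behaviour described in this note and does not call mvpy.

```python
import numpy as np

# Binary labels: two classes still yield two columns, one per class.
y = np.array([0, 1, 1, 0])
classes, idx = np.unique(y, return_inverse=True)

# Index rows of the identity matrix by class index to obtain one-hot rows.
encoded = np.eye(len(classes), dtype=int)[idx]
print(encoded.shape)
```

Every class keeps its own column, so downstream code can index classes uniformly regardless of how many there are.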

Warning

Only the numpy backend supports string labels, as torch does not support string tensors. To avoid issues arising from this, stick to numerical labels unless you are certain that all analyses will use the numpy backend.
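
One way to stay backend-agnostic is to map string labels to integer codes before binarising. The sketch below uses only NumPy; the label names are made up for illustration.

```python
import numpy as np

# Hypothetical string labels, e.g. stimulus categories.
labels = np.array(["face", "house", "face", "object"])

# np.unique returns the sorted unique labels and an integer code per sample.
names, codes = np.unique(labels, return_inverse=True)

# The integer codes are numerical labels; names[codes] recovers the strings.
restored = names[codes]
print(codes)
```

Keeping names around allows numerical predictions to be translated back into the original strings after the analysis.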

Examples

First, let’s consider one feature that has three classes.

>>> import torch
>>> from mvpy.preprocessing import LabelBinariser
>>> label = LabelBinariser().to_torch()
>>> y = torch.randint(0, 3, (100,))
>>> L = label.fit_transform(y)
>>> H = label.inverse_transform(L)
>>> print(y[0:5])
tensor([0, 1, 2, 1, 2])
>>> print(L[0:5])
tensor([[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [0, 1, 0],
        [0, 0, 1]])
>>> print(H[0:5])
tensor([0, 1, 2, 1, 2])

Second, let’s look at two features that have a different number of classes each.

>>> import torch
>>> from mvpy.preprocessing import LabelBinariser
>>> label = LabelBinariser().to_torch()
>>> y = torch.stack((torch.randint(10, 13, (50,)), torch.randint(20, 22, (50,))), dim = 1)
>>> L = label.fit_transform(y)
>>> H = label.inverse_transform(L)
>>> print(y[0:5])
tensor([[10, 21],
        [10, 20],
        [11, 21],
        [12, 21],
        [10, 20]])
>>> print(L[0:5])
tensor([[1, 0, 0, 0, 1],
        [1, 0, 0, 1, 0],
        [0, 1, 0, 0, 1],
        [0, 0, 1, 0, 1],
        [1, 0, 0, 1, 0]])
>>> print(H[0:5])
tensor([[10, 21],
        [10, 20],
        [11, 21],
        [12, 21],
        [10, 20]])

clone() LabelBinariser[source]#

Obtain a clone of this class.

Returns:
binariser : mvpy.preprocessing.LabelBinariser

The clone.

copy() LabelBinariser[source]#

Obtain a copy of this class.

Returns:
binariser : mvpy.preprocessing.LabelBinariser

The copy.

fit(y: ndarray | Tensor, *args: Any) BaseEstimator[source]#

Fit the binariser.

Parameters:
y : np.ndarray | torch.Tensor

The data of shape (n_samples[, n_features]).

args : Any

Additional arguments.

Returns:
binariser : sklearn.base.BaseEstimator

The fitted binariser.

fit_transform(y: ndarray | Tensor, *args: Any) ndarray | Tensor[source]#

Fit and transform the data in one step.

Parameters:
y : np.ndarray | torch.Tensor

The data of shape (n_samples[, n_features]).

args : Any

Additional arguments.

Returns:
L : np.ndarray | torch.Tensor

The binarised data of shape (n_samples, [n_features * ]n_classes).

inverse_transform(y: ndarray | Tensor, *args: Any) ndarray | Tensor[source]#

Obtain labels from transformed data.

Parameters:
L : np.ndarray | torch.Tensor

The binarised data of shape (n_samples, [n_features * ]n_classes).

args : Any

Additional arguments.

Returns:
y : np.ndarray | torch.Tensor

The labels of shape (n_samples[, n_features]).
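
Conceptually, the inverse can be obtained by taking the hot column within each feature block and looking up its label. The NumPy sketch below illustrates this under assumptions: decode, labels, and offsets are hypothetical names, and this is not the library's implementation.

```python
import numpy as np

# Hypothetical fitted state: labels per feature and the column offset of each block.
labels = [[10, 11, 12], [20, 21]]
offsets = [0, 3]

def decode(encoded, labels, offsets):
    columns = []
    for feature_labels, start in zip(labels, offsets):
        # Slice the columns belonging to this feature.
        block = encoded[:, start:start + len(feature_labels)]
        # The hot column is the one with the largest value in the block.
        columns.append(np.asarray(feature_labels)[block.argmax(axis=1)])
    return np.stack(columns, axis=1)

encoded = np.array([[1, 0, 0, 0, 1],
                    [0, 1, 0, 1, 0]])
decoded = decode(encoded, labels, offsets)
print(decoded)
```

Using argmax rather than an equality test means the sketch also works with custom labels, provided pos_label is greater than neg_label.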

to_numpy()[source]#

Select the numpy binariser. Note that this only selects the backend; it cannot be used to convert a binariser between backends.

Returns:
binariser : mvpy.estimators._LabelBinariser_numpy

The numpy binariser.

to_torch()[source]#

Select the torch binariser. Note that this only selects the backend; it cannot be used to convert a binariser between backends.

Returns:
binariser : mvpy.estimators._LabelBinariser_torch

The torch binariser.

transform(y: ndarray | Tensor, *args: Any) ndarray | Tensor[source]#

Transform the data using the fitted binariser.

Parameters:
y : np.ndarray | torch.Tensor

The data of shape (n_samples[, n_features]).

args : Any

Additional arguments.

Returns:
L : np.ndarray | torch.Tensor

The binarised data of shape (n_samples, [n_features * ]n_classes).