SVC#

class mvpy.estimators.SVC(method: str = 'OvR', C: float = 1.0, kernel: str = 'linear', gamma: str | float = 'scale', coef0: float = 0.0, degree: float = 3.0, tol: float = 0.001, lr: float = 0.001, max_iter: int = 1000)[source]#

Implements a support vector classifier.

Support vector classifiers frame classification from neural data \(X\) to labels \(y\in\{-1, 1\}\) as a max-margin problem: finding a decision function

\[f(X) = w^T\varphi(X) + b\]

that separates the classes with the largest possible margin in the feature space \(\varphi(\cdot)\). As in KernelRidgeClassifier, inner products in \(\varphi(\cdot)\) are evaluated through a Gram matrix defined by some kernel function. Unlike KernelRidgeClassifier, however, SVC minimises a hinge-loss surrogate:

\[\arg\min_{w, b} \frac{1}{2}\lVert w\rVert^2 + C\sum_i\max\left(0, 1 - y_i f(X_i)\right)\]
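For intuition, the two terms of this objective can be evaluated directly. A minimal sketch in plain torch, with made-up data and weights rather than mvpy's internal code:

>>> import torch
>>> torch.manual_seed(0)
>>> X = torch.randn(20, 2)                       # toy features
>>> y = torch.randint(0, 2, (20,)) * 2 - 1       # labels in {-1, 1}
>>> w, b, C = torch.randn(2), torch.zeros(1), 1.0
>>> f = X @ w + b                                # decision values f(X)
>>> hinge = torch.clamp(1 - y * f, min = 0.0)    # max(0, 1 - y_i f(X_i))
>>> loss = 0.5 * (w @ w) + C * hinge.sum()       # the full primal objective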

Via the kernel trick, the decision function can be written in dual form as:

\[f(X) = \sum_{i\in\mathcal{S}} \alpha_i y_i \kappa(X_i, X) + b\]

where \(\alpha_i\ge 0\) are the dual coefficients, \(\kappa\) is a positive-definite kernel, and \(\mathcal{S}\) is the set of support vectors. Hyperparameters like the penalisation \(C\) are typically selected by cross-validation; unlike KernelRidgeClassifier, penalty selection cannot be conveniently automated through LOO-CV here.
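To make the dual form concrete, the sketch below evaluates \(f(X)\) for new data given hypothetical support vectors and dual coefficients under an RBF kernel; every name here is made up for illustration:

>>> import torch
>>> X_sv = torch.randn(5, 2)                              # support vectors X_i
>>> alpha = torch.rand(5)                                 # dual coefficients alpha_i >= 0
>>> y_sv = torch.tensor([1., -1., 1., -1., 1.])           # support vector labels y_i
>>> X_new, gamma, b = torch.randn(3, 2), 0.5, 0.1
>>> K = torch.exp(-gamma * torch.cdist(X_new, X_sv) ** 2) # RBF kernel kappa(X_i, X)
>>> f = K @ (alpha * y_sv) + b                            # dual decision function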

Compared to RidgeClassifier or KernelRidgeClassifier, SVC optimises a margin-based objective and often yields tighter decision boundaries, particularly when classes are not linearly well separated or when using non-linear kernels, at the cost of higher training time.
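Because \(C\) cannot be tuned through LOO-CV here, a small grid search over cross-validation folds is the usual workaround. A minimal sketch, relying only on the fit and score methods documented below:

>>> import torch
>>> from mvpy.estimators import SVC
>>> from sklearn.model_selection import KFold
>>> X = torch.randn(100, 4)
>>> y = (torch.rand(100) > 0.5).float()
>>> for C in (0.1, 1.0, 10.0):
...     scores = []
...     for train, test in KFold(n_splits = 5).split(X):
...         clf = SVC(C = C, kernel = 'rbf').fit(X[train], y[train])
...         scores.append(clf.score(X[test], y[test]))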

For more information on support vector classifiers, see [1].

Warning

SVC is currently considered experimental. As is, it uses gradient ascent over vectorised features and stops early once the maximal gradient magnitude falls below the tolerance tol. This diverges from sklearn's behaviour and may produce slightly degraded decision boundaries. In the future, we will be switching to an SMO routine that should resolve these issues.

Parameters:
method : {'OvR', 'OvO'}, default='OvR'

For multiclass problems, which method should we use? One-versus-one (OvO) or one-versus-rest (OvR)?

C : float, default=1.0

Regularisation strength is inversely related to C.

kernel : {'linear', 'poly', 'rbf', 'sigmoid'}, default='linear'

Which kernel function should we use (linear, poly, rbf, sigmoid)?

gamma : {'scale', 'auto', float}, default='scale'

What gamma to use for the poly, rbf and sigmoid kernels. Available options are 'scale', 'auto', or a positive float (see the sketch following this list).

coef0 : float, default=0.0

What offset to use for the poly and sigmoid kernels.

degree : float, default=3.0

What degree polynomial to use (if any).

tol : float, default=1e-3

Tolerance over the maximum update step (i.e., early stopping is triggered when the maximal gradient falls below tol).

lr : float, default=1e-3

The learning rate.

max_iter : int, default=1000

The maximum number of iterations to perform while fitting, or -1 to disable.
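For reference, the 'scale' and 'auto' options for gamma conventionally follow sklearn's definitions; assuming mvpy adopts the same convention (an assumption, not something this page specifies), they amount to:

>>> import torch
>>> X = torch.randn(100, 8)
>>> gamma_scale = 1.0 / (X.shape[1] * X.var())   # 'scale': 1 / (n_features * Var[X]), assuming sklearn's rule
>>> gamma_auto = 1.0 / X.shape[1]                # 'auto': 1 / n_features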

Attributes:
method : {'OvR', 'OvO'}, default='OvR'

For multiclass problems, which method should we use? One-versus-one (OvO) or one-versus-rest (OvR)?

C : float, default=1.0

Regularisation strength is inversely related to C.

kernel : {'linear', 'poly', 'rbf', 'sigmoid'}, default='linear'

Which kernel function should we use (linear, poly, rbf, sigmoid)?

gamma : {'scale', 'auto', float}, default='scale'

What gamma to use for the poly, rbf and sigmoid kernels. Available options are 'scale', 'auto', or a positive float.

coef0 : float, default=0.0

What offset to use for the poly and sigmoid kernels.

degree : float, default=3.0

What degree polynomial to use (if any).

tol : float, default=1e-3

Tolerance over the maximum update step (i.e., early stopping is triggered when the maximal gradient falls below tol).

lr : float, default=1e-3

The learning rate.

max_iter : int, default=1000

The maximum number of iterations to perform while fitting, or -1 to disable.

X_train_ : np.ndarray | torch.Tensor

A clone of the training data used internally for kernel estimation.

A_ : np.ndarray | torch.Tensor

A clone of the dual coefficients (alphas) used internally for kernel estimation.

gamma_ : float

Estimated gamma parameter.

eps_ : float, default=1e-12

Numerical margin used internally to identify support vectors.

w_ : np.ndarray | torch.Tensor

If the kernel is linear, the estimated weights.

p_ : np.ndarray | torch.Tensor

If the kernel is linear, the estimated patterns.

intercept_ : np.ndarray | torch.Tensor

The intercept vector.

coef_ : np.ndarray | torch.Tensor

If the kernel is linear, the coefficients of the model.

pattern_ : np.ndarray | torch.Tensor

If the kernel is linear, the patterns used by the model.

binariser_ : mvpy.preprocessing.LabelBinariser

The binariser used internally.

scaler_ : mvpy.preprocessing.Scaler

The scaler used internally.

metric_ : mvpy.metrics.accuracy

The default metric to use.

Notes

Coefficients are interpretable only when the kernel is linear. In this case, patterns are computed as per [2].
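For a linear kernel, the transformation of [2] amounts to projecting the weights through the data covariance. A minimal sketch of that computation, for illustration rather than mvpy's internal code:

>>> import torch
>>> X, w = torch.randn(200, 10), torch.randn(10)
>>> Xc = X - X.mean(0, keepdim = True)           # centre the data
>>> S = (Xc.T @ Xc) / (X.shape[0] - 1)           # covariance of X
>>> pattern = S @ w                              # activation pattern per Haufe et al. [2]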

References

[1]

Awad, M., & Khanna, R. (2015). Support vector machines for classification. Efficient Learning Machines, 39-66. 10.1007/978-1-4302-5990-9_3

[2]

Haufe, S., Meinecke, F., Görgen, K., Dähne, S., Haynes, J.D., Blankertz, B., & Bießmann, F. (2014). On the interpretation of weight vectors of linear models in multivariate neuroimaging. NeuroImage, 87, 96-110. 10.1016/j.neuroimage.2013.10.067

Examples

First, let’s look at a case where we have one feature that has two classes.

>>> import torch
>>> import mvpy as mv
>>> from mvpy.estimators import SVC
>>> from sklearn.datasets import make_circles
>>> X, y = make_circles(noise = 0.3)
>>> X, y = torch.from_numpy(X).float(), torch.from_numpy(y).float()
>>> clf = SVC(kernel = 'rbf').fit(X, y)
>>> y_h = clf.predict(X)
>>> mv.math.accuracy(y_h.squeeze(), y)
tensor(0.6700)

Second, let’s look at a case where we have one feature that has three classes.

>>> import torch
>>> import mvpy as mv
>>> from mvpy.estimators import SVC
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y = True)
>>> X, y = torch.from_numpy(X).float(), torch.from_numpy(y).float()
>>> clf = SVC(kernel = 'rbf').fit(X, y)
>>> y_h = clf.predict(X)
>>> mv.math.accuracy(y_h.squeeze(), y)
tensor(0.9733)

Third, let’s look at a case where we have two features with a variable number of classes.

>>> import numpy as np
>>> import torch
>>> import mvpy as mv
>>> from mvpy.estimators import SVC
>>> from sklearn.datasets import make_classification
>>> X0, y0 = make_classification(n_classes = 3, n_informative = 6)
>>> X1, y1 = make_classification(n_classes = 4, n_informative = 8)
>>> X = torch.from_numpy(np.concatenate((X0, X1), axis = -1)).float()
>>> y = torch.from_numpy(np.stack((y0, y1), axis = -1)).float()
>>> clf = SVC(kernel = 'rbf').fit(X, y)
>>> y_h = clf.predict(X)
>>> mv.math.accuracy(y_h.T, y.T)
tensor([1.0000, 0.9800])
clone() → SVC[source]#

Clone this estimator.

Returns:
svc : mvpy.estimators.SVC

The cloned object.

copy() → SVC[source]#

Clone this estimator.

Returns:
svc : mvpy.estimators.SVC

The cloned object.

decision_function(X: ndarray | Tensor) → ndarray | Tensor[source]#

Compute the decision function from the estimator.

Parameters:
X : np.ndarray | torch.Tensor

The features of shape (n_samples, n_channels).

Returns:
df : np.ndarray | torch.Tensor

The decision values of shape (n_samples, n_classes).
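Continuing the iris example above, the raw decision values can be inspected before they are thresholded into labels; the output shape shown assumes the default OvR strategy:

>>> df = clf.decision_function(X)
>>> df.shape
torch.Size([150, 3])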

fit(X: ndarray | Tensor, y: ndarray | Tensor) → BaseEstimator[source]#

Fit the estimator.

Parameters:
X : np.ndarray | torch.Tensor

The features of shape (n_samples, n_channels).

y : np.ndarray | torch.Tensor

The targets of shape (n_samples[, n_features]).

Returns:
clf : mvpy.estimators.SVC

The classifier.

predict(X: ndarray | Tensor) → ndarray | Tensor[source]#

Predict labels from the estimator.

Parameters:
X : np.ndarray | torch.Tensor

The features of shape (n_samples, n_channels).

Returns:
y_h : np.ndarray | torch.Tensor

The predicted labels of shape (n_samples, n_features).

predict_proba(X: ndarray | Tensor) → ndarray | Tensor[source]#

Predict class probabilities from the estimator.

Parameters:
X : np.ndarray | torch.Tensor

The features of shape (n_samples, n_channels).

Returns:
p : np.ndarray | torch.Tensor

The predicted probabilities of shape (n_samples, n_classes).

Warning

Probabilities are computed from expit() over outputs of decision_function(). Consequently, probability estimates returned by this class are not calibrated. See Classifier for more information.
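A minimal sketch of that relationship, inferred from the warning above rather than mvpy's internal code:

>>> import torch
>>> df = clf.decision_function(X)                # raw decision values
>>> p = torch.special.expit(df)                  # uncalibrated probabilities, cf. predict_proba(X)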

score(X: ndarray | Tensor, y: ndarray | Tensor, metric: Metric | Tuple[Metric] | None = None) → ndarray | Tensor | Dict[str, ndarray] | Dict[str, Tensor][source]#

Make predictions from \(X\) and score against \(y\).

Parameters:
X : np.ndarray | torch.Tensor

Input data of shape (n_samples, n_channels).

y : np.ndarray | torch.Tensor

Output data of shape (n_samples, n_features).

metric : Optional[Metric | Tuple[Metric]], default=None

Metric or tuple of metrics to compute. If None, defaults to metric_.

Returns:
score : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

Scores of shape (n_features,) or, for multiple metrics, a dictionary of metric names and scores of shape (n_features,).

Warning

If multiple values are supplied for metric, this function will output a dictionary of {Metric.name: score, ...} rather than a stacked array. This is to provide consistency across cases where metrics may or may not differ in their output shapes.
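For instance, continuing the examples above (and assuming metrics are exposed as in those examples), a single metric returns scores directly, while a tuple of metrics returns a dictionary:

>>> score = clf.score(X, y)                                  # defaults to metric_
>>> scores = clf.score(X, y, metric = (mv.math.accuracy,))   # dictionary of {name: score}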

to_numpy() → BaseEstimator[source]#

Obtain the estimator with numpy as backend.

Returns:
svc : mvpy.estimators._SVC_numpy

The estimator.

to_torch() → BaseEstimator[source]#

Obtain the estimator with torch as backend.

Returns:
svc : mvpy.estimators._SVC_torch

The estimator.