SVC#
- class mvpy.estimators.SVC(method: str = 'OvR', C: float = 1.0, kernel: str = 'linear', gamma: str | float = 'scale', coef0: float = 0.0, degree: float = 3.0, tol: float = 0.001, lr: float = 0.001, max_iter: int = 1000)[source]#
Implements a support vector classifier.
Support vector classifiers frame a classification problem mapping from neural data \(X\) to labels \(y\in\{1, -1\}\) as a max-margin problem:
\[f(X) = w^T\varphi(X) + b\]
that separates the classes with the largest possible margin in feature space \(\varphi(\cdot)\). As in KernelRidgeClassifier, \(\varphi(X)\) is a gram matrix defined by some kernel function. Contrary to KernelRidgeClassifier, however, SVC minimises a hinge-loss surrogate:
\[\arg\min_{w, b} \frac{1}{2}\lVert w\rVert^2 + C\sum_i\max\left(0, 1 - y_i f(X_i)\right)\]
Via the kernel trick, the decision function can be written in dual form as:
\[f(X) = \sum_{i\in\mathcal{S}} \alpha_i y_i \kappa(X_i, X) + b\]
where \(\alpha_i\ge 0\) are the dual coefficients over the set of support vectors \(\mathcal{S}\), and \(\kappa\) is a positive-definite kernel. Hyperparameters like the penalisation \(C\) are typically selected by cross-validation. Unlike KernelRidgeClassifier, penalty selection cannot be conveniently automated through LOO-CV here.
Compared to RidgeClassifier or KernelRidgeClassifier, SVC optimises a margin-based objective and often yields tighter decision boundaries, particularly when classes are not linearly well separated or when using a non-linear kernel, at the cost of higher training time.
For more information on support vector classifiers, see [1].
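Since \(C\) cannot be selected by LOO-CV here, a small grid search with k-fold cross-validation is the typical workflow. A minimal sketch (splitting utilities borrowed from sklearn; the grid values are illustrative only):

>>> import torch
>>> import mvpy as mv
>>> from mvpy.estimators import SVC
>>> from sklearn.datasets import make_circles
>>> from sklearn.model_selection import KFold
>>> X, y = make_circles(noise = 0.3)
>>> X, y = torch.from_numpy(X).float(), torch.from_numpy(y).float()
>>> cv_scores = {}
>>> for C in [0.1, 1.0, 10.0]:
...     folds = []
...     for train, test in KFold(n_splits = 5).split(X):
...         clf = SVC(C = C, kernel = 'rbf').fit(X[train], y[train])
...         folds.append(mv.math.accuracy(clf.predict(X[test]).squeeze(), y[test]))
...     cv_scores[C] = sum(folds) / len(folds)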
Warning
SVC is currently considered experimental. As is, it uses gradient ascent over vectorised features and stops early when the change in gradient norm \(\Delta\lVert\nabla\rVert\) falls below the tolerance. This diverges from sklearn's behaviour and may produce slightly degraded decision boundaries. In the future, we will be switching to an SMO routine that should resolve these issues.
- Parameters:
- method : {‘OvR’, ‘OvO’}, default=’OvR’
For multiclass problems, which method should we use? One-versus-one (OvO) or one-versus-rest (OvR)?
- C : float, default=1.0
Regularisation strength is inversely related to C.
- kernel : {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’}, default=’linear’
Which kernel function should we use (linear, poly, rbf, sigmoid)?
- gamma : {‘scale’, ‘auto’} or float, default=’scale’
What gamma to use for the poly, rbf and sigmoid kernels: ‘scale’, ‘auto’, or a positive float.
- coef0 : float, default=0.0
What offset to use for the poly and sigmoid kernels.
- degree : float, default=3.0
What degree polynomial to use (if any).
- tol : float, default=1e-3
Tolerance over the maximum update step (i.e., early stopping is triggered when the maximal gradient < tol).
- lr : float, default=1e-3
The learning rate.
- max_iter : int, default=1000
The maximum number of iterations to perform while fitting, or -1 to disable.
- Attributes:
- method : {‘OvR’, ‘OvO’}, default=’OvR’
For multiclass problems, which method should we use? One-versus-one (OvO) or one-versus-rest (OvR)?
- C : float, default=1.0
Regularisation strength is inversely related to C.
- kernel : {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’}, default=’linear’
Which kernel function should we use (linear, poly, rbf, sigmoid)?
- gamma : {‘scale’, ‘auto’} or float, default=’scale’
What gamma to use for the poly, rbf and sigmoid kernels: ‘scale’, ‘auto’, or a positive float.
- coef0 : float, default=0.0
What offset to use for the poly and sigmoid kernels.
- degree : float, default=3.0
What degree polynomial to use (if any).
- tol : float, default=1e-3
Tolerance over the maximum update step (i.e., early stopping is triggered when the maximal gradient < tol).
- lr : float, default=1e-3
The learning rate.
- max_iter : int, default=1000
The maximum number of iterations to perform while fitting, or -1 to disable.
- X_train_ : np.ndarray | torch.Tensor
A clone of the training data used internally for kernel estimation.
- A_ : np.ndarray | torch.Tensor
A clone of the alpha data used internally for kernel estimation.
- gamma_ : float
Estimated gamma parameter.
- eps_ : float, default=1e-12
Error margin for support vectors used internally.
- w_ : np.ndarray | torch.Tensor
If linear kernel, estimated weights.
- p_ : np.ndarray | torch.Tensor
If linear kernel, estimated patterns.
- intercept_ : np.ndarray | torch.Tensor
The intercept vector.
- coef_ : np.ndarray | torch.Tensor
If kernel is ‘linear’, the coefficients of the model.
- pattern_ : np.ndarray | torch.Tensor
If kernel is ‘linear’, the patterns used by the model.
- binariser_ : mvpy.preprocessing.LabelBinariser
The binariser used internally.
- scaler_ : mvpy.preprocessing.Scaler
The scaler used internally.
- metric_ : mvpy.metrics.accuracy
The default metric to use.
See also
mvpy.math.kernel_linear, mvpy.math.kernel_poly, mvpy.math.kernel_rbf, mvpy.math.kernel_sigmoid
Available kernel functions.
Notes
Coefficients are interpretable only when kernel is ‘linear’. In this case, patterns are computed as per [2].
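For instance, with a linear kernel, the fitted coefficients and the corresponding patterns can be inspected directly (a minimal sketch using the attributes documented above):

>>> import torch
>>> from mvpy.estimators import SVC
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification()
>>> X, y = torch.from_numpy(X).float(), torch.from_numpy(y).float()
>>> clf = SVC(kernel = 'linear').fit(X, y)
>>> coef, pattern = clf.coef_, clf.pattern_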
References
[1] Awad, M., & Khanna, R. (2015). Support vector machines for classification. Efficient Learning Machines, 39-66. 10.1007/978-1-4302-5990-9_3
[2] Haufe, S., Meinecke, F., Görgen, K., Dähne, S., Haynes, J.D., Blankertz, B., & Bießmann, F. (2014). On the interpretation of weight vectors of linear models in multivariate neuroimaging. NeuroImage, 87, 96-110. 10.1016/j.neuroimage.2013.10.067
Examples
First, let’s look at a case where we have one feature that has two classes.
>>> import torch
>>> import mvpy as mv
>>> from mvpy.estimators import SVC
>>> from sklearn.datasets import make_circles
>>> X, y = make_circles(noise = 0.3)
>>> X, y = torch.from_numpy(X).float(), torch.from_numpy(y).float()
>>> clf = SVC(kernel = 'rbf').fit(X, y)
>>> y_h = clf.predict(X)
>>> mv.math.accuracy(y_h.squeeze(), y)
tensor(0.6700)
Second, let’s look at a case where we have one feature that has three classes.
>>> import torch
>>> import mvpy as mv
>>> from mvpy.estimators import SVC
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y = True)
>>> X, y = torch.from_numpy(X).float(), torch.from_numpy(y).float()
>>> clf = SVC(kernel = 'rbf').fit(X, y)
>>> y_h = clf.predict(X)
>>> mv.math.accuracy(y_h.squeeze(), y)
tensor(0.9733)
Third, let’s look at a case where we have two features with a variable number of classes.
>>> import numpy as np
>>> import torch
>>> import mvpy as mv
>>> from mvpy.estimators import SVC
>>> from sklearn.datasets import make_classification
>>> X0, y0 = make_classification(n_classes = 3, n_informative = 6)
>>> X1, y1 = make_classification(n_classes = 4, n_informative = 8)
>>> X = torch.from_numpy(np.concatenate((X0, X1), axis = -1)).float()
>>> y = torch.from_numpy(np.stack((y0, y1), axis = -1)).float()
>>> clf = SVC(kernel = 'rbf').fit(X, y)
>>> y_h = clf.predict(X)
>>> mv.math.accuracy(y_h.T, y.T)
tensor([1.0000, 0.9800])
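Finally, the multiclass strategy can be switched to one-versus-one, which fits one binary classifier per pair of classes (a brief sketch on the iris data; the resulting score is omitted, as it mirrors the OvR example above):

>>> import torch
>>> import mvpy as mv
>>> from mvpy.estimators import SVC
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y = True)
>>> X, y = torch.from_numpy(X).float(), torch.from_numpy(y).float()
>>> clf = SVC(method = 'OvO', kernel = 'rbf').fit(X, y)
>>> acc = mv.math.accuracy(clf.predict(X).squeeze(), y)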
- decision_function(X: ndarray | Tensor) → ndarray | Tensor[source]#
Compute the decision function values for X.
- Parameters:
- X : np.ndarray | torch.Tensor
The features of shape (n_samples, n_channels).
- Returns:
- df : np.ndarray | torch.Tensor
The decision function values of shape (n_samples, n_classes).
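A short usage sketch (synthetic binary data; the column ordering of the returned decision values is not asserted here):

>>> import torch
>>> from mvpy.estimators import SVC
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification()
>>> X, y = torch.from_numpy(X).float(), torch.from_numpy(y).float()
>>> df = SVC(kernel = 'linear').fit(X, y).decision_function(X)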
- fit(X: ndarray | Tensor, y: ndarray | Tensor) → BaseEstimator[source]#
Fit the estimator.
- Parameters:
- X : np.ndarray | torch.Tensor
The features of shape (n_samples, n_channels).
- y : np.ndarray | torch.Tensor
The targets of shape (n_samples[, n_features]).
- Returns:
- clf : mvpy.estimators.SVC
The classifier.
- predict(X: ndarray | Tensor) → ndarray | Tensor[source]#
Predict from the estimator.
- Parameters:
- X : np.ndarray | torch.Tensor
The features of shape (n_samples, n_channels).
- Returns:
- y_h : np.ndarray | torch.Tensor
The predictions of shape (n_samples, n_features).
- predict_proba(X: ndarray | Tensor) → ndarray | Tensor[source]#
Predict class probabilities from the estimator.
- Parameters:
- X : np.ndarray | torch.Tensor
The features of shape (n_samples, n_channels).
- Returns:
- p : np.ndarray | torch.Tensor
The predicted probabilities of shape (n_samples, n_classes).
Warning
Probabilities are computed from expit() over outputs of decision_function(). Consequently, probability estimates returned by this class are not calibrated. See Classifier for more information.
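To illustrate the warning above, the returned probabilities are a logistic squashing of the decision values rather than calibrated estimates; a brief sketch (the exact correspondence between the two outputs is an assumption about internals):

>>> import torch
>>> from mvpy.estimators import SVC
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification()
>>> X, y = torch.from_numpy(X).float(), torch.from_numpy(y).float()
>>> clf = SVC(kernel = 'linear').fit(X, y)
>>> p = clf.predict_proba(X)       # sigmoid-transformed decision values
>>> df = clf.decision_function(X)  # raw margins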
- score(X: ndarray | Tensor, y: ndarray | Tensor, metric: Metric | Tuple[Metric] | None = None) → ndarray | Tensor | Dict[str, ndarray] | Dict[str, Tensor][source]#
Make predictions from \(X\) and score against \(y\).
- Parameters:
- X : np.ndarray | torch.Tensor
Input data of shape (n_samples, n_channels).
- y : np.ndarray | torch.Tensor
Output data of shape (n_samples, n_features).
- metric : Optional[Metric | Tuple[Metric]], default=None
Metric or tuple of metrics to compute. If None, defaults to metric_.
- Returns:
- score : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]
Scores of shape (n_features,) or, for multiple metrics, a dictionary of metric names and scores of shape (n_features,).
Warning
If multiple values are supplied for metric, this function will output a dictionary of {Metric.name: score, ...} rather than a stacked array. This is to provide consistency across cases where metrics may or may not differ in their output shapes.
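For example, scoring with the default metric returns a plain array, while passing a tuple returns a dictionary keyed by metric name (a sketch; mvpy.metrics.accuracy is the documented default metric):

>>> import torch
>>> import mvpy as mv
>>> from mvpy.estimators import SVC
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y = True)
>>> X, y = torch.from_numpy(X).float(), torch.from_numpy(y).float()
>>> clf = SVC(kernel = 'rbf').fit(X, y)
>>> score = clf.score(X, y)                                    # uses metric_
>>> scores = clf.score(X, y, metric = (mv.metrics.accuracy,))  # {Metric.name: score, ...}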