KernelRidgeCV#

class mvpy.estimators.KernelRidgeCV(alphas: ndarray | Tensor | list | float | int = 1, kernel: str = 'linear', gamma: float | str = 'auto', coef0: float = 1.0, degree: float = 3.0, alpha_per_target: bool = False)[source]#

Implements kernel ridge regression with leave-one-out cross-validation.

Kernel ridge regression maps input data \(X\) to output data \(y\) through coefficients \(\beta\):

\[y = \kappa\beta + \varepsilon\]

where \(\kappa\) is a Gram matrix of \(X\). The estimator solves for the model \(\beta\) through:

\[\arg\min_\beta \frac{1}{2}\lvert\lvert y - \kappa\beta\rvert\rvert_2^2 + \frac{\alpha_\beta}{2}\lvert\lvert\beta\rvert\rvert_\kappa^2\]

where \(\alpha_\beta\) are the penalties to test in leave-one-out cross-validation (LOO-CV), which here has a convenient closed-form solution:

\[\arg\min_{\alpha_\beta} \frac{1}{N} \sum_{i = 1}^{N} \left(\frac{y_i - \left(\kappa\beta_\alpha\right)_i}{1 - H_{\alpha,ii}}\right)^2 \qquad\textrm{where}\qquad H_{\alpha} = \kappa\left(\kappa + \alpha_\beta I\right)^{-1}\]

In other words, this solves a ridge regression in the parameter space defined by the kernel function \(\kappa(X, X)\). This is convenient because, just like SVC, it allows for non-parametric estimation. For example, an RBF kernel may capture non-linearities in the data that RidgeCV cannot account for. The closed-form LOO-CV formula is evaluated at all values of alphas, and the penalty minimising the mean-squared loss is chosen automatically. This is typically much faster than tuning penalties through an inner cross-validation loop.
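To make the selection rule concrete, here is a minimal numpy sketch of the closed-form LOO search (illustrative only, not the library's internal implementation; loo_mse and the other names are hypothetical):

>>> import numpy as np
>>> def loo_mse(K, y, alpha):
...     # hat matrix H_alpha = K (K + alpha I)^{-1}
...     H = K @ np.linalg.inv(K + alpha * np.eye(K.shape[0]))
...     # closed-form LOO residuals from the hat matrix diagonal
...     loo = (y - H @ y) / (1.0 - np.diag(H))
...     return np.mean(loo ** 2)
>>> X = np.random.normal(size = (240, 5))
>>> y = X @ np.random.normal(size = (5,))
>>> K = X @ X.T  # linear kernel
>>> alphas = np.logspace(-3, 3, 7)
>>> best_alpha = alphas[np.argmin([loo_mse(K, y, a) for a in alphas])]

Because the hat matrix diagonal covers all observations at once, each candidate penalty costs a single matrix inversion rather than \(N\) refits.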

As such, KernelRidgeCV mirrors SVC in its application of the kernel trick and the associated benefits. The key difference is that KernelRidgeCV is fit using L2-regularised squared error, whereas SVC is fit through sequential minimal optimisation or gradient ascent over hinge losses. In practice, this means that KernelRidgeCV is much faster, particularly when multiple values of alphas are specified, but produces denser solutions that are not margin-based.

For more information on kernel ridge regression, see [1] [2].

Parameters:
alphas : np.ndarray | torch.Tensor | List | float | int, default=1.0

Alpha penalties to test.

kernel : {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’}, default=’linear’

Kernel function to use.

gamma : float | {‘auto’, ‘scale’}, default=’auto’

Gamma to use in kernel computation.

coef0 : float, default=1.0

Coefficient zero to use in kernel computation.

degree : float, default=3.0

Degree of the kernel to use.

alpha_per_target : bool, default=False

Should we fit one alpha per target?

Attributes:
alphas : np.ndarray | torch.Tensor | List | float | int, default=1.0

Alpha penalties to test.

kernel : {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’}, default=’linear’

Kernel function to use.

gamma : float | {‘auto’, ‘scale’}, default=’auto’

Gamma to use in kernel computation.

coef0 : float, default=1.0

Coefficient zero to use in kernel computation.

degree : float, default=3.0

Degree of the kernel to use.

alpha_per_target : bool, default=False

Should we fit one alpha per target?

X_ : np.ndarray | torch.Tensor

Training data X of shape (n_samples, n_channels).

A_dual_ : np.ndarray | torch.Tensor

Chosen dual coefficients of shape (n_samples, n_features).

alpha_ : float | np.ndarray | torch.Tensor

Chosen alpha penalties.

coef_ : Optional[np.ndarray | torch.Tensor]

If kernel is linear, coefficients of shape (n_channels, n_features).

metric_ : mvpy.metrics.r2

The default metric to use.

See also

mvpy.estimators.RidgeCV

Alternative ridge regression without kernel functions.

mvpy.math.kernel_linear, mvpy.math.kernel_poly, mvpy.math.kernel_rbf, mvpy.math.kernel_sigmoid

Available kernel functions.

Notes

Coefficients coef_ are available only when the kernel is linear, in which case primal weights can be computed from the dual solution:

\[w = X^T\beta\]

For other kernel functions, coefficients are not interpretable and, therefore, not computed here.
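For instance, given a fitted model with the default linear kernel (such as the one in the Examples section below), this recovery can be sketched from the fitted attributes documented above:

>>> w = model.X_.T @ model.A_dual_  # should match model.coef_ for a linear kernel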

Warning

For small values of alphas, kernel matrices may no longer be positive semidefinite. In many cases, model fitting must then fall back to least squares solutions, which can decrease throughput by an order of magnitude (or more). This issue is particularly prevalent in the numpy backend. Please consider this when choosing penalties.

Warning

This issue can also arise independently of alphas. For example, the Gram matrix given \(X\sim\mathcal{N}(0, 1)\) will already be rank-deficient whenever \(n\_samples > n\_channels\). As is the case in sklearn, this will lead to poor solving speed in the numpy backend; the torch backend is more robust to this. Please consider this when investigating your data prior to model fitting.

References

[1]

Murphy, K.P. (2012). Machine learning: A probabilistic perspective. MIT Press.

[2]

Nadaraya, E.A. (1964). On estimating regression. Theory of Probability and Its Applications, 9, 141-142. doi:10.1137/1109020

Examples

>>> import torch
>>> from mvpy.estimators import KernelRidgeCV
>>> ß = torch.normal(0, 1, size = (5,))
>>> X = torch.normal(0, 1, size = (240, 5))
>>> y = ß @ X.T + torch.normal(0, 0.5, size = (X.shape[0],))
>>> model = KernelRidgeCV().fit(X, y)
>>> model.coef_
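A non-linear kernel and a grid of candidate penalties can be requested in the same way (a sketch continuing from the example above, using the parameters documented earlier):

>>> alphas = torch.logspace(-5, 5, 20)
>>> rbf = KernelRidgeCV(alphas = alphas, kernel = 'rbf').fit(X, y)
>>> rbf.alpha_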
clone() → KernelRidgeCV[source]#

Make a clone of this class.

Returns:
estimator : KernelRidgeCV

A clone of this class.

fit(X: ndarray | Tensor, y: ndarray | Tensor) → KernelRidgeCV[source]#

Fit the estimator.

Parameters:
X : np.ndarray | torch.Tensor

Input data of shape (n_samples, n_channels).

y : np.ndarray | torch.Tensor

Output features of shape (n_samples, n_features).

Returns:
estimator : KernelRidgeCV

The fitted estimator.

predict(X: ndarray | Tensor) → ndarray | Tensor[source]#

Make predictions from the estimator.

Parameters:
X : np.ndarray | torch.Tensor

Input data of shape (n_samples, n_channels).

Returns:
y_h : np.ndarray | torch.Tensor

Predicted output features of shape (n_samples, n_features).

score(X: ndarray | Tensor, y: ndarray | Tensor, metric: Metric | Tuple[Metric] | None = None) → ndarray | Tensor | Dict[str, ndarray] | Dict[str, Tensor][source]#

Make predictions from \(X\) and score against \(y\).

Parameters:
X : np.ndarray | torch.Tensor

Input data of shape (n_samples, n_channels).

y : np.ndarray | torch.Tensor

Output data of shape (n_samples, n_features).

metric : Optional[Metric | Tuple[Metric]], default=None

Metric or tuple of metrics to compute. If None, defaults to metric_.

Returns:
score : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]

Scores of shape (n_features,) or, for multiple metrics, a dictionary of metric names and scores of shape (n_features,).

Warning

If multiple values are supplied for metric, this function will output a dictionary of {Metric.name: score, ...} rather than a stacked array. This is to provide consistency across cases where metrics may or may not differ in their output shapes.
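For example, reusing the fitted model from the Examples section, a call with metric=None scores against the default metric_ (here, r2) and returns an array; the dictionary form appears only when a tuple of metrics is passed (sketch):

>>> model.score(X, y)  # scores with metric_ (r2), shape (n_features,)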