KernelRidgeCV#
- class mvpy.estimators.KernelRidgeCV(alphas: ndarray | Tensor | list | float | int = 1, kernel: str = 'linear', gamma: float | str = 'auto', coef0: float = 1.0, degree: float = 3.0, alpha_per_target: bool = False)[source]#
Implements a kernel ridge regression with cross-validation.
Kernel ridge regression maps input data \(X\) to output data \(y\) through coefficients \(\beta\):
\[y = \kappa\beta + \varepsilon\]where \(\kappa\) is some gram matrix of \(X\), and solves for the model coefficients \(\beta\) through:
\[\arg\min_\beta \frac{1}{2}\lvert\lvert y - \kappa\beta\rvert\rvert_2^2 + \frac{\alpha_\beta}{2}\lvert\lvert\beta\rvert\rvert_\kappa^2\]where \(\alpha_\beta\) are penalties to test in LOO-CV, which has a convenient closed-form solution:
\[\arg\min_{\alpha_\beta} \frac{1}{N} \sum_{i = 1}^{N} \left(\frac{y_i - \left(\kappa\beta_\alpha\right)_i}{1 - H_{\alpha,ii}}\right)^2 \qquad\textrm{where}\qquad H_{\alpha,ii} = \textrm{diag}\left(\kappa\cdot\left(\kappa + \alpha_\beta I\right)^{-1}\right)\]In other words, this solves a ridge regression in the parameter space defined by the kernel function \(\kappa(X, X)\). This is convenient because, just like SVC, it allows for non-parametric estimation. For example, kernel='rbf' may capture non-linearities in the data that RidgeCV cannot account for. The closed-form LOO-CV formula is evaluated at all values of alphas, and the penalty minimising the mean-squared loss is chosen automatically. This is faster than performing inner cross-validation to fine-tune penalties.

As such, KernelRidgeCV mirrors SVC in its application of the kernel trick and the associated benefits. The key difference is that KernelRidgeCV is fit using L2-regularised squared error, whereas SVC is fit through sequential minimal optimisation or gradient ascent over hinge losses. In practice, this means that KernelRidgeCV is much faster, particularly when multiple values of alphas are specified, but produces less sparse solutions that are not margin-based. For more information on kernel ridge regression, see [1] [2].
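The closed-form LOO-CV selection described above can be sketched in a few lines of NumPy. This is an illustrative reimplementation under simplifying assumptions (single target, explicit matrix inverse), not mvpy's actual code; `kernel_ridge_loocv` is a hypothetical helper name:

```python
import numpy as np

def kernel_ridge_loocv(K, y, alphas):
    """Pick the penalty minimising the closed-form LOO-CV error, then refit.

    K      : (n, n) gram matrix kappa(X, X)
    y      : (n,) targets
    alphas : iterable of candidate penalties
    """
    n = K.shape[0]
    best_alpha, best_loss = None, np.inf
    for alpha in alphas:
        # Hat matrix H_alpha = K (K + alpha I)^{-1}; its diagonal entries are
        # the leverages that the LOO identity divides out of the residuals.
        H = K @ np.linalg.inv(K + alpha * np.eye(n))
        loo_residuals = (y - H @ y) / (1.0 - np.diag(H))
        loss = np.mean(loo_residuals ** 2)
        if loss < best_loss:
            best_alpha, best_loss = alpha, loss
    # Refit dual coefficients at the chosen penalty.
    beta = np.linalg.solve(K + best_alpha * np.eye(n), y)
    return best_alpha, beta

# Toy linear-kernel problem
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
K = X @ X.T                              # linear kernel
alpha, beta = kernel_ridge_loocv(K, y, alphas=[0.1, 1.0, 10.0])
y_hat = K @ beta                         # in-sample predictions
```

Note that a single pass over alphas suffices: no inner train/validation splits are ever materialised, which is where the speed advantage over nested cross-validation comes from.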
- Parameters:
- alphas : np.ndarray | torch.Tensor | List | float | int, default=1.0
  Alpha penalties to test.
- kernel : {'linear', 'poly', 'rbf', 'sigmoid'}, default='linear'
  Kernel function to use.
- gamma : {float, 'auto', 'scale'}, default='auto'
  Gamma to use in kernel computation.
- coef0 : float, default=1.0
  Coefficient zero to use in kernel computation.
- degree : float, default=3.0
  Degree of kernel to use.
- alpha_per_target : bool, default=False
  Should we fit one alpha per target?
- Attributes:
- alphas : np.ndarray | torch.Tensor | List | float | int, default=1.0
  Alpha penalties to test.
- kernel : {'linear', 'poly', 'rbf', 'sigmoid'}, default='linear'
  Kernel function to use.
- gamma : {float, 'auto', 'scale'}, default='auto'
  Gamma to use in kernel computation.
- coef0 : float, default=1.0
  Coefficient zero to use in kernel computation.
- degree : float, default=3.0
  Degree of kernel to use.
- alpha_per_target : bool, default=False
  Should we fit one alpha per target?
- X_ : np.ndarray | torch.Tensor
  Training data X of shape (n_samples, n_channels).
- A_dual_ : np.ndarray | torch.Tensor
  Chosen dual alpha of shape (n_samples, n_features).
- alpha_ : float | np.ndarray | torch.Tensor
  Chosen alpha penalties.
- coef_ : Optional[np.ndarray | torch.Tensor]
  If kernel is 'linear', coefficients of shape (n_channels, n_features).
- metric_ : mvpy.metrics.r2
  The default metric to use.
See also
mvpy.estimators.RidgeCV : Alternative ridge regression without kernel functions.
mvpy.math.kernel_linear, mvpy.math.kernel_poly, mvpy.math.kernel_rbf, mvpy.math.kernel_sigmoid : Available kernel functions.
Notes
Coefficients coef_ are available only when kernel is 'linear', where primal weights can be computed from the dual solutions:\[w = X^T\beta\]For other kernel functions, coefficients are not interpretable and are therefore not computed here.
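A minimal sketch of this recovery on a noiseless toy problem (plain NumPy, not mvpy's internals; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                                     # noiseless targets

alpha = 1e-6                                       # tiny penalty: near-exact recovery
K = X @ X.T                                        # linear kernel kappa(X, X)
beta = np.linalg.solve(K + alpha * np.eye(50), y)  # dual solution
w = X.T @ beta                                     # primal weights w = X^T beta
```

With a non-trivial penalty the recovered weights shrink towards zero, as in ordinary ridge regression; the identity itself holds only for the linear kernel, since no explicit feature map is available otherwise.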
Warning
For small values of alphas, kernel matrices may no longer be positive semidefinite. This means that, in many cases, model fitting may have to resort to least-squares solutions, which can decrease throughput by an order of magnitude (or more). This issue is particularly prevalent in the numpy backend. Please consider this when choosing penalties.

Warning
This issue can also appear independently of alphas. For example, the gram matrix of \(X\sim\mathcal{N}(0, 1)\) will already be rank-deficient whenever \(n\_samples > n\_channels\). As is the case in sklearn, this will lead to poor solving speed in the numpy backend; the torch backend is more robust to this. Please consider this when investigating your data prior to model fitting.

References
[1]Murphy, K.P. (2012). Machine learning: A probabilistic perspective. MIT Press.
[2]Nadaraya, E.A. (1964). On estimating regression. Theory of Probability and Its Applications, 9, 141-142. 10.1137/1109020
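The rank deficiency described in the warning above is easy to confirm on synthetic data; a standalone NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_channels = 240, 5
X = rng.normal(size=(n_samples, n_channels))

# rank(K) = rank(X) <= min(n_samples, n_channels), so this 240 x 240
# gram matrix has rank at most 5 and is far from positive definite.
K = X @ X.T
rank = np.linalg.matrix_rank(K)
```

Adding a sufficiently large penalty alpha to the diagonal restores positive definiteness, which is why larger penalties avoid the slow least-squares fallback.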
Examples
>>> import torch
>>> from mvpy.estimators import KernelRidgeCV
>>> ß = torch.normal(0, 1, size = (5,))
>>> X = torch.normal(0, 1, size = (240, 5))
>>> y = ß @ X.T + torch.normal(0, 0.5, size = (X.shape[0],))
>>> model = KernelRidgeCV().fit(X, y)
>>> model.coef_
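As a further illustration of why non-linear kernels matter, the following self-contained sketch (plain NumPy rather than the mvpy API; `rbf_kernel` is a hypothetical helper) fits a sine wave that a linear ridge cannot track:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])                        # non-linear target

def rbf_kernel(A, B, gamma=1.0):
    # kappa(a, b) = exp(-gamma * ||a - b||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

alpha = 1e-3
K = rbf_kernel(X, X)
beta = np.linalg.solve(K + alpha * np.eye(len(X)), y)
y_rbf = K @ beta                           # RBF kernel ridge predictions

# Linear ridge on the same data cannot capture sin(x)
w = np.linalg.solve(X.T @ X + alpha * np.eye(1), X.T @ y)
y_lin = X @ w

mse_rbf = np.mean((y - y_rbf) ** 2)
mse_lin = np.mean((y - y_lin) ** 2)
```

The RBF fit drives the in-sample error close to zero, while the linear fit is limited by the best straight-line approximation to the sine.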
- clone() KernelRidgeCV[source]#
Make a clone of this class.
- Returns:
- estimator : KernelRidgeCV
A clone of this class.
- fit(X: ndarray | Tensor, y: ndarray | Tensor) KernelRidgeCV[source]#
Fit the estimator.
- Parameters:
- X : np.ndarray | torch.Tensor
  Input data of shape (n_samples, n_channels).
- y : np.ndarray | torch.Tensor
  Input features of shape (n_samples, n_features).
- Returns:
- estimator : KernelRidgeCV
The fitted estimator.
- predict(X: ndarray | Tensor) ndarray | Tensor[source]#
Make predictions from the estimator.
- Parameters:
- X : np.ndarray | torch.Tensor
  Input data of shape (n_samples, n_channels).
- Returns:
- y_h : np.ndarray | torch.Tensor
  Predicted output features of shape (n_samples, n_features).
- score(X: ndarray | Tensor, y: ndarray | Tensor, metric: Metric | Tuple[Metric] | None = None) ndarray | Tensor | Dict[str, ndarray] | Dict[str, Tensor][source]#
Make predictions from \(X\) and score against \(y\).
- Parameters:
- X : torch.Tensor
  Input data of shape (n_samples, n_channels).
- y : torch.Tensor
  Output data of shape (n_samples, n_features).
- metric : Optional[Metric | Tuple[Metric]], default=None
  Metric or tuple of metrics to compute. If None, defaults to metric_.
- Returns:
- score : np.ndarray | torch.Tensor | Dict[str, np.ndarray] | Dict[str, torch.Tensor]
  Scores of shape (n_features,) or, for multiple metrics, a dictionary of metric names and scores of shape (n_features,).
Warning
If multiple values are supplied for metric, this function will output a dictionary of {Metric.name: score, ...} rather than a stacked array. This provides consistency across cases where metrics may or may not differ in their output shapes.