In machine learning, feature selection is an important step that determines how much each feature in the data contributes to the model. In one of our articles, we saw that ridge regression is used to reduce overfitting, which can also be reduced by fitting the model with only the important features. Ridge regression can also help us in feature selection by indicating which features matter most for modelling purposes. In this article, we are going to discuss ridge regression for feature selection. The major points to be discussed in the article are listed below.
Table of contents
- What is ridge regression?
- Why ridge regression for feature selection?
- Implementing Ridge regression for feature selection
Let’s begin by understanding ridge regression.
What is ridge regression?
We can consider ridge regression as a method for estimating the coefficients of multiple regression models. We mainly need ridge regression when the variables in the data are highly correlated. We can also think of ridge regression as a solution to the imprecision of the least squares estimator (LSE) when the independent variables of a linear regression model are highly correlated.
It replaces the least squares estimator with a ridge estimator, which gives more precise estimates because it has a smaller variance and a smaller mean squared error. This approach is also called L2 regularization because it shrinks the coefficient values. Let’s see why we can use ridge regression for feature selection.
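To make the idea concrete, here is a minimal sketch (using a small synthetic regression problem with two almost identical predictors, separate from the data set used later in this article) that compares the ordinary least squares coefficients with the ridge coefficients:
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two almost perfectly correlated predictors
rng = np.random.RandomState(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=100)

# Ordinary least squares: the coefficients become large and unstable
print(LinearRegression(fit_intercept=False).fit(X, y).coef_)

# Ridge adds the L2 penalty alpha * ||beta||^2, shrinking the coefficients
print(Ridge(alpha=1.0, fit_intercept=False).fit(X, y).coef_)

# Equivalent closed-form ridge estimator: (X'X + alpha * I)^-1 X'y
alpha = 1.0
print(np.linalg.solve(X.T @ X + alpha * np.eye(2), X.T @ y))
Because the two predictors carry nearly the same information, least squares splits the effect between them in an unstable way, while ridge keeps both coefficients small and stable.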
Why ridge regression for feature selection?
One of the most important things about ridge regression is that it does not discard any information about the predictors: instead of forcing coefficients to exactly zero, it shrinks them, so every feature keeps some effect. Ridge regression is popular because it uses regularization when making predictions, and regularization is intended to resolve the problem of overfitting. Overfitting typically occurs when the model is too complex for the amount of data available; ridge regression counters it by penalizing large feature coefficients while still minimizing the prediction error.
Other feature selection methods (lasso, for example) directly nullify the effect of less useful features by driving their coefficients to zero. If we believe that some of the coefficients have close to zero effect but we don’t know which ones, ridge regression lets us build a better model: it is equivalent to placing a Bayesian prior on the regression coefficients that shrinks all of them, so we still get an estimate of the effect of every coefficient. Let’s see how we can implement ridge for feature selection.
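A quick way to see this behaviour is to fit ridge regression with increasing penalty strengths and watch the coefficients shrink towards zero without any of them being dropped entirely. The following is a small sketch on synthetic regression data, independent of the classification data used below:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=5, n_informative=3, noise=10, random_state=1)

# Larger alpha means a stronger penalty and smaller coefficients,
# but none of them become exactly zero
for alpha in [0.1, 1, 10, 100]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    print(alpha, np.round(coef, 2))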
Implementing Ridge regression for feature selection
Importing libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, Ridge
Making data set
X, y = make_classification(n_samples=3000, n_features=10, n_informative=5, random_state=1)
X.shape, y.shape
Output:
In the above, we have made a classification data set with 3,000 samples and 10 features, of which 5 are informative.
Plotting some data
plt.scatter(X[:, 0], X[:, 1], marker="o", c=y, s=25, edgecolor="k")
Output:
Here we can see the distribution of the data along the first and second features. Let’s split the data.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,random_state=0)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
Output:
Scaling the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
We can use ridge regularization for feature selection while fitting the model. Since our target is a class label, we fit a logistic regression model and set the penalty parameter to L2, which is the same penalty used in ridge regression.
ridge_logit = LogisticRegression(C=1, penalty='l2')
ridge_logit.fit(X_train, y_train)
Output:
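As an optional sanity check, we can first confirm that the fitted model generalises to the held-out split:
# Mean accuracy of the fitted model on the test split
ridge_logit.score(X_test, y_test)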
Let’s check the coefficients of the features, which tell us how important each feature is for the model.
ridge_logit.coef_
Output:
In the above, we can see that a few features have noticeably larger coefficients than the rest, which marks them as the most important features, while the others contribute comparatively little.
Let’s check how many features have a coefficient greater than or equal to zero.
np.sum(ridge_logit.coef_ >= 0)
Output:
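Because negative coefficients matter just as much as positive ones for prediction, a more informative check is to rank the features by the absolute size of their coefficients. Here is a small sketch reusing the fitted ridge_logit from above:
# Rank features from most to least important by absolute coefficient size
importance = np.abs(ridge_logit.coef_[0])
ranking = np.argsort(importance)[::-1]
print(ranking)
print(np.round(importance[ranking], 3))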
Let’s draw the plot for feature importance.
importance = ridge_logit.coef_[0]
plt.bar([x for x in range(len(importance))], importance)
plt.show()
Output:
Here we can see how strongly each feature contributes to the model’s predictions, with the sign showing the direction of the effect.
Above, we have seen how ridge regression works for feature selection. After looking at the plots and coefficients, we can select the features that make the predictions more accurate. As discussed earlier, the logistic regression model with an L2 penalty has kept all the features during modelling, which means it has the benefit of not dropping any feature from the data set while still giving us a level of importance for every feature.
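As a final sketch, we could keep only the features whose absolute coefficients pass a threshold and refit the model on that subset to compare performance; note that the 0.1 threshold here is an arbitrary illustrative choice, not a recommendation.
# Keep features whose absolute coefficient exceeds an illustrative threshold
selected = np.abs(ridge_logit.coef_[0]) > 0.1
print("selected features:", np.where(selected)[0])

# Refit on the reduced feature set and compare with the full model
reduced_model = LogisticRegression(C=1, penalty='l2')
reduced_model.fit(X_train[:, selected], y_train)
print("full model accuracy:   ", ridge_logit.score(X_test, y_test))
print("reduced model accuracy:", reduced_model.score(X_test[:, selected], y_test))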
Final words
In this article, we have discussed ridge regression, which is essentially a regularization technique that also gives us a level of importance for each feature. Not nullifying the effect of any feature in the data is what makes ridge regression different from other feature selection methods.