

Polynomial regression is a form of regression analysis in which the relationship between the independent variable $x$ and the dependent variable $y$ is modeled as an $n$th-degree polynomial. It’s an extension of linear regression that can capture non-linear relationships by adding polynomial terms to the model.
Key Concepts
- Polynomial Terms:
In polynomial regression, we expand the features by including powers of $x$. For a degree-$n$ polynomial regression, the model can be expressed as:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n + \varepsilon$$
where $\beta_0, \beta_1, \ldots, \beta_n$ are the model coefficients, and $\varepsilon$ is the error term.
- Non-linearity:
Although polynomial regression can fit non-linear data, it’s still a linear model in terms of the parameters (coefficients). This is because we’re only transforming the input features (raising them to a power) and still fitting a linear model to these transformed features.
- Feature Expansion:
To fit a polynomial regression, we expand the input features into polynomial features. For instance, with a degree-3 polynomial regression, we might create new features such as $x$, $x^2$, and $x^3$, and use these in the model (a short sketch of this expansion follows this list).
- Overfitting and Underfitting:
- Underfitting happens when the polynomial degree is too low to capture the underlying trend in the data.
- Overfitting happens when the polynomial degree is too high, causing the model to fit noise in the data, reducing its ability to generalize to new data.
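To make the feature-expansion idea concrete, here is a minimal sketch using scikit-learn’s PolynomialFeatures (the same transformer used in the full example further down). It shows how a single input column becomes the columns $x$, $x^2$, and $x^3$ for a degree-3 fit; the three sample values are arbitrary and chosen only for illustration.
# Sketch: expanding one feature into polynomial features of degree 3
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
x = np.array([1, 2, 3]).reshape(-1, 1)  # a single input column
poly = PolynomialFeatures(degree=3, include_bias=False)
print(poly.fit_transform(x))  # columns: x, x^2, x^3
# [[ 1.  1.  1.]
#  [ 2.  4.  8.]
#  [ 3.  9. 27.]]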
Steps to Perform Polynomial Regression
- Choose the Degree: Decide on the degree of the polynomial based on domain knowledge, exploratory data analysis, or experimentation.
- Transform Features: Generate polynomial features up to the chosen degree (e.g., $x$, $x^2$, $x^3$, etc.).
- Fit the Model: Perform regression (often linear regression) on these polynomial features to find the coefficients that best fit the data.
- Evaluate: Use metrics like Mean Squared Error (MSE), Root Mean Squared Error (RMSE), or the $R^2$ score to assess how well the model fits the data (the standard formulas are given after this list).
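For reference, these metrics have the standard definitions below, where $\hat{y}_i$ is the model’s prediction for observation $i$, $\bar{y}$ is the mean of the observed values, and $n$ is the number of observations:
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad \text{RMSE} = \sqrt{\text{MSE}}, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$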
Example of Polynomial Regression in Python (scroll down for the R code)
Using scikit-learn, here’s how to perform polynomial regression:
We start by importing the necessary libraries from scikit-learn and numpy.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
Next, we define the sample data, where x represents the input features and y represents the target variable.
# Sample data
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1.2, 1.9, 3.1, 4.5, 6.8])
Define Polynomial Degree: We set the degree of the polynomial features to 2. This means we will be fitting a quadratic polynomial to the data.
# Define polynomial degree
degree = 2
Now, let’s create a pipeline that first generates polynomial features of the specified degree and then fits a linear regression model to these features.
# Create a pipeline to perform polynomial regression
model = Pipeline([
    ("poly_features", PolynomialFeatures(degree=degree, include_bias=False)),
    ("lin_reg", LinearRegression())
])
We fit the model to the sample data.
# Fit model
model.fit(x, y)
Make Predictions: We use the fitted model to make predictions on the input data x.
# Make predictions
y_pred = model.predict(x)
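The fitted pipeline can just as easily be applied to inputs that were not in the training data; here is a minimal sketch (the value 6 is an arbitrary illustrative input, not part of the original example).
# Predict at a new, unseen input value (illustrative only)
x_new = np.array([[6]])
print("Prediction at x = 6:", model.predict(x_new)[0])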
We evaluate the model’s performance using Mean Squared Error (MSE) and R-squared (R²) score metrics.
# Evaluate the model
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)
Full Python code:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
# Sample data
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1.2, 1.9, 3.1, 4.5, 6.8])
# Define polynomial degree
degree = 2
# Create a pipeline to perform polynomial regression
model = Pipeline([
    ("poly_features", PolynomialFeatures(degree=degree, include_bias=False)),
    ("lin_reg", LinearRegression())
])
# Fit model
model.fit(x, y)
# Make predictions
y_pred = model.predict(x)
# Evaluate the model
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)
Advantages of Polynomial Regression
- Flexibility: It can model non-linear relationships without requiring a complex transformation or feature engineering.
- Interpretability: Polynomial terms make it easy to understand the influence of various powers of the predictor variable on the outcome.
Disadvantages
- Overfitting: High-degree polynomials can overfit, especially on small datasets or noisy data (see the short sketch below). See the explanation in A comic guide to model generalization and overfitting.
- Sensitivity: Polynomial regression can be sensitive to outliers, which can distort the model’s behavior.
Polynomial regression can be effective for certain types of data, but it should be used with caution to avoid overfitting.
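As a rough illustration of the overfitting point, the sketch below reuses the five sample points from the Python example above and compares the training $R^2$ for degree 2 and degree 4. A degree-4 polynomial (plus an intercept) has as many parameters as there are data points, so it can reproduce the training data almost exactly; the near-perfect training score reflects memorization rather than genuine predictive power, and such a model would typically generalize worse to new data.
# Sketch: a higher degree drives training R^2 toward 1 on just 5 points
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import r2_score
import numpy as np
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1.2, 1.9, 3.1, 4.5, 6.8])
for degree in (2, 4):
    model = Pipeline([
        ("poly_features", PolynomialFeatures(degree=degree, include_bias=False)),
        ("lin_reg", LinearRegression())
    ])
    model.fit(x, y)
    print("Degree", degree, "training R^2:", r2_score(y, model.predict(x)))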
R implementation
We start by loading the necessary libraries in R.
# Load necessary libraries
library(caret)
library(Metrics)
Define the sample data in R:
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(1.2, 1.9, 3.1, 4.5, 6.8)
data <- data.frame(x, y)
Create a polynomial model using the poly() and lm() functions:
# Define polynomial degree
degree <- 2
# Create polynomial features
data$poly_x <- poly(data$x, degree, raw = TRUE)
# Fit the model
model <- lm(y ~ poly_x, data = data)
Use the fitted model to make predictions:
# Make predictions
y_pred <- predict(model, data)
Evaluate the model’s performance using Mean Squared Error (MSE) and R-squared (R²) metrics:
# Calculate Mean Squared Error
mse <- mse(y, y_pred)
# Calculate R^2 Score
r2 <- summary(model)$r.squared
print(paste("Mean Squared Error:", mse))
print(paste("R^2 Score:", r2))
Full R code:
# Load necessary libraries
library(caret)
library(Metrics)
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(1.2, 1.9, 3.1, 4.5, 6.8)
data <- data.frame(x, y)
# Define polynomial degree
degree <- 2
# Create polynomial features
data$poly_x <- poly(data$x, degree, raw = TRUE)
# Fit the model
model <- lm(y ~ poly_x, data = data)
# Make predictions
y_pred <- predict(model, data)
# Calculate Mean Squared Error
mse <- mse(y, y_pred)
# Calculate R^2 Score
r2 <- summary(model)$r.squared
print(paste("Mean Squared Error:", mse))
print(paste("R^2 Score:", r2))