
Polynomial regression in Python

Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled as an n^{th}-degree polynomial. Polynomial regression fits a nonlinear relationship between x and the conditional mean of y, yet it is a special case of multiple linear regression: the powers of a single predictor variable are used as the predictors.

Key Concepts

  1. Polynomial Model:
    The model can be expressed as:
    y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \ldots + \beta_n x^n + \epsilon
    where \beta_0, \beta_1, \ldots, \beta_n are the coefficients and \epsilon is the error term.
  2. Degree of the Polynomial:
    The degree n of the polynomial determines the flexibility of the model. A higher degree allows for a more flexible fit to the data, but it may also lead to overfitting.
  3. Fitting the Model:
    Polynomial regression is fitted with the same machinery as linear regression, typically ordinary least squares estimation, as the sketch after this list illustrates.
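
To make "a special case of multiple linear regression" concrete, here is a minimal sketch (using hypothetical toy data) that builds the powers of x by hand and solves for the coefficients with ordinary least squares in NumPy:

import numpy as np

# Hypothetical toy data: y is roughly quadratic in x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 5.2, 9.8, 17.1])

# Design matrix with columns 1, x, x^2 (a degree-2 polynomial)
X = np.vander(x, N=3, increasing=True)

# Ordinary least squares estimates of beta_0, beta_1, beta_2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # coefficients in increasing order of power

The same fit can be obtained by passing the columns of X to any multiple linear regression routine, which is exactly what the scikit-learn example below does.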

Steps to Perform Polynomial Regression

  1. Data Preparation:
  • Collect and preprocess the data.
  • Split the data into training and testing sets.
  2. Feature Engineering:
  • Create polynomial features from the original feature(s). For example, if the original feature is x, create new features x^2, x^3, \ldots, x^n.
  3. Model Training:
  • Fit a linear regression model using the polynomial features.
  4. Model Evaluation:
  • Evaluate the model’s performance using metrics such as Mean Squared Error (MSE) and R-squared (R^2). A sketch combining steps 1–4 follows this list.
  5. Visualization:
  • Plot the original data and the polynomial regression curve to visualize the fit.
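
The steps above can be chained in a few lines; the following is a minimal sketch of steps 1–4 using scikit-learn's train_test_split and make_pipeline (the data, split ratio, and variable names here are illustrative assumptions):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative noisy quadratic data
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - 2 * x.ravel() + rng.normal(0, 2.0, size=100)

# Step 1: split into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Steps 2 and 3: polynomial features and linear regression chained in a pipeline
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x_train, y_train)

# Step 4: evaluate on the held-out test set
y_pred = model.predict(x_test)
print(f'Test MSE: {mean_squared_error(y_test, y_pred):.3f}')
print(f'Test R-squared: {r2_score(y_test, y_pred):.3f}')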

Example Using Scikit-Learn

Here is an example of how to perform polynomial regression using Python and Scikit-Learn:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Sample data: y is exactly x squared, so a degree-2 polynomial can fit it perfectly
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)
y = np.array([1, 4, 9, 16, 25, 36, 49, 64, 81])

# Transform the data to include polynomial features
degree = 2
poly_features = PolynomialFeatures(degree=degree)
x_poly = poly_features.fit_transform(x)

# Fit the polynomial regression model
model = LinearRegression()
model.fit(x_poly, y)

# Predict using the model
y_pred = model.predict(x_poly)

# Evaluate the model
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)

# Print model performance
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')

# Visualization
plt.scatter(x, y, color='blue')
plt.plot(x, y_pred, color='red')
plt.title('Polynomial Regression')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
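
Because the sample data lie exactly on the curve y = x^2, the degree-2 fit is essentially exact: the reported MSE is near zero (up to floating-point error) and R^2 is effectively 1.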

Applications

  • Economics: Modeling the relationship between GDP and investment.
  • Biology: Modeling the growth of a population over time.
  • Engineering: Modeling the stress-strain relationship in materials.
  • Finance: Predicting the price of financial instruments based on various factors.

Polynomial regression provides a flexible approach to modeling complex relationships, but it is crucial to select an appropriate degree for the polynomial to balance bias and variance in the model.
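
One practical way to strike that balance is to compare candidate degrees with cross-validation and keep the degree with the lowest held-out error. Here is a minimal sketch (the degree range, fold count, and data are illustrative assumptions):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Illustrative noisy quadratic data
rng = np.random.default_rng(42)
x = np.linspace(0, 5, 60).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(0, 1.0, size=60)

# Score each candidate degree with 5-fold cross-validated MSE
for degree in range(1, 7):
    pipe = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    scores = cross_val_score(pipe, x, y, cv=5, scoring='neg_mean_squared_error')
    print(f'degree {degree}: CV MSE = {-scores.mean():.3f}')

Degrees that are too low underfit (high bias), while degrees that are too high chase the training noise (high variance); the cross-validated error typically reflects both effects.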
