
Polynomial regression in Python

Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled as an nth-degree polynomial in x. It fits a nonlinear relationship between the value of x and the corresponding conditional mean of y. Although the fitted curve is nonlinear in x, the model is linear in its coefficients, so it is a special case of multiple linear regression in which the powers of a single predictor variable are used as predictors.

Key Concepts

  1. Polynomial Model:
    The model can be expressed as:
    y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \ldots + \beta_n x^n + \epsilon
    where \beta_0, \beta_1, \ldots, \beta_n are the coefficients and \epsilon is the error term.
  2. Degree of the Polynomial:
    The degree n of the polynomial determines the flexibility of the model. A higher degree allows a closer fit to the training data, but it also raises the risk of overfitting, where the curve follows noise in the sample and generalizes poorly to new data.
  3. Fitting the Model:
    Because the model is linear in its coefficients, polynomial regression can be fitted with the same methods as linear regression, typically ordinary least squares.
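
To make the least-squares fit concrete, here is a minimal sketch using NumPy only. The data are an assumption made up for illustration (a noise-free quadratic with coefficients 1, 2 and 3), not values from this article. Each power of x becomes one column of the design matrix, and the coefficients are then estimated exactly as in multiple linear regression.

```python
import numpy as np

# Illustrative data (an assumption, not from the article): y = 1 + 2x + 3x^2, no noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 3.0 * x**2

degree = 2

# Design matrix with columns [1, x, x^2]; each power of x is treated as its own predictor
X = np.vander(x, degree + 1, increasing=True)

# Ordinary least squares: find beta that minimises ||X @ beta - y||^2
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print(beta)  # approximately [1. 2. 3.]
```

With noisy data the recovered coefficients would only approximate the true ones, but the procedure is the same.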

Steps to Perform Polynomial Regression

  1. Data Preparation:
  • Collect and preprocess the data.
  • Split the data into training and testing sets.
  2. Feature Engineering:
  • Create polynomial features from the original feature(s). For example, if the original feature is x, create new features x^2, x^3, \ldots, x^n.
  3. Model Training:
  • Fit a linear regression model using the polynomial features.
  4. Model Evaluation:
  • Evaluate the model's performance on the test set using metrics such as Mean Squared Error (MSE) and R-squared (R^2).
  5. Visualization:
  • Plot the original data and the polynomial regression curve to visualize the fit.
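
Steps 1 to 4 can be sketched as follows. This is one possible arrangement, not the only one: the synthetic data, the 75/25 split, the random seeds and the choice of degree 2 are all assumptions made for illustration. Scikit-Learn's make_pipeline is used so that the polynomial features are created inside the model and applied consistently to training and test data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Step 1: data preparation (synthetic, hypothetical data: a noisy quadratic)
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - x.ravel() + 2 + rng.normal(scale=0.3, size=60)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

# Steps 2 and 3: create polynomial features and fit a linear model on them
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x_train, y_train)

# Step 4: evaluate on the held-out test set
y_pred = model.predict(x_test)
test_mse = mean_squared_error(y_test, y_pred)
test_r2 = r2_score(y_test, y_pred)

print(f"Test MSE: {test_mse:.3f}")
print(f"Test R-squared: {test_r2:.3f}")
```

Step 5 (visualization) is left out here to keep the sketch short; a scatter plot of the data with the predicted curve drawn over a sorted grid of x values would complete it.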

Example Using Scikit-Learn

Here is an example of how to perform polynomial regression using Python and Scikit-Learn:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Sample data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)
y = np.array([1, 4, 9, 16, 25, 36, 49, 64, 81])

# Transform the data to include polynomial features
degree = 2
poly_features = PolynomialFeatures(degree=degree)
x_poly = poly_features.fit_transform(x)

# Fit the polynomial regression model
model = LinearRegression()
model.fit(x_poly, y)

# Predict using the model
y_pred = model.predict(x_poly)

# Evaluate the model (on the training data; this small example has no held-out test set)
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)

# Print model performance
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')

# Visualization
plt.scatter(x, y, color='blue')
plt.plot(x, y_pred, color='red')
plt.title('Polynomial Regression')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
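
To predict at new x values, the new inputs need the same polynomial expansion as the training data. A minimal sketch, reusing the sample data from the example above; the new inputs 10 and 11 are hypothetical values chosen for illustration. Note that the already fitted transformer is applied with transform rather than fit_transform.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Same sample data as the example above (y = x^2 exactly)
x = np.arange(1, 10).reshape(-1, 1)
y = x.ravel() ** 2

poly_features = PolynomialFeatures(degree=2)
model = LinearRegression().fit(poly_features.fit_transform(x), y)

# New inputs (hypothetical values, outside the training range)
x_new = np.array([[10], [11]])

# Reuse the fitted transformer: transform, not fit_transform
y_new = model.predict(poly_features.transform(x_new))

print(y_new)  # approximately [100. 121.]
```

The extrapolation works here only because the sample data follow a quadratic exactly. With real, noisy data, polynomial fits can behave erratically outside the range of the training inputs, so predictions there should be treated with caution.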

Applications

  • Economics: Modeling the relationship between GDP and investment.
  • Biology: Modeling the growth of a population over time.
  • Engineering: Modeling the stress-strain relationship in materials.
  • Finance: Predicting the price of financial instruments based on various factors.

Polynomial regression provides a flexible approach to modeling complex relationships, but it is crucial to select an appropriate degree for the polynomial to balance bias and variance in the model.
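
One common way to choose the degree is k-fold cross-validation: fit each candidate degree on part of the data and compare the error on the held-out folds. The sketch below is an illustration under stated assumptions, not a prescribed procedure. The synthetic cubic data, the noise level, the candidate range of 1 to 8 and the 5-fold setup are all made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (hypothetical): a cubic trend plus Gaussian noise
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(-3, 3, 80)).reshape(-1, 1)
y = x.ravel() ** 3 - 2 * x.ravel() + rng.normal(scale=1.0, size=80)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Cross-validated MSE for each candidate degree
cv_mse = {}
for degree in range(1, 9):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, x, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()  # scores are negated MSE, so flip the sign

best_degree = min(cv_mse, key=cv_mse.get)

for degree, mse in cv_mse.items():
    print(f"degree {degree}: CV MSE = {mse:.3f}")
print(f"Lowest CV error at degree {best_degree}")
```

Degrees that are too low show high error because of bias (the curve cannot follow the trend), while degrees that are too high tend to show rising error because of variance. When several degrees score similarly, the lowest of them is usually the safer choice.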

