Understand Linear Regression: The ULTIMATE Beginner’s Guide

Kunal Gaurav
5 min read · Mar 1, 2024


Master the fundamentals of Linear Regression in this comprehensive beginner-friendly tutorial!

Introduction

Linear regression is one of the fundamental algorithms in the field of machine learning. Its simplicity and effectiveness make it a popular choice for modelling and analyzing relationships between variables. In this blog post, we will delve into the intricacies of linear regression, exploring its different variants, optimization techniques, and applications.

Here is the YouTube video if you prefer a video version:

Understand Linear Regression: The ULTIMATE Beginner’s Guide

Linear Machine Learning Algorithms

Linear regression falls under the category of supervised learning algorithms. Linear machine learning algorithms assume a linear relationship between the features (input variables) and the target variable (output variable) we are trying to predict.

Linear regression and logistic regression are the two most common linear machine learning algorithms. Linear regression is used when the target variable is continuous or numerical, whereas logistic regression is used when the target variable is categorical, signifying a classification problem.

Think of it like this:

  • Predicting house prices: This is a job for linear regression because we’re trying to predict the price of a house, which is a numerical or continuous value.
  • Classifying spam emails: This is where logistic regression comes in. It helps us sort incoming emails into two groups: spam and not spam.

Linear Regression

Linear regression is a statistical technique for modelling the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data. The objective is to determine the best-fitting line, also called the regression line.

Linear regression utilizes a linear equation in the form of y = B₀ + B₁x to establish a relationship between the features (often represented by x) and the target (often represented by y) where:

  • y is the dependent variable (the variable we want to predict),
  • x is the independent variable (the variable used to make predictions),
  • B₀​ is the intercept (the point where the regression line crosses the y-axis),
  • B₁​ is the slope (the change in y for a one-unit change in x)

Now let’s get a clear idea of what linear regression is with the help of an example and a graph:

Figure: Linear regression of car speed (y-axis) against weight (x-axis)
  • Red circles: These are the actual values we have for speed and weight.
  • Green line: This line shows the values the model predicts for speed based on weight.
  • B₀ (intercept): This is where the green line crosses the y-axis (speed). It represents the predicted speed when the weight is zero.
  • B₁ (slope): Imagine the green line as a ramp. The steeper the ramp (higher slope), the more the speed increases with each unit increase in weight.
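To make this concrete, here is a minimal sketch in Python that fits such a line with NumPy. The weight and speed numbers are made up purely for illustration and are not the data behind the graph above.

```python
# A minimal sketch of the speed-vs-weight example, using NumPy.
# The weight/speed numbers below are invented for illustration only.
import numpy as np

weight = np.array([1.0, 1.5, 2.0, 2.5, 3.0])   # x: independent variable
speed = np.array([2.1, 2.9, 4.2, 4.8, 6.1])    # y: dependent variable

# np.polyfit with degree 1 returns the slope (B1) and intercept (B0)
# of the least-squares line y = B0 + B1 * x.
b1, b0 = np.polyfit(weight, speed, deg=1)
print(f"Intercept B0 = {b0:.3f}, slope B1 = {b1:.3f}")

# Predict the speed for a new weight value using the fitted line.
new_weight = 2.2
predicted_speed = b0 + b1 * new_weight
print(f"Predicted speed for weight {new_weight}: {predicted_speed:.3f}")
```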

Let’s imagine you’re a race car mechanic! You want to guess how fast a car can go based on its features, like how heavy it is. In the world of machine learning, we use special tools called algorithms to make such predictions.

Think of these algorithms like magic rulers. They look at the car’s weight (like markings on the ruler) and estimate its speed based on a straight line they draw on a special “prediction chart”. This line shows how the car’s weight connects to its speed, assuming they change together in a steady, straight way.

Just like your mechanic skills improve with experience, these algorithms learn from data to become better at making these predictions!

Further, depending on the number of input features, linear regression can be categorized into:

Simple Linear Regression

Simple linear regression is the most basic form of linear regression, involving only one independent variable. The relationship between the independent and dependent variables is modelled using a straight line equation: y = mx + b, where ‘m’ represents the slope of the line and ‘b’ represents the y-intercept.
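As a rough sketch, here is how simple linear regression could be fitted with scikit-learn’s LinearRegression (assuming scikit-learn is installed); the data points are invented for illustration.

```python
# Simple linear regression with one feature, using scikit-learn.
# The data below is made up for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])   # one feature -> shape (n_samples, 1)
y = np.array([2.2, 3.9, 6.1, 8.0, 9.8])

model = LinearRegression().fit(X, y)
print("b (intercept):", model.intercept_)
print("m (slope):", model.coef_[0])
print("prediction at x = 6:", model.predict([[6]])[0])
```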

Multiple Linear Regression

Multiple linear regression extends simple linear regression to incorporate multiple independent variables. It models the relationship between the dependent variable and two or more independent variables using a linear equation of the form: y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ, where β₀ is the intercept, and β₁, β₂, …, βₙ are the coefficients of the independent variables.
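The same idea extends naturally to several features. Below is a small sketch of multiple linear regression with two made-up features, again using scikit-learn.

```python
# Multiple linear regression with two features (invented data).
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one observation: [x1, x2]
X = np.array([
    [1.0, 3.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 2.0],
    [5.0, 5.0],
])
y = np.array([7.1, 6.9, 12.2, 11.8, 17.0])

model = LinearRegression().fit(X, y)
print("β0 (intercept):", model.intercept_)
print("β1, β2 (coefficients):", model.coef_)
```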

In the mathematical formula of linear regression, we try to find the best-fit line that minimizes the error. Error is nothing but the difference between the predicted values and the actual values. The goal of linear regression is to minimize this error using something called a loss or cost function.

The mathematical formula of linear regression can be written as y = b₀ + b₁x + e, where b₀ and b₁ are known as the regression coefficients or parameters:

  • b₀ is the intercept of the regression line, i.e. the predicted value when x = 0.
  • b₁ is the slope of the regression line.
  • e is the error term (also known as the residual error), in other words, the part of y that cannot be explained by the regression model.

In linear regression, a loss function serves as a critical tool for evaluating how well a model fits the data, and it guides the model training process. By minimizing this loss function, the model tries to achieve the best possible alignment with the underlying data. Linear regression tries to fit a straight line to a set of data points, and the loss function measures how far each data point falls from the fitted line. Smaller distances indicate better alignment and translate to lower loss values; larger distances between the points and the line signify higher loss values, representing a poorer model fit. Different loss functions can be used in regression, and the choice depends on factors such as the problem domain, the characteristics of the data, and the desired properties of the model. Some commonly used loss functions are:

Loss (Cost) functions
  • Mean Squared Error (MSE) calculates the average squared difference between the predicted values (y’ᵢ) and the true values (yᵢ). It penalizes larger errors more heavily due to squaring.
  • Mean Absolute Error (MAE) calculates the average absolute difference between the predicted values and the true values. It provides a more linear penalty compared to MSE and is less sensitive to outliers.
  • Root Mean Squared Error (RMSE) is the square root of MSE and provides a measure of the average magnitude of the errors in the predicted values.
  • Sum of Squared Errors (SSE) is the sum of the squares of all the error terms.
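As a quick illustration, these loss functions can be computed by hand with NumPy; the true and predicted values below are arbitrary examples.

```python
# Computing the loss functions above with NumPy (arbitrary example values).
import numpy as np

y_true = np.array([3.0, 5.0, 7.5, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])

errors = y_true - y_pred
sse = np.sum(errors ** 2)          # Sum of Squared Errors
mse = np.mean(errors ** 2)         # Mean Squared Error
rmse = np.sqrt(mse)                # Root Mean Squared Error
mae = np.mean(np.abs(errors))      # Mean Absolute Error

print(f"SSE={sse:.3f}, MSE={mse:.3f}, RMSE={rmse:.3f}, MAE={mae:.3f}")
```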

The two most commonly used optimization methods are:

  • Ordinary least squares (OLS) regression is the most common analytical method for simple linear regression. It aims to find the line that best fits a set of data points by minimizing the sum of squared residuals.
  • Gradient descent provides a more flexible, iterative approach to minimizing the loss function, especially in complex scenarios. It is most helpful when OLS assumptions are violated (e.g., non-normal errors), when dealing with more complex models (e.g., polynomial regression), or when flexibility in handling different loss functions is needed.
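For intuition, here is a bare-bones gradient descent sketch that minimizes MSE for simple linear regression. The learning rate and number of iterations are arbitrary choices for this toy data, not a recommendation.

```python
# Gradient descent for simple linear regression, minimizing MSE.
# The data, learning rate, and epoch count are illustrative only.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.9, 6.1, 8.0, 9.8])

b0, b1 = 0.0, 0.0          # start with a flat line
lr, epochs = 0.01, 2000    # learning rate and number of iterations

for _ in range(epochs):
    y_pred = b0 + b1 * x
    error = y_pred - y
    # Gradients of MSE with respect to b0 and b1
    grad_b0 = 2 * np.mean(error)
    grad_b1 = 2 * np.mean(error * x)
    b0 -= lr * grad_b0
    b1 -= lr * grad_b1

print(f"b0 ≈ {b0:.3f}, b1 ≈ {b1:.3f}")
```

With enough iterations, b0 and b1 approach the same values OLS would give for this data, which is the point of the comparison above.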

Please post your comments and feedback. Follow me for more such content.
