Intuition Behind Linear Regression

Ampatishan Sivalingam
4 min read · Jan 28, 2022

How does the simplest machine learning algorithm work?

Linear regression is the simplest machine learning algorithm available, and even though it is the simplest, in certain use cases it produces the best results. In order to get a full picture of machine learning algorithms, we need to start from linear regression. In this article we will first see how linear regression works, then the issues that arise with linear regression, and finally how those issues can be overcome.

The hypothesis function of linear regression is a linear function; that is, we try to estimate the pattern in the dataset with a linear function, assuming the pattern itself is linear. For example, suppose we have the features of a human (height, weight, age and gender), and we need to find the height given the remaining features.
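With d input features x_1, \dots, x_d (here weight, age and gender, so d = 3), a linear hypothesis has the form

f(x) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_d x_d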

The equation f(x) above is a sample of a hypothesis function for the linear regression model, where the w's are the learnt weights. The equation above can be written in compact form as shown below.
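Collecting the weights into a vector w = (w_0, w_1, \dots, w_d)^T and prepending a constant 1 to the feature vector x to absorb the bias term w_0:

f(x) = w^T x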

Now we have a formula to calculate the height of a person given the other input features. But the weights w of the function are still unknown: we have to find the values of the w's that make f(x) estimate Y (the real height) as closely as possible. That means our final target is to reduce the discrepancy between f(x) and Y. If we consider the mean squared error, then we need to minimize the following.
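Stacking the n training examples as the rows of a design matrix X and the targets into a vector Y, the objective is

\min_w \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2 = \min_w \frac{1}{n} \|Y - Xw\|^2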

In the optimization problem above, the variable is w. In order to find the minimum, the typical step is to differentiate the function with respect to w and equate it to zero.
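Setting the gradient of the squared error to zero and solving for w gives

\nabla_w \|Y - Xw\|^2 = -2 X^T (Y - Xw) = 0 \quad \Rightarrow \quad w^* = (X^T X)^{-1} X^T Y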

So, the value of w that minimizes the discrepancy between Y and f(x) is given above. Whenever we model a linear regression for a given dataset using mean squared error as the loss function, we can simply substitute the optimal value for w. So that's all for linear regression.
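Here is a minimal sketch of this closed-form solution in NumPy, assuming a small synthetic dataset (the feature scales and true weights below are made up purely for illustration):

```python
import numpy as np

# Synthetic data: 100 people with weight (kg), age (years), gender (0/1).
rng = np.random.default_rng(0)
n = 100
features = rng.normal(loc=[70, 35, 0.5], scale=[10, 10, 0.5], size=(n, 3))

# Heights generated from a known linear rule plus noise.
true_w = np.array([100.0, 0.5, 0.1, 5.0])   # bias, weight, age, gender
X = np.hstack([np.ones((n, 1)), features])  # prepend 1s for the bias term
Y = X @ true_w + rng.normal(scale=2.0, size=n)

# Closed-form optimum: w* = (X^T X)^{-1} X^T Y
w_star = np.linalg.inv(X.T @ X) @ X.T @ Y
print("learnt weights:", w_star)            # close to true_w
```

The learnt weights should land close to the true weights used to generate the data, with the gap shrinking as the noise decreases.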

But there is an issue with the optimal value for w: it contains the inverse of a matrix, and not every matrix has an inverse. So what if that particular matrix does not have an inverse? Can we still find an optimal value for w?

In order to solve this issue we need to learn something about the inverses of matrices. The d × d matrix X^T X is invertible if the rank of X is equal to d, the number of features. In casual terms, if no column of a matrix is a linear combination of the other columns present in the matrix, then the matrix will be invertible. The inverse of a matrix can be defined in terms of its eigenvalues and eigenvectors, as shown below.
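For a symmetric matrix like X^T X, with eigenvalues \lambda_i and orthonormal eigenvectors v_i,

(X^T X)^{-1} = \sum_{i=1}^{d} \frac{1}{\lambda_i} v_i v_i^T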

In the equation above, lambda represents the eigenvalues, while v represents the eigenvectors. In the case of a matrix with rank d, all of the eigenvalues will be non-zero, so we can calculate the inverse of the matrix. But for a matrix with rank less than d, at least one of the eigenvalues will be zero, so its inverse cannot be defined as per the equation above. There is a workaround for matrices with rank less than d: when calculating the inverse from the equation given above, we omit the terms whose eigenvalues are zero. The result is called the pseudo inverse of that particular matrix. So, when dealing with linear regression, we use the pseudo inverse for matrices that have no actual inverse.
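Written out, the pseudo inverse simply drops the terms with zero eigenvalues:

(X^T X)^{+} = \sum_{\lambda_i \neq 0} \frac{1}{\lambda_i} v_i v_i^T

In NumPy this is a one-line change to the sketch above, since np.linalg.pinv computes the Moore-Penrose pseudo inverse and agrees with the ordinary inverse whenever X^T X has full rank:

```python
import numpy as np

# X and Y are the design matrix and target vector from the sketch above.
# np.linalg.pinv handles a singular X^T X (e.g. a duplicated feature column).
def fit_linear_regression(X, Y):
    return np.linalg.pinv(X.T @ X) @ X.T @ Y
```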

