Bias and Variance

Ampatishan Sivalingam
4 min read · Jan 22, 2022

Behind the scenes of model training

When fitting a machine learning model to data, we have all faced scenarios where the model works fine on the training data but fails miserably on the test data. In machine learning terms we call that overfitting, and it is described using terms like bias and variance. Though most of us understand what overfitting means, we still don’t get the full picture of what is happening in the background. So, here I will give a clear picture of what is happening behind the scenes of bias and variance.

First, let’s talk about the target function. What is a target function? Any given dataset will have the form [(x1,y1),(x2,y2),…,(xn,yn)], where x is the input feature and y is the corresponding output value. The dataset we have is derived from real-world data, where some function produced y for the input values of x. For example, in house price prediction, there is an unknown function that determines the price of a house based on several features like the size of the house, the number of bedrooms, the geolocation of the house, etc. But, as I mentioned, no one knows that function. If we already knew it, there would be no need for machine learning here. That unknown function, which determines the price of the house based on the features of the house, is known as the target function. The objective of a machine learning model is to find the target function.
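To make this concrete, here is a minimal sketch in Python. The target function and the noise level here are made up purely for illustration; in a real problem we would only ever see the (x, y) pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical target function -- in a real problem this is unknown to us.
def target_function(size_sqft):
    return 120.0 * size_sqft + 15000.0  # e.g. price from house size

# What we actually observe: noisy samples produced by the target function.
sizes = rng.uniform(500, 3000, size=20)
prices = target_function(sizes) + rng.normal(0, 5000.0, size=20)

# A model only ever sees these pairs; target_function itself stays hidden.
dataset = list(zip(sizes, prices))
```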

Can a machine learning model figure out the real target function? In short, the answer is no. To elaborate, consider a simple example where the dataset in hand is [(2,1),(3,0),(4,1)]. If we train a model on this data, several patterns are consistent with it: if the input is divisible by 2 without a remainder the output is one; if the input is a power of 2 the output is one; the output is one for every other input; and so on. These are just some of the patterns the model could identify. Suppose the model picks the pattern that numbers divisible by two without a remainder give an output of one. Since we don’t know the target function, it could just as well be that the target function returns 1 only for powers of 2. In that case the model will perform well on the training set but fail on the test set (e.g., for the input 6 the model predicts 1 while the target function returns 0). Not just for that particular pattern, but for any pattern we choose: since we don’t know the target function, we won’t be able to predict all the outputs correctly. So, to answer the question asked at the start of this paragraph, no, a machine learning model cannot figure out the real target function; the best we can do is try to generate a function that estimates the output of the target function well in most scenarios.
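As a quick sketch of this point, here are two hypotheses (my own illustrative functions) that both fit the training set [(2,1),(3,0),(4,1)] perfectly, yet disagree on an unseen input:

```python
# Hypothesis 1: output 1 when the input is divisible by 2.
def h_divisible_by_two(x):
    return 1 if x % 2 == 0 else 0

# Hypothesis 2: output 1 when the input is a power of 2.
def h_power_of_two(x):
    return 1 if x > 0 and (x & (x - 1)) == 0 else 0

train = [(2, 1), (3, 0), (4, 1)]
assert all(h_divisible_by_two(x) == y for x, y in train)  # fits perfectly
assert all(h_power_of_two(x) == y for x, y in train)      # also fits perfectly

# But on an unseen input the two hypotheses disagree:
print(h_divisible_by_two(6), h_power_of_two(6))  # 1 0
```

Both hypotheses have zero training error, so nothing in the data alone tells us which one matches the target function.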

If we can’t find the real target function using a machine learning model, then we can’t achieve an error value of zero. Even before starting to train a model, we have incurred an error that cannot be eliminated. This error is known as the bias. Bias tells us how close we have gotten to the unknown target function.

The figure above depicts the bias of a model: the ellipse is the set of possible estimates for the given dataset, the red point denotes the estimate produced by our machine learning model, and the green point denotes the unknown target function which produced the dataset.

In the model training phase, the model chooses a function estimate for the target function based on the values of its parameters. For different datasets generated by the same unknown target function, we will get different function estimates. By averaging out all those function estimates from the different datasets, we can identify a single function estimate that has the least error, and that least error is the value of the bias.
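Here is a minimal simulation of that idea, assuming a sine curve as the hidden target and straight lines as our model family (both are my own stand-ins, not anything fixed by the theory):

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin                           # stand-in for the unknown target function
x_eval = np.linspace(0, np.pi, 50)   # points where we compare estimates

# Each freshly sampled dataset yields a different straight-line estimate.
estimates = []
for _ in range(200):
    x = rng.uniform(0, np.pi, size=15)
    y = f(x) + rng.normal(0, 0.2, size=15)
    estimates.append(np.polyval(np.polyfit(x, y, 1), x_eval))

# Averaging the estimates gives the model family's "central" function;
# its remaining error against the true target is the (squared) bias.
mean_estimate = np.mean(estimates, axis=0)
bias_sq = np.mean((mean_estimate - f(x_eval)) ** 2)
print(f"squared bias ~ {bias_sq:.4f}")
```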

Now we have a better idea of bias. In the previous paragraph I mentioned that different datasets produce different function estimates of the target function, which means that in the figure shown above there should be other estimates as well.

In the image above, the orange circle shows the function estimate corresponding to the bias, and the white circles are the function estimates generated by the other datasets. So, after training we have different estimate functions with corresponding errors, which are greater than or equal to the bias value. For example, consider the purple circle with an error value of E, the bias value being B; the part of the error that did not occur because of bias, E − B, is known as the variance.
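Extending the sketch above, we can check numerically that the total error splits into the bias part and the spread of the individual estimates around their average; the setup (sine target, linear fits) is again my own illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.sin                           # stand-in for the unknown target function
x_eval = np.linspace(0, np.pi, 50)

fits = []
for _ in range(200):
    x = rng.uniform(0, np.pi, size=15)
    y = f(x) + rng.normal(0, 0.2, size=15)
    fits.append(np.polyval(np.polyfit(x, y, 1), x_eval))
fits = np.array(fits)

mean_fit = fits.mean(axis=0)
bias_sq  = np.mean((mean_fit - f(x_eval)) ** 2)  # B: error of the average fit
variance = np.mean((fits - mean_fit) ** 2)       # spread around the average
total    = np.mean((fits - f(x_eval)) ** 2)      # E: average total error

# E - B is the variance: the error that is not due to bias.
print(f"{total:.4f} ~ {bias_sq:.4f} + {variance:.4f}")
```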

I will cover how bias and variance play a key role in overfitting and underfitting in the next part.


Ampatishan Sivalingam

Data Scientist | Machine learning enthusiast | Electronic and telecommunication engineer