What is bias and variance in machine learning?

Introduction: Understanding bias and variance in machine learning is key to building accurate models. Bias causes predictions that are consistently off, while variance makes a model overly sensitive to noise in its training data. This article explains both concepts.
What is Bias?
- Some models are too simplistic: they ignore important relationships in the training data that could have improved their predictions. Such models are said to have high bias. When a model has high bias, its predictions are consistently off, at least for certain regions of the data if not the whole range.
- For example, if you try to fit a straight line to a scatter plot where the data follows a curvilinear pattern, the fit will not be good. In some parts of the plot the line falls below the curve and in other parts it sits above it, awkwardly trying to follow the trajectory of a curve.
- Since the line traces out the model’s predictions, when it falls below the curve, the predictions are consistently lower than the ground truth, and vice versa. So when you think of the word bias, think of predictions being consistently off.
- High-bias models are said to underfit the training data. As a result, the prediction error is high on both the training data and the test data.
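The straight-line example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic (a hypothetical curve y = x^2 with a little Gaussian noise), and the helper names fit_line, make_data and mse are invented for this sketch. It fits a line by ordinary least squares and then checks where the line sits relative to the true curve.

```python
import random

def make_data(n, rng, noise=0.02):
    """Synthetic data (an assumption for this sketch): y = x^2 plus small Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [x ** 2 + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def mse(xs, ys, predict):
    """Mean squared error of a predictor on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(200, rng)

a, b = fit_line(train_x, train_y)

def predict_line(x):
    return a * x + b

train_mse = mse(train_x, train_y, predict_line)
test_mse = mse(test_x, test_y, predict_line)

# Signed error (prediction minus noise-free truth) at the left edge,
# the middle, and the right edge of the range.
left_err = predict_line(0.0) - 0.0 ** 2
middle_err = predict_line(0.5) - 0.5 ** 2
right_err = predict_line(1.0) - 1.0 ** 2

print("train MSE:", train_mse, "test MSE:", test_mse)
print("signed errors (left, middle, right):", left_err, middle_err, right_err)
```

In this setup the line should undershoot the curve near both ends and overshoot it in the middle, and the training and test errors should be similar to each other and well above the noise level. More data would not help here; the error comes from the shape of the model, not from the sample.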
What is Variance?
- Some models are too complex. In the process of looking for important relationships between variables, they also pick up on certain “flukes” in the training data that don’t generalize to the test data.
- In such a case, the model’s predictions are once again off, but importantly, they are not consistently off. Change the data a little, and predictions can be very different.
- This happens because the model is too sensitive: it overreacts to small changes in the training data.
- High-variance models are said to overfit the training data. Their prediction error is deceptively low on the training data but high on the test data, which shows that they fail to generalize.
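The same behaviour can be reproduced with a small sketch. The assumptions here are mine, not the article's: the data is again synthetic (y = x^2 with noticeably more noise), and a 1-nearest-neighbor predictor stands in for the "too complex" model because it memorizes every training point, flukes included. The helper names are hypothetical.

```python
import random

def make_data(n, rng, noise=0.2):
    """Synthetic data (an assumption for this sketch): y = x^2 plus Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [x ** 2 + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def nearest_neighbor(train_x, train_y):
    """Return a predictor that copies the target of the closest training point."""
    pairs = list(zip(train_x, train_y))

    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict

def mse(xs, ys, predict):
    """Mean squared error of a predictor on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(50, rng)
test_x, test_y = make_data(500, rng)

model = nearest_neighbor(train_x, train_y)
train_mse = mse(train_x, train_y, model)  # each training point is its own nearest neighbor
test_mse = mse(test_x, test_y, model)

# "Change the data a little": refit on a fresh sample from the same
# distribution and compare the two models' predictions on a fixed grid.
other_x, other_y = make_data(50, rng)
other_model = nearest_neighbor(other_x, other_y)
grid = [i / 100 for i in range(101)]
disagreement = sum((model(x) - other_model(x)) ** 2 for x in grid) / len(grid)

print("train MSE:", train_mse, "test MSE:", test_mse)
print("mean squared disagreement between the two fits:", disagreement)
```

The training error comes out as zero because the model has memorized its data, while the test error is clearly larger. The disagreement figure measures how far predictions move when only the training sample changes, which is the sensitivity the bullets above describe.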
Conclusion
Understanding the balance between bias and variance is crucial for building machine learning models that generalize well. A model that is too simple underfits, and one that is too complex overfits; a model with both low bias and low variance makes the best predictions on new data.
The post What is bias and variance in machine learning? appeared first on Alpesh Kumar.