What is bias and variance in machine learning?


Introduction: Understanding bias and variance in machine learning is key to building accurate models. Bias causes consistent prediction errors, while variance makes a model overly sensitive to noise in its training data. This article explains both concepts.

What is Bias?

  • Some models are too simplistic and ignore important relationships in the training data, which could have improved their predictions. Such models are said to have high bias. When a model has high bias, its predictions are consistently off, at least for certain regions of the data if not the whole range.
  • For example, if you fit a straight line to a scatter plot where the data follows a curvilinear pattern, the fit will be poor. In some parts of the plot the line falls below the curve, and in others it sits above it, awkwardly trying to follow the trajectory of the curve.
  • Since the line traces out the model’s predictions, when it falls below the curve, the predictions are consistently lower than the ground truth, and vice versa. So when you think of the word bias, think of predictions being consistently off.
  • High-bias models are said to underfit the training data. As a result, the prediction error is high on both the training data and the test data.
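
The line-through-a-curve picture above can be checked numerically. What follows is a minimal sketch, assuming NumPy is available and using a made-up, noise-free quadratic (y = x²) as the ground truth. The function name `fit_line_to_curve` and the chosen ranges are illustrative assumptions, not part of the original discussion.

```python
import numpy as np

def fit_line_to_curve(n_points=50):
    """Fit a straight line to curved data; return the inputs and the residuals."""
    x = np.linspace(-3.0, 3.0, n_points)
    y_true = x ** 2                                   # assumed curved ground truth
    slope, intercept = np.polyfit(x, y_true, deg=1)   # least-squares straight line
    y_pred = slope * x + intercept
    return x, y_pred - y_true                         # positive = prediction too high

x, residuals = fit_line_to_curve()
middle = residuals[np.abs(x) < 1.0]   # region where the line sits above the curve
edges = residuals[np.abs(x) > 2.5]    # region where the line sits below the curve

print("middle residuals all positive:", bool(np.all(middle > 0)))
print("edge residuals all negative:", bool(np.all(edges < 0)))
```

Under these assumptions both checks print `True`: the errors are not random scatter but keep the same sign across whole regions of the input, which is what "consistently off" means.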

What is Variance?

  • Some models are too complex. In the process of looking for important relationships between variables, they also pick up on certain “flukes” in the training data that don’t generalize to the test data.
  • In such a case, the model’s predictions are once again off, but importantly, they are not consistently off. Change the data a little, and predictions can be very different.
  • This happens because the model is too sensitive and overreacts to small changes in the training data.
  • High-variance models are said to overfit the training data. Their prediction error is deceptively low on the training data but high on the test data, which shows a lack of generalization.
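
The sensitivity described above can be sketched by refitting the same kind of model on many slightly different training sets. This is a rough illustration only: the sine-shaped ground truth, the noise level, the sample sizes, and the helper names `true_fn` and `run_trials` are all assumptions made for the example. It compares a straight line (degree 1) with a very flexible degree-9 polynomial.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable

def true_fn(x):
    # Hypothetical ground truth, chosen only for illustration
    return np.sin(2 * np.pi * x)

def run_trials(degree, n_trials=200, n_train=12, noise_sd=0.3):
    """Refit a polynomial on many noisy training sets.

    Returns (mean train MSE, mean test MSE, mean spread of predictions).
    """
    x_train = np.linspace(0.0, 1.0, n_train)
    x_test = np.linspace(0.02, 0.98, 97)            # held-out inputs
    train_mse, test_mse, preds = [], [], []
    for _ in range(n_trials):
        # Same inputs each time, but fresh noise: "change the data a little"
        y_train = true_fn(x_train) + rng.normal(0.0, noise_sd, n_train)
        y_test = true_fn(x_test) + rng.normal(0.0, noise_sd, x_test.size)
        model = Polynomial.fit(x_train, y_train, degree)
        train_mse.append(np.mean((model(x_train) - y_train) ** 2))
        test_mse.append(np.mean((model(x_test) - y_test) ** 2))
        preds.append(model(x_test))
    # Standard deviation of the prediction at each test input, averaged over inputs
    spread = np.mean(np.std(np.array(preds), axis=0))
    return float(np.mean(train_mse)), float(np.mean(test_mse)), float(spread)

simple = run_trials(degree=1)
flexible = run_trials(degree=9)
print("degree 1 (train MSE, test MSE, spread):", simple)
print("degree 9 (train MSE, test MSE, spread):", flexible)
```

In this setup the degree-9 model should show a much lower training error than test error, and its predictions should move around far more from one training set to the next than the straight line's do. The exact numbers depend on the assumed noise level and sample size.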

Conclusion

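Understanding the balance between bias and variance is crucial for building machine learning models that generalize well. A model with both low bias and low variance makes better predictions on new data. In practice, reducing one often increases the other, so the goal is a model complex enough to capture the real pattern but not so complex that it fits noise.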
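
For readers who want the balance stated as a formula, the standard decomposition of expected squared error is given here as a reference sketch. The notation is an assumption of this example and does not appear elsewhere in the article: the data follow y = f(x) + ε with zero-mean noise of variance σ², f̂ is the model fitted on a random training set, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first term is large when the model is consistently off (underfitting), the second is large when predictions change a lot from one training set to another (overfitting), and the third cannot be reduced by any choice of model.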

The post What is bias and variance in machine learning? appeared first on Alpesh Kumar.


 
