
Overfitting vs. Underfitting: A Practical Guide to Model Diagnostics

A detailed walkthrough of overfitting and underfitting in machine learning, including how to identify each failure mode, why it happens, and how to fix it through the bias-variance tradeoff.
Jun 12, 2026  · 12 min read

Do you know why your model has 99% accuracy on training data but can’t seem to predict one thing right in production?

There’s a difference between a model that memorized and a model that learned. Generalization is the whole point of machine learning - you want predictions that hold up on data the model has never seen, not just the data you used during training. When that’s not the case, it almost always goes in one of two directions.

Those two directions are overfitting and underfitting. You need to know which one you're dealing with before you can fix it.

In this article, I'll walk you through how to recognize overfitting and underfitting, why they happen, and the hands-on steps that will help you achieve balance.

What Is Underfitting?

Underfitting happens when your model is too simple to represent what's actually going on in the data.

Imagine trying to predict housing prices with a single rule: "every house costs $300,000." That rule will be wrong almost everywhere. It can't see neighborhoods, square footage, number of bedrooms, garage space, or year built. The model has nowhere near enough flexibility to follow the pattern.

You can spot underfitting the same way every time. Training accuracy is low, and test accuracy is low too. Both numbers are bad, but the key thing is that they're bad together.

A classic case is fitting a straight line to data that curves. The line cuts through the middle and misses the shape. No amount of extra training data will save it, as the model itself can't represent the relationship.
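Here's a minimal sketch of that straight-line case. The quadratic data, the seed, and the 150/50 split are illustrative assumptions, not a real dataset. The point is only that both scores come out poor together.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility

# Synthetic curved (quadratic) data with a little noise
x = rng.uniform(-3, 3, 200)
y = x**2 + rng.normal(0, 0.5, 200)

# Simple hold-out split: first 150 points for training, the rest for testing
x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

def r2(y_actual, y_predicted):
    """R^2 score: 1.0 is perfect, 0.0 is no better than predicting the mean."""
    ss_res = np.sum((y_actual - y_predicted) ** 2)
    ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
    return 1 - ss_res / ss_tot

# A degree-1 polynomial is a straight line
line = np.polyfit(x_train, y_train, deg=1)
r2_train = r2(y_train, np.polyval(line, x_train))
r2_test = r2(y_test, np.polyval(line, x_test))

print(f"Train R^2: {r2_train:.2f}")
print(f"Test R^2:  {r2_test:.2f}")
```

Both scores should land near zero, because a line has no way to bend with the parabola.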

Underfitting example


What Is Overfitting?

Overfitting is the opposite problem. The model is too complex.

Instead of learning the general pattern, it memorizes the training set. Every noise point, every weird outlier, every peak and valley, every coincidence in the data gets attention as if it were an actual pattern. The model becomes near-perfect for the data it was trained on.

The good thing about overfitting is that you can easily spot it. Training accuracy looks great, but test accuracy is horrible.

Think of a student who memorizes exam answers word for word but never learns the underlying material. They score well on the practice test and likely fail the real one.
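To see the train/test gap in numbers, here's a small sketch with scikit-learn. The synthetic dataset, the 20% label noise, and the unlimited-depth tree are my own assumptions, picked so there is noise for the model to memorize.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 randomly reassigns 20% of labels (assumed noise level)
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# No depth limit: the tree keeps splitting until every leaf is pure
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"Train accuracy: {train_acc:.2f}")
print(f"Test accuracy:  {test_acc:.2f}")
```

The tree memorizes the training rows, noisy labels included, so training accuracy is perfect while test accuracy falls well short of it.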

Overfitting example


Overfitting vs Underfitting: Key Differences

Now that you've seen both, the difference is easier to spot. Underfit models can't perform well even on data they've seen. Overfit models perform well on data they've seen, but not on data they haven't.

The two look different during training:

  • Underfitting shows up as flat, mediocre performance across the board - the model never learns much from anything
  • Overfitting shows up as a gap, where training scores keep increasing while test scores stall or get worse over time

Their causes mirror each other too. Underfitting comes from doing too little: simple models and missing features. Overfitting comes from doing too much: complex models and too many features.

Here's a recap of the two:

Underfitting compared to overfitting


How to Identify Overfitting and Underfitting

Knowing what underfitting and overfitting look like in theory is one thing, but catching them in your own models is another.

The easiest thing to do here is compare training error to test error, and look at learning curves.

Training vs test error

The fastest check is to split your data into a training set and a test set, train the model, and look at the error on each.

For underfitting, both errors will be high. The model didn't learn the training data well, and it's not going to perform any better on data it's never seen. You have the same poor result on both sides.

For overfitting, the training error will be very low while the test error stays high. The model has the training data memorized, but that knowledge doesn't transfer.

Training vs test error visualization


You want to analyze the gap between these two numbers. A small gap with high errors points to underfitting. A large gap with low training error and high test error points to overfitting. A small gap with low errors on both is the goal, as it means the model learned the underlying pattern.
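The three cases above can be sketched as a tiny helper. The function name and both thresholds are hypothetical. What counts as a "high" error or a "large" gap depends entirely on your metric and your problem.

```python
def diagnose(train_error, test_error, high_error=0.3, large_gap=0.1):
    """Label a model from its train/test error.

    high_error and large_gap are illustrative assumptions, not universal cutoffs.
    """
    gap = test_error - train_error
    if train_error >= high_error:
        return "underfitting"   # the model can't even fit data it has seen
    if gap >= large_gap:
        return "overfitting"    # fits training data, fails to transfer
    return "good fit"

print(diagnose(0.40, 0.42))  # underfitting: both high, small gap
print(diagnose(0.01, 0.35))  # overfitting: low train error, large gap
print(diagnose(0.05, 0.07))  # good fit: both low, small gap
```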

Learning curves

Learning curves plot training and validation error against the size of the training set, or against training iterations. They show what's happening as the model learns.

In an underfit model, both curves quickly flatten out at a high error. Adding more data doesn't help because the model can't represent the pattern in the first place.

Underfit model curves


In an overfit model, the training curve drops to near zero while the validation curve stays high. The gap between them widens as training continues. That growing gap is what overfitting looks like on a chart.

Overfit model curves


A healthy model shows both curves dropping and meeting at a low error, with a small gap between them.
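If you want the numbers behind curves like these, scikit-learn's learning_curve computes training and validation scores at increasing training-set sizes. This is a sketch on synthetic sine data (an assumption), comparing a linear model that should underfit with an unlimited-depth tree that should overfit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)  # assumed seed
X = rng.uniform(0, 10, size=(400, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 400)

models = {
    "linear (underfit)": LinearRegression(),
    "deep tree (overfit)": DecisionTreeRegressor(random_state=0),
}

final_errors = {}
for name, model in models.items():
    sizes, train_scores, val_scores = learning_curve(
        model, X, y,
        train_sizes=np.linspace(0.1, 1.0, 5),
        cv=5,
        scoring="neg_mean_squared_error",
        shuffle=True,
        random_state=0,
    )
    # Scores are negative MSE, so flip the sign and average over the CV folds
    train_mse = -train_scores.mean(axis=1)
    val_mse = -val_scores.mean(axis=1)
    final_errors[name] = (train_mse[-1], val_mse[-1])
    for n, tr, va in zip(sizes, train_mse, val_mse):
        print(f"{name:20s} n={int(n):3d}  train MSE={tr:.3f}  val MSE={va:.3f}")
```

The linear model's two errors should sit close together at a high value, while the tree's training error stays at zero with a clear gap to its validation error.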

Why Overfitting and Underfitting Happen

Once you know how to spot them, the next question is why they show up. Both come from a mismatch between the model and the problem, but in opposite directions.

Causes of underfitting

Underfitting almost always traces back to one of three things.

  • The model is too simple: A linear model can't represent a curved relationship. The model doesn't have the capacity that the problem actually needs.
  • The features are insufficient: Even a capable model will underfit if you give it the wrong inputs. Predicting house prices from zip code alone misses square footage, bedrooms, condition, age, and lot size. The model has nothing useful to work with.
  • Not enough training: The model didn't get enough iterations or epochs, or the learning rate was poorly tuned, so the optimizer never reached a good solution. The training just stopped too early.

Causes of overfitting

Overfitting comes from giving the model more freedom than the data needs.

  • The model is too complex: A deep neural network with millions of parameters trained on a tiny dataset has plenty of room to memorize. The capacity exceeds what the problem needs.
  • Too many features: When you have more features than meaningful patterns in the data, the model learns correlations that happen to exist in your training sample but don't generalize.
  • The dataset is too small: With limited training data, even moderate model complexity can memorize the entire set. There aren't enough examples for the model to generalize from.
  • Training ran too long: The model kept adjusting weights after it had already learned the actual pattern, and started fitting the noise instead. From that point, more training makes things worse.
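The small-dataset cause from the list above is easy to reproduce. This sketch keeps model complexity fixed (a degree-9 polynomial, my assumption) and changes only the number of training points. The held-out error is measured against the true curve on a dense grid.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)  # assumed seed

def true_fn(x):
    return np.sin(2 * np.pi * x)

x_grid = np.linspace(0, 1, 500)  # dense grid used to measure generalization

def held_out_mse(n_train, degree=9, noise_sd=0.2):
    """Fit a fixed-degree polynomial to n_train noisy points and score it on the grid."""
    x = np.linspace(0, 1, n_train)
    y = true_fn(x) + rng.normal(0, noise_sd, n_train)
    model = Polynomial.fit(x, y, deg=degree)
    return np.mean((model(x_grid) - true_fn(x_grid)) ** 2)

mse_small = held_out_mse(n_train=15)
mse_large = held_out_mse(n_train=500)
print(f"15 training points:  held-out MSE = {mse_small:.4f}")
print(f"500 training points: held-out MSE = {mse_large:.4f}")
```

Same model, same noise level. With 15 points the polynomial has room to chase the noise, and with 500 it doesn't.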

The Bias-Variance Tradeoff

The bias-variance tradeoff explains why good model performance comes down to finding a sweet spot: a model flexible enough to avoid oversimplifying the problem, but not so flexible that it overfits your training set.

High bias

Bias is the error that comes from a model's assumptions about the data. A high-bias model has strong, simplistic assumptions. It can't represent the actual complexity of what's going on in the data.

This is exactly what underfitting is. The model is too rigid to fit the patterns, so it produces predictions that are off, no matter how much data you give it.

If you train a high-bias model 100 times on different samples, all 100 versions will make similar mistakes. Their predictions cluster around the wrong answer.

High variance

Variance is the error that comes from a model being too sensitive to the specific data it was trained on. A high-variance model picks up every small pattern, usually the noise.

This is what overfitting is. The model fits the training set very closely, but small changes in the training data lead to very different predictions.

If you train a high-variance model 100 times on different samples, you'll get 100 very different models. Their predictions are all over the place, even on the same input.

The tradeoff

You can't fully eliminate either bias or variance. You can only trade one against the other.

Reduce bias by making the model more complex, and variance goes up. Reduce variance by simplifying the model, and bias goes up. The goal is to find the middle, where total error is at its lowest.
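You can run the "train it 100 times" thought experiment directly. This is a sketch under a few assumptions: a noisy sine wave as the true function, fixed x values with fresh noise on every run, and polynomial degrees 1 and 9 standing in for the simple and the flexible model.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed

x = np.linspace(0, 1, 30)
y_true = np.sin(2 * np.pi * x)
n_runs, noise_sd = 100, 0.2

results = {}
for degree in (1, 9):
    preds = np.empty((n_runs, x.size))
    for run in range(n_runs):
        # Each run sees the same x values but a fresh draw of noise
        y = y_true + rng.normal(0, noise_sd, x.size)
        preds[run] = np.polyval(np.polyfit(x, y, deg=degree), x)

    # Bias^2: how far the average prediction is from the truth
    bias_sq = np.mean((preds.mean(axis=0) - y_true) ** 2)
    # Variance: how much predictions move from one run to the next
    variance = np.mean(preds.var(axis=0))
    results[degree] = (bias_sq, variance)
    print(f"degree {degree}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```

The straight line should show the larger squared bias and the degree-9 fit the larger variance, which is the tradeoff in miniature.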

Bias-variance tradeoff example


How to Fix Underfitting

Once you've diagnosed underfitting, you have a few ways to fix it. They all give the model more capacity to represent the patterns in your data.

  • Increase model complexity: Move to a more flexible model. Go from linear regression to polynomial regression, or from a shallow tree to a deeper one.
  • Add more features: Add new inputs that actually carry signal. Create interaction terms, polynomial features, or domain-specific features the model didn't have access to.
  • Train for longer: The model might not have had enough time to converge. Give it more epochs or a different learning rate schedule.
  • Reduce regularization: Regularization keeps a model simple, which is the opposite of what underfitting needs. Lower the regularization strength or remove it completely, to give the model more freedom.

A few good features often have more impact than switching architectures. Start there before changing the model itself.
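As a quick illustration of the feature route, here's a sketch where a single squared feature rescues a linear model. The quadratic data is synthetic and the degree is chosen to match it, which you wouldn't know in advance on a real problem.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)  # assumed seed
X = rng.uniform(-3, 3, size=(300, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.5, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Same linear regression, with and without a squared feature
linear = LinearRegression().fit(X_train, y_train)
quadratic = make_pipeline(
    PolynomialFeatures(degree=2), LinearRegression()
).fit(X_train, y_train)

linear_r2 = linear.score(X_test, y_test)
quadratic_r2 = quadratic.score(X_test, y_test)
print(f"Linear features only: test R^2 = {linear_r2:.2f}")
print(f"With x^2 feature:     test R^2 = {quadratic_r2:.2f}")
```

The estimator never changed. Only the inputs did.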

How to Fix Overfitting

Fixing overfitting takes the opposite approach. You want to constrain the model so it stops memorizing the training data.

  • Collect more data: A larger dataset makes it much harder for the model to memorize. More examples force it to find patterns that hold across the entire set, not just a handful of rows.
  • Apply regularization: L1 and L2 regularization add a penalty for large weights, which keeps the model from leaning too heavily on any single feature. This is one of the most reliable fixes.
  • Reduce model complexity: If the model is too big for the data, scale it down. Use fewer parameters, shallower trees, or smaller networks.
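  • Use cross-validation: Cross-validation doesn't reduce overfitting by itself, but it gives you a more honest read on how the model will perform on unseen data. It evaluates the model on several train-test splits of a single dataset, so you can catch overfitting and compare fixes reliably.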
  • Apply dropout for neural networks: Dropout randomly disables a percentage of neurons during training. This forces the network to learn redundant representations, which reduces reliance on any single neuron.
  • Stop training early: Watch the validation error and stop training when it starts increasing, even if the training error is still decreasing. This is called early stopping, and it's one of the easiest changes to implement.

Regularization and early stopping are usually the first things to try. They're cheap to add and often help.
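To see what an L2 penalty does to the weights, here's a sketch that fits the same degree-10 polynomial features with and without Ridge. The degree, the alpha value, and the synthetic sine data are assumptions for illustration. In practice you'd tune alpha on validation data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(7)  # assumed seed
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)
X = x.reshape(-1, 1)

def poly_model(regressor, degree=10):
    """Polynomial features followed by the given linear regressor."""
    return make_pipeline(PolynomialFeatures(degree, include_bias=False), regressor)

plain = poly_model(LinearRegression()).fit(X, y)
ridge = poly_model(Ridge(alpha=1e-3)).fit(X, y)  # alpha is an assumed value

# The last pipeline step holds the fitted coefficients
plain_norm = np.linalg.norm(plain[-1].coef_)
ridge_norm = np.linalg.norm(ridge[-1].coef_)
plain_train_mse = np.mean((plain.predict(X) - y) ** 2)
ridge_train_mse = np.mean((ridge.predict(X) - y) ** 2)

print(f"No penalty: weight norm = {plain_norm:.1f}, train MSE = {plain_train_mse:.4f}")
print(f"Ridge:      weight norm = {ridge_norm:.1f}, train MSE = {ridge_train_mse:.4f}")
```

The penalized model gives up a little training error in exchange for much smaller weights, which is the mechanism that keeps it from bending to fit every noisy point.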

Overfitting and Underfitting in Different Models

Different model families show underfitting and overfitting in their own ways. Here's how three common ones can fail in both directions.

Linear models

  • Underfit: Linear models assume a straight-line relationship. When the actual pattern curves, the model can't follow it, no matter how much data you give it.
  • Overfit: When you add enough polynomial or interaction terms, even linear regression can memorize noise. Regularization methods like Ridge and Lasso exist mostly to handle this.

Decision trees

  • Underfit: A shallow tree can only make a few splits. With two or three decisions, it can't represent patterns that need more nuance.
  • Overfit: Deep trees tend to overfit. A tree that keeps splitting until each leaf contains a single training example will get perfect training accuracy and poor test accuracy. That's why parameters like max_depth and min_samples_split, and techniques like pruning, exist.

Neural networks

  • Underfit: Networks that are too small for the problem will underfit. So will networks where training stops too early, or where the optimizer gets stuck in a suboptimal solution.
  • Overfit: This is more common in deep learning. A network with millions of parameters can memorize even large datasets given enough epochs. Dropout, weight decay, data augmentation, and early stopping all exist to prevent it.
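Dropout is simple enough to sketch in a few lines of NumPy. This is the common "inverted dropout" formulation, where the surviving activations are scaled up during training so nothing needs to change at inference time. The function below is a hypothetical helper for illustration, not any framework's API.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations  # at inference time, activations pass through unchanged
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)  # assumed seed
hidden = np.ones((4, 1000))     # stand-in for a layer's activations
dropped = dropout(hidden, rate=0.5, rng=rng)

print("Fraction zeroed:", np.mean(dropped == 0))
print("Mean activation:", dropped.mean())
```

About half of the units are zeroed, and the rescaling keeps the average activation close to its original value of 1.0.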

Additional Examples of Overfitting vs Underfitting

I’ll now walk you through two classic examples with code that will make these patterns easy to see.

Polynomial regression

A noisy sine wave is a good test case. When you fit polynomials of different degrees, you can see the model behavior change.

import numpy as np

# Data
np.random.seed(7)
X = np.linspace(0, 1, 30)
y_true = np.sin(2 * np.pi * X)
y = y_true + np.random.normal(0, 0.2, X.shape)

# Fit polynomials of three degrees and keep each fitted curve for plotting
X_smooth = np.linspace(0, 1, 300)
degrees = [1, 3, 15]
predictions = {}

for degree in degrees:
    coefs = np.polyfit(X, y, deg=degree)
    predictions[degree] = np.polyval(coefs, X_smooth)

Polynomial regression example


Degree 1 is a straight line, which underfits. It can't follow the curve at all. Degree 3 captures the actual shape. It absorbs some noise but stays close to the truth. Degree 15 overfits: it chases individual training points and can oscillate sharply between them, especially near the edges.
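If you'd like numbers to go with the picture, this sketch measures the error of each degree on the training points and on fresh noisy points from the same curve. It's a self-contained variant of the setup above, with two assumptions of mine: it uses numpy.polynomial.Polynomial.fit, which rescales x internally and stays numerically stable at degree 15, and a different random generator, so the exact values won't match the plot.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(7)  # assumed seed
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

# Fresh noisy points from the same curve act as held-out data
x_new = np.linspace(0, 1, 300)
y_new = np.sin(2 * np.pi * x_new) + rng.normal(0, 0.2, x_new.size)

errors = {}
for degree in (1, 3, 15):
    model = Polynomial.fit(x, y, deg=degree)
    train_mse = np.mean((model(x) - y) ** 2)
    new_mse = np.mean((model(x_new) - y_new) ** 2)
    errors[degree] = (train_mse, new_mse)
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, held-out MSE = {new_mse:.3f}")
```

Training error can only go down as the degree goes up. The number to watch is the held-out error: degree 1 is far worse than degree 3, and you should typically see degree 15 give back some of that gain.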

Decision trees with varying depth

The same story shows up with decision trees. You can train trees of increasing depth on the same data and measure error on both training and test sets.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Data
np.random.seed(11)
X = np.linspace(0, 10, 250).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.normal(0, 0.3, 250)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

depths = range(1, 21)
train_errors = []
test_errors = []

for depth in depths:
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_errors.append(mean_squared_error(y_train, tree.predict(X_train)))
    test_errors.append(mean_squared_error(y_test, tree.predict(X_test)))

Decision tree example


Training error falls as the tree grows deeper, eventually approaching zero when each leaf contains just a single training point. Test error drops initially as the tree captures the actual relationships in the data, then climbs back up as deeper splits start fitting noise. The minimum sits at the depth that balances the two.

Common Mistakes When Diagnosing Model Performance

Even if you choose the right metric, it's easy to draw the wrong conclusions. Here are things you shouldn’t do when evaluating model performance:

  • Evaluating only training accuracy: Training accuracy tells you how well the model fits data it has already seen. It doesn't tell you anything about how the model will perform on new inputs. Always measure on a separate set before drawing any conclusions.
  • Ignoring validation data: Validation data is what you use to tune model choices like architecture, hyperparameters, and stopping point. If you tune against the test set instead, your choices gradually overfit to it, and the test score stops being an honest estimate of performance on new data.
  • Assuming more complexity is always better: A bigger model doesn't automatically mean a more capable model. If your data is small or the relationships in the data are simple, complexity will only decrease performance. Start simple and only add capacity when the diagnostics call for it.
  • Confusing noise with signal: Not every pattern in the training data is worth learning. Random fluctuations, sampling biases, outliers, and collection artifacts can look meaningful to a flexible model. If you can't explain why a relationship should exist, treat it with caution.

You should always check for all four before settling on a model. Many production failures trace back to one (or more) of them.
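One way to avoid the first two mistakes is a three-way split: train on one part, tune on a second, and score on the third exactly once. Here's a sketch using a decision tree. The 60/20/20 proportions, the depth range, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(11)  # assumed seed
X = rng.uniform(0, 10, size=(600, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 600)

# 60% train, 20% validation, 20% test
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

# Choose the depth using the validation set only
val_errors = {}
for depth in range(1, 16):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    val_errors[depth] = mean_squared_error(y_val, tree.predict(X_val))
best_depth = min(val_errors, key=val_errors.get)

# Touch the test set once, with the chosen depth
final_tree = DecisionTreeRegressor(max_depth=best_depth, random_state=0)
final_tree.fit(X_train, y_train)
test_mse = mean_squared_error(y_test, final_tree.predict(X_test))

print(f"Best depth on validation data: {best_depth}")
print(f"Test MSE at that depth: {test_mse:.3f}")
```

Because the test set played no part in choosing the depth, its score is still a fair estimate of performance on new data.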

Conclusion

Underfitting and overfitting are the two ways a model fails to generalize. One stays too simple to learn the pattern. The other tries to learn every point in your dataset.

The actual goal of training is to get somewhere between them, where bias and variance are balanced and total error is at its lowest.

Validation performance is the metric that tells you where you are. Keep track of it during training and let the difference between training and validation error guide your decisions. If validation error stops improving while training error keeps dropping, you've gone past the sweet spot. If both stay high, you haven't reached it yet.



Author
Dario Radečić
Senior Data Scientist based in Croatia. Top Tech Writer with over 700 articles published, generating more than 10M views. Book Author of Machine Learning Automation with TPOT.

FAQs

What is the difference between overfitting and underfitting?

Underfitting happens when a model is too simple to represent the patterns in your data, so it performs poorly on both training and test sets. Overfitting is the opposite: the model learns the training data too well, including the noise, so it performs great in training but fails on new data. Both produce weak predictions, but for different reasons.

How do I know if my model is overfitting or underfitting?

Compare training error to test error. If both are high, you're underfitting. If training error is very low but test error is high, you're overfitting. Learning curves help too, since training and validation errors diverge in an overfit model and stay flat at a high error in an underfit one.

What is the bias-variance tradeoff?

Bias is the error from a model being too simple, and variance is the error from a model being too sensitive to its training data. Reducing one usually increases the other, so the goal is to find the balance where total error is at its lowest. Models with the best balance generalize best to new data.

Does collecting more data fix overfitting?

It usually helps, but it's not a guaranteed fix. More data makes it harder for a model to memorize, so it has to find patterns that hold across the whole set. But if your model is far too complex for the problem, or your features carry mostly noise, more data won't fully solve it. Regularization and simpler models often work better in those cases.

Can I use early stopping to prevent overfitting in neural networks?

Yes, and it's one of the easiest fixes to implement. Watch validation error during training and stop when it plateaus or starts increasing, even if training error keeps decreasing. This catches the point where the model has learned the actual pattern and is starting to fit noise. Most deep learning frameworks have early stopping callbacks built in.
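The logic behind those callbacks is short. Here's a framework-free sketch with a "patience" counter. The function name and the validation errors are made up for illustration. In a real training loop you'd compute the error after each epoch and restore the weights from the best one.

```python
def early_stopping_epoch(val_errors, patience=3):
    """Return the index of the best epoch, stopping once validation error
    has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch

# Hypothetical validation errors: improving at first, then drifting upward
val_errors = [0.90, 0.60, 0.45, 0.40, 0.41, 0.43, 0.47, 0.39]

print(early_stopping_epoch(val_errors, patience=3))  # 3
print(early_stopping_epoch(val_errors, patience=5))  # 7
```

With a patience of 3, training stops before it ever sees the late improvement at epoch 7. A larger patience catches it, at the cost of more epochs spent waiting.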
