
Why It’s Okay If Your Model Isn’t the Most Accurate

Optimizing for accuracy can actually cause more harm than good. You must choose your evaluation metric with care.
Aug 2020  · 3 min read

Accuracy in machine learning

When building an ML model, you must designate an evaluation metric, which tells the algorithm what you’re optimizing for. One commonly used evaluation metric is accuracy: the percentage of examples in your data for which the model makes the correct prediction. This may seem like a great choice: who would want a model that isn’t the most accurate?

Actually, there are many cases where you wouldn't want to optimize for accuracy, the most prevalent being when your data has imbalanced classes. Say you’re building a spam filter to classify emails as spam or not, and only 1% of emails are actually spam (this is what is meant by imbalanced classes: 1% of the data is spam, 99% is not). Then a model that classifies all emails as non-spam has an accuracy of 99%, which sounds great, but the model is useless: it never catches a single spam email.
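The arithmetic behind the spam example can be checked in a few lines. This is a minimal sketch in plain Python; the data set size (1,000 emails) and the variable names are assumptions chosen for illustration, not figures from a real filter.

```python
# Hypothetical data set: 1,000 emails, 1% spam (label 1), 99% non-spam (label 0).
y_true = [1] * 10 + [0] * 990

# A "model" that labels every email as non-spam, whatever its content.
y_pred = [0] * len(y_true)

# Accuracy: the share of emails where the prediction matches the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# How many of the real spam emails did the model actually flag?
spam_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(f"accuracy: {accuracy:.0%}")
print(f"spam emails caught: {spam_caught} of {sum(y_true)}")
```

The accuracy comes out at 99% even though no spam is caught, which is why accuracy alone says little when one class dominates.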

Alternative metrics: the confusion matrix

There are alternative metrics that account for such class imbalances. It is key that you speak with your data scientists about what they’re optimizing for and how it relates to your business question. A good place to start these discussions is not by focusing on a single metric but by looking at what’s called the confusion matrix of the model, which contains the following numbers:

  • False negatives (e.g., real spam incorrectly classified as non-spam)
  • False positives (non-spam incorrectly classified as spam)
  • True negatives (non-spam correctly classified)
  • True positives (spam correctly classified)

Source: Glass Box Medicine
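The four counts listed above can be tallied directly from true labels and predictions. The sketch below uses plain Python and a hypothetical spam filter (the data, the `confusion_counts` helper, and its parameter names are assumptions for illustration). It also derives precision and recall, two standard metrics computed from the confusion matrix that this article does not name explicitly but that are commonly used when classes are imbalanced.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally the four confusion-matrix cells for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Hypothetical data: 10 spam emails (1) followed by 990 non-spam emails (0).
y_true = [1] * 10 + [0] * 990
# Hypothetical filter: catches 6 of the 10 spam emails and
# wrongly flags 4 of the 990 legitimate ones.
y_pred = [1] * 6 + [0] * 4 + [1] * 4 + [0] * 986

c = confusion_counts(y_true, y_pred)

accuracy = (c["tp"] + c["tn"]) / len(y_true)
# Precision: of the emails flagged as spam, how many really were spam?
precision = c["tp"] / (c["tp"] + c["fp"])
# Recall: of the real spam emails, how many did the filter catch?
recall = c["tp"] / (c["tp"] + c["fn"])

print(c)
print(f"accuracy={accuracy:.3f} precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case accuracy is above 99%, yet the filter misses 4 in 10 spam emails and 4 in 10 of its flags are wrong. Which of those errors matters more depends on the business question, which is the conversation the confusion matrix is meant to start.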

Goodhart's law: real-life examples

A lot of attention is currently focused on the importance of the data you feed your ML models and how it relates to your evaluation metric. YouTube had to learn this the hard way: When they optimized for revenue based on view time (how long people stay glued to videos), this had the negative effect of recommending more violent and incendiary content, along with more conspiracy videos and fake news.

An interesting lesson here is that optimizing for revenue (viewing time is correlated with the number of ads YouTube can serve you, and thus with revenue) may not be aligned with other goals, such as showing truthful content. This is an algorithmic version of Goodhart’s Law, which states: “When a measure becomes a target, it ceases to be a good measure.”

The best-known illustration is the story of a Soviet nail factory, in which the workers were first given a target for the number of nails and produced many small nails. To counter this, the target was changed to the total weight of the nails, so they then made a few giant nails. Algorithms fall prey to Goodhart’s Law too, as we’ve seen with the YouTube recommendation system.

Learn more about best practices for machine learning

Find out more about best practices for machine learning in The Definitive Guide to Machine Learning for Business Leaders.

