
How to Ethically Use Machine Learning to Drive Decisions

Having good quality data requires strong data foundations, along with a commitment to monitoring models and removing bias.
Aug 2020  · 3 min read

Focus on solid data foundations and tooling

Having good-quality data is a huge challenge in itself. We recommend that companies looking to leverage machine learning, artificial intelligence, and data science consider Monica Rogati’s AI Hierarchy of Needs, which places machine learning near the top, as one of the final pieces of the puzzle.

Source: Hackernoon

This hierarchy illustrates that before machine learning can happen, you need solid data foundations and tools for extracting, loading, and transforming data (ETL), as well as tools for cleaning and aggregating data from disparate sources.

This requires strong data engineering practices—you’ll need to leverage databases, understand how to process data correctly, schedule your workflows, and make use of cloud computing.

So before you hire your first machine learning engineer, set up your data engineering, data science, and data analysis functions.

Beware of bias in your data and algorithms

Machine learning can only be as good as the data you feed it. If your data is biased, your model will be too. For example, Amazon built an ML recruiting tool to predict the success of applicants based on their resumes. Its ten years’ worth of training data favored men because of historic male dominance across the tech industry, which caused the tool to be biased against women as well.

This is why data ethics has emerged as such an important topic in recent years. As more and more data is generated, the impact of how that data is used also scales dramatically. This requires principled consideration and monitoring. As Cassie Kozyrkov, Google's Chief Decision Scientist, has analogized, a teacher is only as good as the books they’re using to teach the students. If the books are biased, their lessons will be too.
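One simple way to start monitoring for this kind of bias is to compare how often a model selects candidates from each group. The sketch below is illustrative only: the function names and the screening decisions are hypothetical and are not taken from Amazon's tool or any real dataset. It computes a selection rate per group and the ratio between the lowest and highest rate, sometimes called a disparate impact ratio.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the share of positive outcomes per group.

    `records` is an iterable of (group, selected) pairs, where `selected`
    is True when the model recommended the candidate.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        if selected:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Ratio of the lowest selection rate to the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group
    return min(rates.values()) / highest


# Hypothetical screening decisions, invented purely for illustration.
decisions = (
    [("men", True)] * 6 + [("men", False)] * 4
    + [("women", True)] * 3 + [("women", False)] * 7
)

rates = selection_rates(decisions)
ratio = disparate_impact_ratio(rates)
print(rates)
print(round(ratio, 2))
```

A ratio well below 1.0 suggests the model treats the groups differently and deserves a closer look. A commonly cited rule of thumb flags ratios under 0.8, but a single number like this is a starting point for investigation, not proof that a model is fair or unfair.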

Keep tabs on your model and improve it

Remember that the work of machine learning doesn’t end once your model is in production, making predictions or performing classifications. Deployed models still need to be monitored and maintained.

If you have a model predicting credit card fraud based on transaction data, you get useful information every time your model makes a prediction and you act on it. On top of this, the activity you’re trying to monitor and predict (in this case, credit card fraud) may be dynamic and change over time. This phenomenon, where the data your model sees is constantly in flux, is called data drift, and it shows how essential it is to regularly update your model.

Source: Databricks
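To make data drift concrete, here is a minimal sketch of one way to detect it: the Population Stability Index (PSI), which compares how a single numeric feature was distributed at training time with how it is distributed in recent data. PSI is one technique among several and is not prescribed by this article. The function name, the bin count, and the transaction amounts below are assumptions chosen for illustration.

```python
import math


def population_stability_index(baseline, recent, bins=5):
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(baseline), max(baseline)
    # Equal-width bins over the baseline range; fall back to 1.0 if all values match.
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the baseline range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    expected, actual = shares(baseline), shares(recent)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Hypothetical transaction amounts, invented purely for illustration.
training_amounts = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
recent_amounts = [amount + 60 for amount in training_amounts]

psi_same = population_stability_index(training_amounts, training_amounts)
psi_shifted = population_stability_index(training_amounts, recent_amounts)
print(psi_same)
print(psi_shifted > psi_same)
```

Identical distributions give a PSI of zero, and the score grows as recent data moves away from the training data. A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift that may justify retraining, though the right threshold depends on the feature and the use case.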
