Data Demystified: The Different Types of AI Bias

In the final part of Data Demystified, we outline the most common types of AI bias and explain why data literacy helps avoid harmful impacts from AI.

Updated Sep 2022 · 8 min read

Welcome to the final entry of our month-long Data Demystified series. As part of Data Literacy Month, this series has clarified key concepts from the world of data and attempted to answer the questions you may be too afraid to ask. If you want to start at the beginning, read our first entry in the series: What is a Dataset?

In this entry, we continue the theme of the previous Data Demystified entry and discuss the potentially harmful effects of AI, how it can perpetuate bias against certain groups of people, and the different types of AI bias everyone should be aware of.

The Problem with AI Bias

Most AI systems today rely on machine learning, which applies statistical techniques to learn patterns from past data and make predictions about future events.

As machine learning has been adopted more widely, the number of cases in which it has made biased predictions has risen steeply. Biased AI algorithms are a serious concern in the AI community, and they are a product of the data used for model training. Bias can take many forms: it can be societal or structural, and it can be directed at a particular gender, race, religion, or nationality.

AI algorithms learn bias from the training data as they try to mimic human judgment. Let us revisit a few past examples where biased AI predictions have harmed people and society:

Gender Bias: Amazon’s Recruiting Engine

Amazon developed a recruiting engine to automate the screening of job applicants' resumes for further interviews. However, the algorithm reflected the bias it learned from past data and ended up favoring the profiles of male candidates.
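To make the mechanism concrete, here is a deliberately simplified sketch of how a model fitted to skewed historical decisions reproduces that skew. It is not how Amazon's engine worked; the data, the group labels, and the helper names `fit_rates` and `predict` are all hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical historical screening decisions: (group, was_advanced).
# The counts are invented for illustration only.
history = (
    [("men", True)] * 80 + [("men", False)] * 20
    + [("women", True)] * 30 + [("women", False)] * 70
)

def fit_rates(records):
    """Learn the historical advance rate for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [advanced, total]
    for group, advanced in records:
        counts[group][0] += int(advanced)
        counts[group][1] += 1
    return {g: adv / total for g, (adv, total) in counts.items()}

def predict(rates, group, threshold=0.5):
    """Advance a candidate if the group's past rate clears the threshold."""
    return rates[group] >= threshold

rates = fit_rates(history)
print(rates)                                            # {'men': 0.8, 'women': 0.3}
print(predict(rates, "men"), predict(rates, "women"))   # True False
```

Nothing in the code is "prejudiced" on its own. The toy model simply repeats the pattern in its training data, which is why the quality of that data matters so much.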

Racial Bias: PredPol Algorithm

PredPol, short for predictive policing, built a heatmap of areas with high criminal activity and identified predominantly minority neighborhoods as hot zones. The algorithm was trained on biased input data in which many of the recorded criminal incidents had been reported from such areas.

Racial Bias: COMPAS Algorithm

Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) is software used to assess the likelihood that a defendant will commit a repeat offense. However, an investigation in 2016 found the algorithm to be biased: the software labeled Black defendants as more likely to re-offend than white defendants.

As the above examples show, machine learning algorithms learn bias from the training data along with its other regularities. Unless treated at the origin, bias can manifest in AI/ML pipelines in multiple forms. As AI becomes more widespread in organizations and society, everyone should be aware of the different types of bias in AI systems. Below are three of the most common.

Three Common Types of AI Bias

Prejudice Bias

When the training data reflects existing prejudices, stereotypes, and societal assumptions, these biases get embedded in the learned model. This is called prejudice bias. For example, an image search for “doctor” tends to return many images of male doctors, while a similar search for “nurse” tends to return images of female nurses. This speaks volumes about gender-based societal stereotypes.

Sample Selection Bias

Sample selection bias occurs when the training data is not representative of the population under study. An example is an AI system trained to detect skin cancer. If the original dataset is not representative of the wider population, the system will underperform for members of groups that are underrepresented in the dataset.
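One simple way to look for this kind of problem is to compare each group's share of the training data with its share of the population the system is meant to serve. The sketch below assumes you have a group label for every training example; the skin-tone labels, the counts, the population shares, and the `representation_gap` helper are all hypothetical.

```python
from collections import Counter

def representation_gap(sample_groups, population_share):
    """Return (sample share - population share) for each group."""
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_share.items()
    }

# Hypothetical skin-tone labels for 1,000 training images.
sample = ["lighter"] * 900 + ["darker"] * 100
# Hypothetical shares in the population the system will serve.
population = {"lighter": 0.6, "darker": 0.4}

gaps = representation_gap(sample, population)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
# lighter: +0.30
# darker: -0.30
```

A large negative gap flags a group the model has seen too little of. It does not prove the model will underperform for that group, but it is a cue to check performance for that group separately.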

Measurement Bias

Measurement bias comes from errors in the data collection or measurement process. For example, if the camera supplying images to an image recognition system produces poor-quality images, the results could be biased against specific populations. Human judgment is another source: a medical diagnostic algorithm might be trained to predict the likelihood of sickness from proxy metrics such as doctor visits instead of actual symptoms.
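The doctor-visit example can be sketched with a toy dataset. Here a proxy label ("two or more visits means sick") is compared against the true condition for two hypothetical groups, one with easy access to doctors and one without. The records, the group names, the two-visit cutoff, and the helper functions are assumptions made up for illustration.

```python
# Hypothetical patient records: (group, truly_sick, doctor_visits).
patients = [
    ("easy_access", True, 4), ("easy_access", True, 3),
    ("easy_access", False, 2), ("easy_access", False, 0),
    ("hard_access", True, 1), ("hard_access", True, 0),
    ("hard_access", True, 3), ("hard_access", False, 0),
]

def proxy_label(visits, min_visits=2):
    """Proxy for sickness: at least `min_visits` doctor visits."""
    return visits >= min_visits

def missed_sick_rate(records, group):
    """Share of truly sick patients in a group that the proxy labels healthy."""
    sick_visits = [v for g, is_sick, v in records if g == group and is_sick]
    missed = sum(1 for v in sick_visits if not proxy_label(v))
    return missed / len(sick_visits)

for group in ("easy_access", "hard_access"):
    print(f"{group}: {missed_sick_rate(patients, group):.2f}")
# easy_access: 0.00
# hard_access: 0.67
```

In this toy data, the proxy misses most sick patients in the group that rarely sees a doctor, so a model trained on the proxy would inherit that blind spot.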

Developing Data Literacy for Responsible AI

Throughout this month, we’ve highlighted the importance of data literacy for individuals and organizations. Data literacy allows non-technical stakeholders to converse with data and AI experts and to understand the limitations of AI systems. More importantly, it promotes a two-way conversation between subject matter experts and AI experts that allows for a thoughtful discussion of the potential harm of an AI system.

To equip yourself with the necessary knowledge to have these conversations, take our Understanding Machine Learning course and start your data literacy journey. For more data literacy and data demystified content, check out the following resources:

Data Literacy Courses

Understanding Machine Learning (course, 2 hr): An introduction to machine learning with no coding involved.

Data-Driven Decision Making for Business (course, 2 hr): Discover how to make better business decisions by applying practical data frameworks, no coding required.

Data Communication Concepts (course, 3 hr): No one enjoys looking at spreadsheets! Bring your data to life. Improve your presentation and learn how to translate technical data into actionable insights.

Data Competency Framework: Templates and Key Skills (Adel Nehme, 8 min): Discover how to build an effective data competency framework, the data and AI skills you need to include, and templates to help you get started.

Digital Upskilling Strategies for Transformative Success (Adel Nehme, 7 min): Explore the power of digital upskilling in achieving transformative success and bridging the skills gap for a future-ready workforce.

What is Data Fluency? A Complete Guide With Resources (Matt Crabtree, 8 min): Discover what data fluency is and why it matters. Plus find resources and tips for boosting data fluency at an individual and organizational level.

Making SMARTER™️ Decisions with Lori Silverman, author of Business Storytelling for Dummies (Richie Cotton, 62 min): Richie and Lori cover common problems in business decision-making, connecting decision-making to business processes, the role of data visualization and narrative storytelling, the SMARTER™️ decision-making methodology and much more.

How Data Leaders Can Make Data Governance a Priority with Saurabh Gupta, Chief Strategy & Revenue Officer at The Modern Data Company (Adel Nehme, 41 min): Adel and Saurabh explore the importance of data quality and how ‘shifting left’ can improve data quality practices, operationalizing ‘shift left’ strategies through collaboration and data governance, future trends in data quality and governance, and more.

[Radar Recap] The Art of Data Storytelling: Driving Impact with Analytics with Brent Dykes, Lea Pica and Andy Cotgreave (Richie Cotton, 40 min): Brent, Lea and Andy shed light on the art of blending analytics with storytelling, a key to making data-driven insights both understandable and influential within any organization.