An Introduction to Data Ethics: What is the Ethical Use of Data?
What is Data Ethics?
In short, data ethics refers to the principles behind how organizations gather, protect, and use data. It’s a field of ethics that focuses on the moral obligations that entities have (or should have) when collecting and disseminating information about us. In a world where data is more valuable and ubiquitous than ever, data ethics issues are more pressing now than at any time in history.
Why Data Ethics Matters
Let’s look at some examples of the importance of data ethics:
- In September 2018, hackers injected malicious code into British Airways’ website, diverting traffic to a fraudulent replica site. Customers then unknowingly gave their information to fraudsters, including login details, payment card information, address, and travel booking information.
- In 2019, after Apple introduced its credit card to consumers, allegations of gender bias in its credit algorithm emerged. Several prominent tech executives (including Steve Wozniak, the famous technologist and cofounder of Apple) described receiving far higher credit limits than their wives, with whom they shared assets. Apart from gender, there were no clear factors to explain such a difference.
- In March 2021, the privacy of over 533 million Facebook users was compromised when their data was posted on an open hackers’ forum. It was one of the largest data breaches of all time. The incident raised concerns about how organizations store and secure personal information and whether they should be allowed access to such data in the first place.
These examples are only a small snapshot of recent ethical failures and breaches in digital security. Even DataCamp suffered a security breach in 2019.
Regardless of how low-tech you may be, the importance of data ethics cannot be overstated. It's impossible to ignore the fact that we're living in a world that increasingly depends on information, and consequently, data ethics will become an essential factor for everyone.
Data is a valuable asset that can build the next great business or innovation to improve the lives of countless people. However, it is also a resource that many organizations are not protecting or using ethically.
That's why understanding data ethics is so essential. It helps us discern what we're doing with our own information and how we can protect ourselves from its misuse without sacrificing our ability to participate in society. It also helps us think about how we want society to function in the future, not just in our lifetimes but for future generations as technology continues to evolve.
This is also why regulation related to technology is not just about protecting data privacy or security; it's about protecting human beings from the unknown consequences of their own actions.
Understanding the Current & Potential Future Data Ethics Landscape
It's essential to remember that AI and machine learning are still relatively new technologies. They're not going away any time soon, but they're not in their final form either. That's why questions about how these technologies should be used, so that they remain a benefit to society, are so urgent. Learn more about issues around the future of responsible AI with our DataFramed podcast.
Since we live our lives increasingly online, the line between private and public has been blurred. And while there are some legal protections for your data (such as the European Union's General Data Protection Regulation, or California’s CCPA), there are also enough gray areas to make it difficult to manage your own privacy.
Research about popular mobile apps and social websites reveals that many of them sell our personal information to third parties.
[Reference: Statista 2021]
And should you frequently use some of these platforms, you might be surprised by exactly what data is being sold. From your email to your chat and location history, advertising agencies or other buyers can gain a lot when you unwittingly let them into your life through applications.
[Reference: Harvard Business Review]
Many organizations are developing the infrastructure to make AI and data science possible in various industries, meaning that research into our exposure is only just beginning. Consent and the sale of personal information will become important topics in more and more areas of life.
AI art tools such as DALL-E and the open-source Stable Diffusion show how quickly AI can improve once it is made widely available, which hints at the potential impact once more companies build the infrastructure for big data projects. With these rapid developments, the only certainty about the future is that data privacy, security, and a sense of fairness will become more and more challenging to guarantee.
Principles of Big Data Ethics
While there is no standard method for approaching data projects—big or otherwise—the essential aim is to translate basic human rights into the digital age. Regardless of the terms used, considering potential harm to people, organizations, and systems is the motivation behind these principles.
Government agencies and independent organizations have provided some guidance on practical application. Drawing on NIST, the US technology standards agency, and DataEthics.eu, here is one approach to considering challenges unique to your industry:
- Harms to people:
  - Adverse impact on an individual’s civil liberties or physical safety, discrimination against groups of people, or the repression of democratic participation at scale.
- Harms to organizations:
  - Adverse impact on business operations (monetary loss, security breaches, and reputational harm).
- Harms to systems:
  - Large-scale harms, such as to financial systems.
Some companies have also included the following values which speak directly to their offerings: good intent, data-driven decision-making, and user-driven design. Otherwise, when developing a data project, the following four data ethics principles are a clear, actionable basis for development:
Transparency refers to clear communication of what data will be gathered, whether it will be stored, and who it might be shared with. While terms and conditions documents can be long, convoluted, and almost require legal expertise, users should understand how the company will use their information. You can find some examples of how this is incorporated at the OECD.
Accountability refers to an organization taking responsibility for what happens to an individual’s data after collecting it. From data leaks to sales to a third party, companies need to accept accountability for the damage their processes can cause users. You can find examples of how companies such as Coloplast and Novo Nordisk apply this.
Agency refers to one’s ability to make meaningful choices about personal data. Whether it’s the ability to access their data, remove themselves altogether, or choose which collected information is stored, allowing individuals to control what happens to their information should be a human right.
Privacy refers to one’s ability to expect protection from public exposure of their data. If you consent to an organization collecting your data, you can reasonably expect it to store and use that data only as set out in your initial agreement.
Personally identifiable information (PII) refers to any information linked to a person’s identity, such as a vehicle’s VIN, a voice signature, or a telephone number. However, hackers and ethical technologists have been able to identify individuals from much less personal detail.
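To make that last point concrete, here is a minimal sketch of a linkage attack: a dataset with names removed can still be re-identified by joining it to a public dataset on quasi-identifiers such as ZIP code, birth date, and sex. All records, names, and field names below are invented, and the helper functions (`link`, `k_anonymity`, `generalize`) are hypothetical illustrations, not part of any library.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, but quasi-identifiers
# (ZIP code, birth date, sex) left in place. All values are invented.
medical_records = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1962-03-14", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02141", "birth_date": "1962-11-02", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02140", "birth_date": "1945-02-20", "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public dataset (for example, a voter roll) that does include names.
public_roll = [
    {"name": "Jane Doe", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "John Roe", "zip": "02139", "birth_date": "1962-03-14", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def qi_key(record):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(records, roll):
    """Re-identify records whose quasi-identifiers match exactly one named person."""
    names_by_key = {}
    for person in roll:
        names_by_key.setdefault(qi_key(person), []).append(person["name"])
    matches = []
    for record in records:
        names = names_by_key.get(qi_key(record), [])
        if len(names) == 1:  # a unique match ties the record to a named person
            matches.append((names[0], record["diagnosis"]))
    return matches


def k_anonymity(records):
    """Size of the smallest group of records sharing the same quasi-identifiers."""
    return min(Counter(qi_key(r) for r in records).values())


def generalize(record):
    """Coarsen quasi-identifiers: keep a 3-digit ZIP prefix and the birth year only."""
    return {**record, "zip": record["zip"][:3] + "**", "birth_date": record["birth_date"][:4]}


print(link(medical_records, public_roll))  # [('Jane Doe', 'hypertension'), ('John Roe', 'asthma')]
print(k_anonymity(medical_records))  # 1
print(k_anonymity([generalize(r) for r in medical_records]))  # 2
```

Generalizing to a ZIP prefix and birth year raises k from 1 to 2 in this toy data, so every record hides among at least one other. k-anonymity is only one classic measure and has known limits, for example when everyone in a group shares the same sensitive value.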
Applying Data Ethics
Aside from contemplating general philosophy, individual practitioners and organizational leaders face challenges integrating these values into everyday practices and processes.
Data Ethics Guidance for Individual Practitioners
Software, Machine Learning, or AI Engineers
The skills needed to develop software are not the same as those that determine whether a project is ethical. This is evidenced by many examples of highly biased and unethical algorithms and software products defended by technical founders.
While software engineers can (and should) broaden their understanding of common ethical pitfalls, the key application for individual technical contributors is to collaborate with ethicists on challenging questions. Accepting that the extra time and effort spent on ethical analysis of a product is neither wasted nor a delay to ‘innovation’ is critical to embracing the new normal of this role.
Documentation and careful data review are best practices that will be even more critical as transparency, accountability, and accessibility are incorporated into your products. While many ethical questions fall outside the scope of development, unforeseen concerns may arise in this phase, so developers should take an active part.
Technical Project Managers
As a project manager, navigating additional requirements related to ethical safeguards will be a new challenge. Estimating time and effort related to these unprecedented tasks and developing a clear understanding of new stakeholders (researchers, ethicists, etc.) will be critical in creating meaningful outcomes for your technical team.
Managing requirements and navigating scope and demands from all sides already make project management a demanding job, but standing firm in aligning a product with ethical principles will make your role even more critical.
Working with developers to create meaningful documentation of any attempts to detect bias and discrimination will be essential to iterating internal protocols around fair and ethical initiatives.
As a project manager, you may have the most hands-on role related to whether an idea passes an ethics test. Navigating requests and requirements will force you to work closely with leaders in delivery and development teams to ensure that features have been thoroughly user tested and reviewed for inclusivity. If your organization does not already have standard measures for receiving feedback, develop processes to iterate if ethical violations emerge.
Data Ethics Guidance for Organizational Leaders
For startups developing technology and tools which include AI applications, mindfulness about ethical implications is essential. It can be challenging for small companies to dedicate additional resources to ethical analysis, but it’s even more important that these companies develop these practices early on.
As many small businesses scale, they may begin to use more technology. Gathering user data, developing purchasing databases, and potentially hiring engineers may all become the norm. However, with these developments comes an exponential growth in related risk.
Leaders in this type of business can prioritize transparency and accessibility as they grow. Keeping customers informed about what data you’ve gathered and how it will be used is key as you continue to develop your processes. Requesting consent and communicating clearly about changes in technology and expectations are other ways SMEs can ensure they grow sustainably.
Due to the scale of impact for most multinational corporations, developing reliable security, privacy, and ethics processes is even more critical. Corporations in vastly different industries have fallen victim to ransomware, leaks, and improper use of data by employees, and claiming ignorance of the risks is no longer acceptable.
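One way a growing business might put "requesting consent" into practice is to record which purposes each customer has agreed to and check that record before every use of their data. The sketch below assumes a simple in-memory store; `ConsentRegistry`, `send_marketing_email`, and the purpose names are hypothetical, and a real system would also need persistence, audit logging, and a reliable way to honor withdrawals.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    """Purposes one customer has agreed to, with the time each was granted."""
    granted: dict[str, datetime] = field(default_factory=dict)


class ConsentRegistry:
    """Hypothetical in-memory store of per-customer, per-purpose consent."""

    def __init__(self) -> None:
        self._records: dict[str, ConsentRecord] = {}

    def grant(self, customer_id: str, purpose: str) -> None:
        record = self._records.setdefault(customer_id, ConsentRecord())
        record.granted[purpose] = datetime.now(timezone.utc)

    def withdraw(self, customer_id: str, purpose: str) -> None:
        record = self._records.get(customer_id)
        if record is not None:
            record.granted.pop(purpose, None)

    def allows(self, customer_id: str, purpose: str) -> bool:
        record = self._records.get(customer_id)
        return record is not None and purpose in record.granted


def send_marketing_email(registry: ConsentRegistry, customer_id: str) -> str:
    # Check consent for this specific purpose before using the customer's data.
    if not registry.allows(customer_id, "marketing_email"):
        return "skipped: no consent"
    return "sent"  # placeholder for the actual (hypothetical) email step


registry = ConsentRegistry()
registry.grant("c-001", "order_fulfillment")
print(send_marketing_email(registry, "c-001"))  # skipped: no consent
registry.grant("c-001", "marketing_email")
print(send_marketing_email(registry, "c-001"))  # sent
registry.withdraw("c-001", "marketing_email")
print(send_marketing_email(registry, "c-001"))  # skipped: no consent
```

Keying consent by purpose, rather than storing a single yes/no flag, means that agreeing to order fulfillment does not silently cover marketing.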
Developing internal, cross-departmental working groups is a valuable first step in developing a company-wide shift in values and knowledge-sharing. Expanding required training for employees and incentivizing regular analysis for managers and individual contributors will also communicate the value of ethical considerations related to data for all.
Today’s Challenges: Data Ethics Examples
Algorithms have the potential to amplify the impact of any action a million times over as they’re implemented. Even in the infancy of their capability, there are many examples of how a lack of ethical safeguards for algorithms affects us.
Hiring Algorithms at Amazon
Projects related to human resources are almost always high-risk because they determine access to employment, and impacts can often be a clear threat to people's livelihoods. However, with online applications and tools like LinkedIn, finding the right candidate while ensuring equal opportunity presents new challenges.
In 2016-2017, Amazon attempted to address both of these problems with an internal hiring algorithm that would decide which resumes made it to the next hiring stage. It wasn’t long before bias against women in technical roles became obvious. Because the historical hiring data it learned from was itself biased, the algorithm had taught itself to identify and penalize resumes from women, even though no one had explicitly instructed it to.
While Amazon may have scrapped this algorithm, most HR representatives anticipate the regular use of AI tools in their hiring processes in the future. Read more about how to ethically use machine learning to drive decisions in a separate article.
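A case like this is a reminder to audit a screening model's outcomes by group before relying on it. A common starting point in US hiring is the "four-fifths rule" from the Uniform Guidelines on Employee Selection Procedures: a group whose selection rate is below 80% of the highest group's rate is generally treated as evidence of possible adverse impact. The sketch below computes that ratio on made-up screening decisions (not Amazon's data); the function names are hypothetical, and passing this check alone does not make a model fair.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Share of applicants advanced per group; decisions are (group, advanced) pairs."""
    totals = defaultdict(int)
    advanced = defaultdict(int)
    for group, was_advanced in decisions:
        totals[group] += 1
        advanced[group] += int(was_advanced)
    return {g: advanced[g] / totals[g] for g in totals}


def adverse_impact_ratios(rates):
    """Each group's selection rate divided by the highest group's rate."""
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}


# Made-up screening outcomes: (group, advanced to the next stage?)
decisions = ([("men", True)] * 60 + [("men", False)] * 40
             + [("women", True)] * 30 + [("women", False)] * 70)

rates = selection_rates(decisions)
ratios = adverse_impact_ratios(rates)
for group, ratio in ratios.items():
    flag = "possible adverse impact" if ratio < 0.8 else "ok"
    print(f"{group}: selection rate {rates[group]:.2f}, ratio {ratio:.2f} -> {flag}")
# men: selection rate 0.60, ratio 1.00 -> ok
# women: selection rate 0.30, ratio 0.50 -> possible adverse impact
```

Recording checks like this, with the data and thresholds used, is one way to produce the bias documentation described earlier for project managers and developers.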
Facial Recognition Struggles to Detect Darker Skin Tones
Facial recognition has gained widespread use not only related to accessing phones and accounts, but as a critical tool for law enforcement surveillance, airport passenger screening, and employment and housing decisions.
However, in 2018, research revealed that commercial facial analysis algorithms performed notably worse on darker-skinned women, with error rates of up to 34%, compared with under 1% for lighter-skinned men.
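Gaps like these only surface when a model is evaluated separately for each subgroup rather than on a single aggregate score. The sketch below breaks a classifier's error rate down by group; the labels, predictions, and group names are invented, `error_rates_by_group` is a hypothetical helper, and in practice each group needs enough examples for its rate to be meaningful.

```python
from collections import defaultdict


def error_rates_by_group(groups, y_true, y_pred):
    """Fraction of misclassified examples within each group."""
    errors = defaultdict(int)
    counts = defaultdict(int)
    for group, truth, pred in zip(groups, y_true, y_pred):
        counts[group] += 1
        errors[group] += int(truth != pred)
    return {g: errors[g] / counts[g] for g in counts}


# Invented, imbalanced evaluation set: the small group's errors barely move the average.
groups = ["lighter_male"] * 90 + ["darker_female"] * 10
y_true = ["M"] * 90 + ["F"] * 10
y_pred = ["M"] * 89 + ["F"] + ["F"] * 6 + ["M"] * 4

overall = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
rates = error_rates_by_group(groups, y_true, y_pred)
print(f"overall error rate: {overall:.2f}")  # 0.05
for group, rate in rates.items():
    print(f"{group}: {rate:.2f}")
# lighter_male: 0.01
# darker_female: 0.40
```

A 5% overall error rate looks acceptable, yet the underrepresented group is misclassified 40% of the time in this toy data.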
If you’re curious about how face detection works, check out our tutorial on face detection with Python using OpenCV.
Learn more about data privacy and anonymization in Python with our interactive course.
Period Tracking Apps
Since Roe v. Wade, the ruling that established federal abortion protection in the US, was overturned in June 2022, period tracking apps have become a possible means for prosecution under any future outright abortion bans. While most apps have terms that allow them to share personal data with third parties, many period tracking apps have murky language around whether data could be shared with authorities for arrest warrants.
While you might need a legal expert to interpret the terms and conditions of all the applications you use daily, understanding the trade-offs and risks you expose yourself to will only become more important.
Uber Tracked Politicians and Celebrities with “God-Mode”
In 2014, reports claimed that Uber’s “god-mode” gave all employees access to complete user data, which was often used to track celebrities, politicians, and even personal connections (such as ex-partners). A former employee also filed a suit describing reckless handling of user data and a “total disregard” for privacy.
Later, in 2016, Uber attempted to cover up a major data breach by paying off the hackers and not reporting the incident, despite a pending investigation into its security practices by the US Federal Trade Commission at the time.
From inherent security flaws to hacking vulnerabilities, lapses in data security can put more than your login info at risk. Apps and websites of all kinds can pinpoint your location, read your messages, see who you’ve blocked, and how you spend your money.
Whether you’re a social butterfly online or just participate in minimal digital services (banking, texting, email, etc.), you’re most likely exposed in more ways than you know.
Current Data Ethics Regulations
Many people are aware of the importance of data ethics—but what does it mean to us as consumers? What does it mean for companies? What does it mean for government agencies?
The answer to these questions depends on where you live. In some countries, there are already laws in place requiring companies to be transparent about how they handle personal information; in others, there aren't any regulations at all.
And then there are the ethical questions we face every day: How much do we share with each other online? Should we be sharing less? And what happens when something goes wrong? These are all pressing issues that require thoughtful consideration and strong leadership from both government and industry leaders so that our future isn't threatened by unethical behavior or a lack of oversight.
Around the world, governments are trying new methods of regulating ethical data collection, AI, and various uses for data (such as targeted marketing).
Here are a few regulations worth knowing and following to understand the current state of data regulation where you live:
- GDPR (European Union)
- CPRA, CCPA (USA - California)
- EU AI Act
- New York City Bias in Hiring AI Regulation (USA - NYC)
- Algorithmic Justice and Online Platform Transparency Act (USA)
- Digital Services Act (European Union)
While the EU and US lead the way in ratified legislation, most of these policies are new and continuously under development.
Continued Data Ethics Education
For those interested in preparing their careers for these developing changes in technology and business, there are many courses and certifications that can help you move into, or get started in, a career in ethical data. Start with our Introduction to Data course, which looks at data ethics and data privacy. From there, you'll find several certifications to explore:
- Data Privacy
  - IAPP Certified Information Privacy Manager (CIPM) certification
  - Securiti PrivacyOps certification
  - ISACA Certified Data Privacy Solutions Engineer (CDPSE) certification
  - IAPP Certified Information Privacy Professional (CIPP) certification
- Data Security
  - CompTIA Security+
  - Certified Ethical Hacker (CEH)
  - GIAC Security Essentials Certification (GSEC)
  - Offensive Security Certified Professional (OSCP)