Before the COVID-19 crisis, we were already keenly aware of the need for a broader conversation around data privacy: look no further than the Snowden revelations, Cambridge Analytica, the New York Times Privacy Project, the General Data Protection Regulation (GDPR) in Europe, and the California Consumer Privacy Act (CCPA). In the age of COVID-19, these issues are far more pressing. We also know that governments and businesses exploit crises to consolidate and rearrange power, claiming that citizens need to give up privacy for the sake of security. But is this tradeoff a false dichotomy? And what types of tools are being developed to help us through this crisis?
At our recent DCVirtual webinar week, Katharine Jarmul, Head of Product at Cape Privacy (a platform for secure, privacy-preserving machine learning), discussed all this and more in conversation with Dr. Hugo Bowne-Anderson, data scientist and educator at DataCamp. Watch the webinar on demand, listen to the podcast, or read on for some key takeaways.
Are there inherent tradeoffs with data privacy?
Hugo opens with a provocative question from Stuart Russell’s book Human Compatible: can AI be helpful if it knows nothing about you or people like you? The answer is probably not. So are we forced to give up our privacy to benefit from AI in our daily lives? Many people assume there is a tradeoff between convenience and utility on the one hand and privacy on the other, and likewise between privacy and security.
Governments and businesses sometimes frame this as a choice, telling citizens they must give up privacy for the sake of security. The perceived tradeoff also shapes consumer behavior: many people mistrust AI personal assistants and simply choose not to buy an Alexa. But do those who do buy an Alexa “deserve” to have their privacy invaded? The bottom line is that it’s unfair to put the burden of proof on someone who wants to preserve their own privacy. This tradeoff is a false dichotomy.
Katharine suggests that the key is to productionize research, which is what attracted her to work at Cape Privacy. There is remarkable depth of research in the data privacy space, and she argues that we can push it forward by applying it to real-world use cases and threats that matter to us. That requires good working relationships between people in research and people in industry.
One example of productionized research is secure multi-party computation (MPC): with these techniques, learning algorithms can operate on encrypted data, so users can benefit from pooling their data without compromising individual privacy.
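To make the idea of computing on protected data a little more concrete, here is a toy sketch of additive secret sharing, one of the building blocks used in MPC. Rather than encryption in the usual sense, each private value is split into random-looking shares, and the parties only ever combine shares, so together they learn the total but not any individual input. This is an illustration under stated assumptions, not Cape Privacy’s implementation: all parties are simulated in one process, they are assumed to be honest-but-curious (they follow the protocol but may try to learn from what they see), and the names `PRIME`, `share`, `reconstruct`, and `secure_sum` are hypothetical.

```python
import secrets

# Hypothetical toy parameter: shares live in the integers modulo a large prime.
PRIME = 2**61 - 1  # a Mersenne prime, comfortably larger than the toy inputs


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n additive shares that sum to `value` mod PRIME."""
    # The first n-1 shares are uniformly random, so each one alone reveals nothing.
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the secret.
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine shares into the value they encode."""
    return sum(shares) % PRIME


def secure_sum(private_values: list[int]) -> int:
    """Compute the sum of private inputs, one per simulated party.

    Assumes honest-but-curious parties; no protection against tampering.
    """
    n = len(private_values)
    # Each participant splits its own value and sends share j to party j.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up only the shares it received from everyone.
    partial_sums = [sum(row[j] for row in all_shares) % PRIME for j in range(n)]
    # Publishing the partial sums reveals the total, not any single input.
    return reconstruct(partial_sums)


if __name__ == "__main__":
    salaries = [52_000, 61_500, 58_250]
    print(secure_sum(salaries))  # 171750
```

Summation is the simplest case. Real MPC frameworks build multiplication and, from that, model training on top of similar sharing schemes, with extra machinery to handle parties that misbehave.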
Build privacy expertise into your skillset
Data privacy is a big issue that requires cross-functional collaboration. Everyone who works with data needs to be aware of privacy concerns and feel empowered to learn about and address privacy threats in the solutions they implement. Many companies have already established data privacy, security, or risk teams to protect their products and brand. Data scientists and machine learning experts will continue to need to build these considerations into their systems, which means keeping up with advanced privacy and security techniques.
Understand it’s about consent and meeting expectations
Our data privacy standards have evolved over time. For example, 10 years ago, many were outraged to learn that they were getting targeted ads based on keywords in their email correspondence. But today, many are okay with this as a standard business practice for Google.
We must hold those in positions of power in government and industry to high standards to protect against abuse. These institutions have access to enormous amounts of data, some of which users consider public and some of which they consider private. Against what benchmark should we judge those who have access to and control over this data?
Data privacy and ethics boil down to consent and to meeting the expectations of users, citizens, and consumers about how their data will be used. That requires awareness, understanding, and transparency.
Privacy, ethics, and the need for transparency
Data privacy and ethics go hand in hand. Of course, ethics research has to acknowledge that real-world conditions are imperfect. The goal is to combine what we know in theory with the best production system we can actually build.
Many existing bad practices need to be examined through this lens: given what we know, how can we make them better? One good step is to make sure companies are as transparent as possible in their documentation. Terms and conditions are often extremely difficult to read, sometimes intentionally so. The New York Times analyzed 150 privacy policies from companies such as Airbnb and Facebook and found that some were harder to read than dense philosophical texts like Kant’s Critique of Pure Reason.
Companies should also seriously consider the best solution to their challenges, not just the most convenient one. For instance, Stripe silently monitors your movements on its customers’ websites for the purpose of fraud detection. But it is not really clear from its terms and conditions that it does this, and there are many ways to protect against fraud without collecting users’ navigation movements and browser history.
Preserving relationships of trust
Ross Anderson’s article “Contact Tracing in the Real World” takes apart many of the false arguments surrounding the data privacy tradeoffs mentioned at the top of this post. People in power continue to argue that they can guarantee more security if they have access to more and more of our private information. Especially today, we need to question the ethics and implications of the continued consolidation of data by powerful institutions.
This means we need to hold them accountable for the way they handle data. If they collect data for a certain purpose, its use should be restricted to that purpose, and it should be deleted afterward. For instance, the data ethics community has proposed data trusts: contracts under which a specific dataset may be used only for a single purpose, such as fighting cancer or mapping the genome, and may not be used for anything else. Establishing governance for such a system would make it easier to enforce that data is used only for its intended purpose. And if companies expressly agree to delete certain data, they must be able to verify that it has been deleted. Further, if the data was used to enrich other information before being deleted, it must also be clear who owns the enriched data.
Immediate applications of AI for COVID-19
AI is not going to be a single cure for the COVID-19 pandemic, but we can use our data and machine learning skills to contribute in positive ways. Examples include reporting on coronavirus-related phishing and cybersecurity threats, capacity planning for hospitals, and optimizing logistics operations to ensure equitable distribution of supplies.
If you enjoyed these takeaways, view the full conversation on demand as a podcast or webinar. Also check out Katharine and Hugo’s 2018 conversation about data security, data privacy, and the GDPR. You can follow Katharine on Twitter at @kjam and Hugo at @hugobowne. If you’re interested in exploring the power of NLP, we encourage you to take Katharine’s DataCamp course, Introduction to Natural Language Processing in Python, and her project, Who’s Tweeting? Trump or Trudeau?