François Chollet is an AI & deep learning researcher, author of Keras, a leading deep learning framework for Python, and has a new book out, Deep Learning with Python. To coincide with the release of this book, I had the pleasure of interviewing François via e-mail. Feel free to reach out to us at @fchollet and @hugobowne.
François, you’re a software engineer and artificial intelligence researcher at Google. I want to find out more about what you do. But first, what people actually do and popular impressions of what they do infamously diverge. What do people think you do?
That's a sharp observation: there's usually a discrepancy between what people are known for, what they would like to be known for, and what they're actually working on. In my case, I'm probably best known for creating Keras, the deep learning framework.
What do you actually do?
I work on the Brain team at Google in Mountain View, where I spend most of my time developing Keras. So I believe there's a pretty good alignment between what I do and what people think I do.
I also contribute to TensorFlow, Google's machine learning framework, which Keras integrates with. On the side, I do research on a range of topics. Most recently I've worked on papers on machine translation, computer vision, and applying deep learning to theorem proving. My main research interest is to understand the problem of abstraction and reasoning in AI -- how to go from perception to abstract, highly generalizable models.
You are well-known as the author of the Keras package, an open-source neural network library for deep learning in Python: what is deep learning?
Deep learning is a specific approach to machine learning that has turned out to be a lot more powerful and flexible than previous approaches. In most applications, what we call "deep learning" can be thought of as a way to turn lots of data, annotated by humans, into a piece of software that can automatically annotate new data in a way that's similar to what humans would do. You can automate a lot of different tasks that way. Deep learning is especially good at making sense of "perceptual" data, like images, videos, or sound.
Here's a concrete example. Consider a large collection of pictures, with some tags associated with each picture ("dog", "cat", etc.). Deep learning allows you to automatically turn your data into a system that "understands" how to map the pictures to the tags, learning only from examples, without requiring any manual tweaking or custom engineering. Such a system can then be applied to new data, effectively automating the task of tagging pictures.
In the same way, you can apply deep learning to a wide range of problems, such as machine translation, speech recognition, text-to-speech, optical character recognition, etc.
Congratulations on your new book, Deep Learning with Python. Why did you write this book?
This book is my attempt to come up with a curriculum to teach deep learning to someone with Python coding abilities but no prior machine learning background. I'm trying to make deep learning as accessible as possible, without dumbing down anything. It turns out to be possible, because for the most part there are no difficult ideas in deep learning.
Python is arguably the fastest growing programming language, at least in high-income countries. Why Python, both for you and the wider programming community?
I love Python. It's easy to pick up and it keeps getting more and more productive as you learn to use it, no matter how long you've been at it. It feels very intuitive and elegant compared to most other languages I've used. But the real killer feature of Python is not in the language itself; it's the surrounding ecosystem and community. Whatever you need to do -- parse a specific file format, interface with a specific system -- there's almost certainly a Python library that does it, so you don't have to spend time implementing it. This is especially true when it comes to data science and machine learning: there are lots of great tools -- NumPy, pandas, scikit-learn, plotting libraries, etc. That makes Python a very productive language.
Also, I like that Python is not a domain-specific language; rather, it sits at the intersection of multiple domains, from web development to data science to system administration. That means you don't have to switch to a new language to deploy your Keras models as a web API, for instance. Whatever you need to do -- launch a web app, query a REST API, parse some files, train state-of-the-art deep learning models -- Python is generally a pretty solid choice.
There is a perceived barrier to entry for people who want to enter the disciplines of machine learning and developing artificial intelligence. What are your views on the democratization of the requisite skills and techniques?
I don't think that's true. Getting into machine learning has become extremely easy over the past 5 years. Sure, 5-7 years ago it was tough. You probably needed a graduate education. You needed to write a lot of low-level algorithms on your own, typically in C++ or Matlab. I went through that. Nowadays it's different. You just need Python, which is a lot easier to pick up, and you have access to high-level and easy-to-use tools like Keras. Plus, you can learn from lots of very high-quality resources online, and you can practice on real-world problems on Kaggle. Learning has never been easier.
So at this point, you can just, say, grab my book, install Keras, do some Kaggle challenges, and after a few months you will have become fairly productive with machine learning and deep learning on practical problems.
How do Keras and your new book play into this philosophy?
When I released Keras initially, I wasn't particularly trying to democratize deep learning. But as time went by, I saw countless people picking up deep learning via Keras and using it to solve a large variety of problems in ways I didn't expect, and that has really fascinated me. I've realized that deep learning can be deployed, in transformative ways, to far more domains than people in Silicon Valley suspect. There are just so many people out there that could benefit from using deep learning in their work. As a result, I've grown to care a lot about making these technologies available to as many people as possible, and that has become the number one design goal of Keras. That's the only way we're going to deploy AI to the full extent of its potential -- by making it broadly available.
My book is an attempt to take another step in the same direction: I'm trying to properly onboard as many people as possible into deep learning, so they can start using it to solve the problems they're familiar with and that I don't even suspect exist. It's not enough to just provide easy-to-use tools, you should also provide learning material to teach people how to use the tools.
What are the most important things for beginners to learn? How can they do this?
The most important thing is probably to get a sense of what deep learning can and cannot do. That, and to get a feel for key best practices such as how to properly evaluate models and how to fight overfitting. This requires a combination of formal explanations and extensive practice on real-world problems.
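The reason proper evaluation matters so much is easy to see with a deliberately overfit model. The following is a minimal pure-Python sketch, assuming a toy dataset of random points with random labels (so there is nothing real to learn) and a hypothetical `Memorizer` model that simply stores its training examples. Scored on its own training data it looks perfect; scored on a held-out split it never saw, it is no better than chance.

```python
import random

def make_noise_dataset(n, seed):
    """Random 2-D 'features' with random 0/1 labels: there is no pattern to learn."""
    rng = random.Random(seed)
    return [((rng.random(), rng.random()), rng.randint(0, 1)) for _ in range(n)]

class Memorizer:
    """Hypothetical, deliberately overfit model: it stores every training example verbatim."""

    def fit(self, data):
        self.table = dict(data)
        # For inputs it has never seen, fall back to the majority training label.
        labels = [y for _, y in data]
        self.default = max(set(labels), key=labels.count)
        return self

    def predict(self, x):
        return self.table.get(x, self.default)

def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)

data = make_noise_dataset(1000, seed=0)
random.Random(1).shuffle(data)
train, holdout = data[:800], data[800:]  # hold out 20% that the model never sees

model = Memorizer().fit(train)
print("train accuracy:", accuracy(model, train))      # 1.0: looks perfect
print("held-out accuracy:", accuracy(model, holdout))  # near 0.5: chance level
```

In Keras, the same discipline is usually expressed through the `validation_split` or `validation_data` arguments of `model.fit`, which report metrics on data the model is not trained on.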
Artificial intelligence is a term that, for a lot of people, brings to mind sentient robots. We see headlines such as "Google AI creates its own 'child' AI that's more advanced than systems built by humans". I need your help to demystify what AI actually is. What is artificial intelligence capable of?
There's a lot of hype in the field, for sure. Most of the media coverage of AI and deep learning is dramatically disconnected from reality -- both the scare stories and the tales about how AI is going to make everything wonderful.
As for what AI can do today, that's a difficult question. I would say there are three categories of things AI can do:
- Do tasks where we are able to fully, explicitly specify the rules that the AI needs to follow. This is essentially what is known as "symbolic AI", or more pragmatically, "software development". As anyone who has done any programming knows, this approach is brittle and only works in setups where everything is under control -- which is rarely the case in real-world problems.
- Do simple perception and intuition tasks where we aren't able to explicitly specify rules, but we are able to provide many examples of the task. This includes all of deep learning: classifying pictures, transcribing speech, etc. One important limitation of our capabilities on that front is that our models can only handle inputs that are extremely close to what they've seen before -- you can't get very far from your training data. What we're doing here is basically glorified high-dimensional curve fitting.
- Fairly naive combinations of the above. For instance, you could imagine a robot with a deep learning module that can extract the type and position of a number of objects in its surroundings, trained on many examples, coupled with a module that hardcodes high-level rules for manipulating these objects. Another example would be AlphaGo/AlphaZero, which is basically a combination of brute force search (explicit programming) with a deep learning module trained on lots of games that can "intuitively" evaluate the value of a board position.
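The division of labor in that last category, explicit search plus a learned "intuition" about positions, can be sketched in a few lines. This is only an illustration under stated assumptions: AlphaGo and AlphaZero actually use Monte Carlo tree search guided by neural networks, whereas this sketch uses plain depth-limited negamax on a made-up stone-taking game, and `toy_intuition` is a hand-written, hypothetical stand-in for a trained value network.

```python
from typing import Callable

# Toy game (assumed for illustration): players alternately take 1-3 stones;
# whoever takes the last stone wins.
ValueFn = Callable[[int], float]

def negamax(stones: int, depth: int, value_fn: ValueFn) -> float:
    """Score of the position for the player about to move (+1 win, -1 loss)."""
    if stones == 0:
        return -1.0  # the opponent just took the last stone
    if depth == 0:
        return value_fn(stones)  # search stops; the evaluator's "intuition" takes over
    return max(-negamax(stones - take, depth - 1, value_fn)
               for take in (1, 2, 3) if take <= stones)

def best_move(stones: int, depth: int, value_fn: ValueFn) -> int:
    """Number of stones to take that maximizes the searched score."""
    return max((take for take in (1, 2, 3) if take <= stones),
               key=lambda take: -negamax(stones - take, depth - 1, value_fn))

def no_intuition(stones: int) -> float:
    return 0.0  # knows nothing: every unfinished position looks equal

def toy_intuition(stones: int) -> float:
    # Hypothetical stand-in for a learned value network. It is hand-written here
    # (multiples of 4 are lost for the player to move); a real system would learn
    # an approximation of this from many games.
    return -1.0 if stones % 4 == 0 else 1.0

print(best_move(5, depth=1, value_fn=toy_intuition))  # 1: leaves the opponent 4 stones
print(negamax(8, depth=8, value_fn=no_intuition))     # -1.0: exhaustive search alone suffices here
```

With a shallow search, the quality of the move depends almost entirely on the evaluator; with exhaustive search, no evaluator is needed, but that only works because this game is tiny.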
As it happens, when leveraging current techniques to the full extent of their potential, you can get superhuman performance on many important tasks, and decent performance in many more. But that's only possible in very narrow settings. And, maybe counter-intuitively, there is no real path from being very good at many different vertical tasks to having the general intelligence and common sense of even a toddler -- much less its learning and adaptation capabilities.
That said, being really good at lots of very narrow tasks is transformational for most industries. So you can expect AI to deliver a lot of economic punch over the next 20 years. You can think of AI as the steam engine of our era -- a very powerful tool in the human hand, that will reshape the economic landscape over the course of a few decades. But no sentient robots in sight.
What isn't artificial intelligence capable of?
Out of the space of all things we might want to be able to automate, AI today can only handle a very small subset. There are far more things we can't do than things we can do.
In general, we are especially bad at:
- Anything that requires "grounding" or "understanding". For instance, AI can't understand the meaning of natural language; instead, it treats language in terms of statistical dependencies or hard-coded processing rules. "Meaning", as it exists in the human mind, derives from embodied human experience, which our AI models don't have access to, for now at least. So no AI system today can "understand" its task in a way that would make sense to a human. Models merely map out the statistical manifold of their training data.
- Anything that involves dealing with data that's different from what the AI has seen before. AI can only apply rules you've coded explicitly, or recognize things that are very, very close to what it was trained on. Our capabilities decay exponentially the more uncertainty or variation you introduce in a task.
- Anything that involves reasoning and abstraction. Either we can hardcode explicit reasoning rules into the machine, or we can't perform reasoning at all. Current AI cannot figure out abstract models of a situation on its own. Arguably this is the main bottleneck to AI development today. If you solved it, you would quickly be able to overcome the previous two.
I wrote a blog post called "The limitations of deep learning" about this and related issues.
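A one-dimensional toy makes "you can't get very far from your training data" concrete. This is only an analogy to the "high-dimensional curve fitting" described earlier: a straight line fitted by ordinary least squares stands in for a deep network, and the data (samples of y = x² on [0, 1]) is made up for illustration. The fit is decent where it has seen examples and badly wrong once you move away from them.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def true_fn(x):
    return x * x  # the "real" relationship the model is trying to capture

# Training data only covers x in [0, 1].
train_x = [i / 10 for i in range(11)]
train_y = [true_fn(x) for x in train_x]
slope, intercept = fit_line(train_x, train_y)

def predict(x):
    return slope * x + intercept

in_range_error = max(abs(predict(x) - true_fn(x)) for x in train_x)
far_error = abs(predict(10.0) - true_fn(10.0))
print(f"worst error inside [0, 1]: {in_range_error:.2f}")  # about 0.15
print(f"error at x = 10: {far_error:.2f}")                  # about 90
```

Nothing in the fitted line "knows" that the underlying relationship is quadratic; it has only captured the local shape of the data it was given.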
What are the major challenges facing the deep learning community?
Fighting hype, developing ethical awareness, and gaining scientific rigor.
Hype: this is plaguing our field. Some people out there are hyping up recent progress in ridiculous ways, vastly overselling our current capabilities, and often casting human-level AI as being around the corner -- which it isn't. If we set sky-high expectations and then fail to deliver on them, we are turning people against us. And besides, it's just intellectually dishonest, and toxic for the public debate.
Ethics: most people deploying AI systems today don't come from a particularly diverse background, and are often blissfully unaware of the ethical implications and harmful side-effects of the systems they build. This is a major problem because these people are going to have an increasing amount of power over others. We need to discuss these issues more and raise awareness of possible unethical applications of AI, whether it's biased predictive models impacting people's lives, AI being applied in really questionable places, or AI being used to manipulate our behavior and opinions in dangerous ways.
Science: there are tons of deep learning papers getting released every day, and most of them don't really produce any meaningful new knowledge, because they don't follow the scientific method. They "evaluate" models in fuzzy ways, or test overfit models on their training data (this is especially the case for generative models and reinforcement learning, two of the fastest-growing topics in deep learning research), cherry-pick results, use artificially weak baselines, tune hyperparameters in a way that results in overfitting to a specific task, evaluate models on MNIST only, etc. Deep learning is a scientific disaster zone. Peer review often doesn't address these issues in any meaningful way, maybe in part because most peer reviewers have been in the field for a year or two at most (since the field is growing exponentially). If we want to make faster progress, we need to inject into the field greater expectations of rigor when it comes to research reproducibility, baselines, model evaluation, and statistical significance. Our current incentive system is biased against science, sadly -- we incentivize publishing, and unfortunately it's easier to publish if you make your research sound complex & mysterious while making it impossible to properly evaluate its significance.
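Statistical significance is one of the cheaper gaps to close. As a minimal sketch (one common choice, not a procedure prescribed in the interview), the exact form of McNemar's test compares two classifiers evaluated on the same test examples by looking only at the examples where they disagree. Under the null hypothesis of equal accuracy, each disagreement is equally likely to favor either model. The counts below are invented for illustration.

```python
from math import comb

def exact_sign_test(only_a_correct: int, only_b_correct: int) -> float:
    """Two-sided exact binomial test (exact McNemar) on the disagreements of two models.

    only_a_correct: test examples that model A gets right and model B gets wrong.
    only_b_correct: test examples that model B gets right and model A gets wrong.
    """
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # the models never disagree: no evidence either way
    k = min(only_a_correct, only_b_correct)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k), X ~ Binomial(n, 0.5)
    return min(1.0, 2 * tail)

# Hypothetical counts on a shared test set; examples where both models agree
# carry no information about which one is better.
print(round(exact_sign_test(10, 2), 4))  # 0.0386
print(round(exact_sign_test(6, 4), 4))   # 0.7539
```

Both comparisons "favor" model A, but only the first is unlikely under equal accuracy; a 6-versus-4 split is entirely consistent with noise.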
What does the future of deep learning look like to you?
I actually wrote a blog post about this. In summary, I expect AI to increasingly blend "intuitive" pattern recognition modules with formal reasoning modules. I also expect AI to evolve to become more like a form of automated software development, borrowing a lot of patterns and practices found in software engineering today.
You're not only a developer. On the Keras blog, in essays, in your book and on Twitter, you're thoughtful and vocal about ethical issues surrounding deep learning, AI and tech in general. Do you think developers have an obligation to engage in ethical considerations around the work that they do?
For sure. I think this has been lacking in the general tech landscape in recent years. Look at Facebook, for instance. Or at most smartphone games. Technology is never neutral -- because it's powerful, because it affects our lives. The way you design products and technology is actively injecting values into them, whether you're aware of it or not. So you might as well be deliberate about it. If you build a social media platform and try to maximize "engagement", well, that's an ethically-charged decision, with important implications. Either we care and we make ethics an explicit design goal of the technology and power structures we build, or we surrender our values. If you're in tech, you can't choose to be "on the sidelines"; that's a delusion.