Godefroy studied economics, mathematics, and computer science before deciding to change the course of his career and go into data science. Today, he lives in Paris and works for a company called Flylab where he builds software for autonomous drones. He is interested in machine learning, deep learning, computer vision, and probabilistic models. Godefroy uses DataCamp to stay up to date on the latest tools for data science.
What was your experience with data science before joining DataCamp?
I was really fond of R, but I didn’t know enough about it. Just at the time I found DataCamp, I was starting to get super excited by all the changes in R with Hadley Wickham and the crew around him, so I decided to look for courses that would help me get into what I think is a kind of revolution happening in R, with packages like dplyr, ggplot2, and the whole tidyverse. So I had some knowledge of R, but the “old way.” I wanted to get in touch with the new age of data science and DataCamp was perfect for me because it helped me practice using dplyr, ggplot2, purrr and all the most powerful packages.
What is so exciting about data science?
I have never really been able to choose between the different fields of science that I like. I’ve studied economics, human sciences, and mathematics, but I am also interested in any science. I have read a lot of stuff on biology and physics and stuff like that. So for me, data science was a way to become a detective. For every new case, you have to go into a new field to try to understand how it works, to massage the data until you understand them, to try to acquire all the knowledge of the field without being a specialist. And because I am very curious about very different techniques in science, for me, it is exactly what I wanted. In a sentence, I like data science because I can switch from one area to another very quickly and it is really fun.
Could you talk more about how data scientists are like detectives?
Do you know the show Columbo, the old show from the 80s? Columbo was a private detective, and in every episode he was in a different profession and he had to solve a case. And he did it by investigating the profession. It could be the military, Hollywood, whatever. And every time he had to very quickly understand the implicit knowledge in the field. He had to talk to people, understand how they thought, stuff like that. And I feel a bit like that with data science. When you are a data scientist, you spend a lot of time, or at least I do, talking with specialists of the field and very often they have a lot of implicit knowledge. Stuff they know, but they don’t know that they know it, in a way. They have a lot of knowledge they have acquired through experience and you have to try to get it from them because they are not always able to articulate it.
For example, in my company, we are doing a bit of IoT (Internet of Things) and are working in the Champagne region with a company that is trying to connect the tanks that hold the alcohol. So we put sensors on these tanks and collected data from them to understand how to improve the process of champagne production. And in order to improve, you need the data. And in order to treat the data, we need to understand how champagne is made. And it is really fun, because if you talk to anyone from the engineers to the growers to any employee in the place, they all have some knowledge. They all know a bit of stuff that they kind of don’t share with everyone. And we have to try to extract that by talking to them, by understanding, by also making some comparisons with previous projects we’ve worked on. And it is really fun to do that.
But to do this, your data skills have to be very good, they have to flow. First of all, they have to be good enough so that they don’t restrain you. You have to be able to focus on the detective part, the investigation part. Talking to people, getting knowledge, stuff like that. You don’t want to be frustrated by not knowing how to use the tools, obviously.
What’s really nice with the revolution in R, and Python too, is not only that it doesn’t restrain you anymore—you can do pretty much whatever you want—but it also helps you by structuring your way of thinking. It is actually part of the goal of libraries like dplyr: I think it was designed to help you structure your thoughts by giving you only very few verbs, you know, only select, filter, stuff like that, in order to do what you have to do. To do anything, you only need six or seven verbs used together with pipes to solve any of the technical problems you have to face. And the rest is just about being clever and understanding what is going on in the field you are investigating.
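To make the "few verbs plus pipes" idea concrete, here is a minimal sketch in plain Python. It is not dplyr and not any library's API: the helper names (`pipe`, `filter_rows`, `mutate`, `select`, `arrange`) are hypothetical stand-ins written for this illustration, and the tank readings are invented sample values.

```python
from functools import reduce

# Invented sample readings, for illustration only.
readings = [
    {"tank": "A", "temp_c": 11.5, "pressure_bar": 5.9},
    {"tank": "B", "temp_c": 14.2, "pressure_bar": 6.1},
    {"tank": "C", "temp_c": 12.8, "pressure_bar": 5.7},
    {"tank": "D", "temp_c": 15.0, "pressure_bar": 6.3},
]


def pipe(data, *steps):
    """Pass data through each step in order, like a chain of pipes."""
    return reduce(lambda acc, step: step(acc), steps, data)


def filter_rows(predicate):
    """Keep only the rows for which predicate(row) is true."""
    return lambda rows: [r for r in rows if predicate(r)]


def mutate(**new_cols):
    """Add columns computed from each row."""
    return lambda rows: [
        {**r, **{name: fn(r) for name, fn in new_cols.items()}} for r in rows
    ]


def select(*cols):
    """Keep only the named columns."""
    return lambda rows: [{c: r[c] for c in cols} for r in rows]


def arrange(key, descending=False):
    """Sort rows by one column."""
    return lambda rows: sorted(rows, key=lambda r: r[key], reverse=descending)


# Read top to bottom: filter, then mutate, then select, then arrange.
warm_tanks = pipe(
    readings,
    filter_rows(lambda r: r["temp_c"] > 12.0),
    mutate(temp_f=lambda r: r["temp_c"] * 9 / 5 + 32),
    select("tank", "temp_f"),
    arrange("temp_f", descending=True),
)
print(warm_tanks)
```

Each step does one small thing, so the pipeline reads as a sequence of decisions about the data rather than as a block of indexing code.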
I am reading other books and I am doing other stuff to learn, but to me, DataCamp is really nice because it lets me practice. The most important thing, obviously, is to practice because otherwise you tend to forget all the stuff that you are learning. So I am not only using DataCamp to acquire knowledge, but also to keep it. I am constantly coming back to previous courses that I have already completed. And that’s what is nice about DataCamp: I can go back and look at videos and check exercises that I have done in the past. So I really like DataCamp for that.
I also had some experience using Python, but it was really not the best. And just as I was really starting to feel pressure to find courses on Python, you guys came out with the Python curriculum. The collaboration with Anaconda was just perfect for me. I really like those courses.
You’ve been a DataCamp subscriber for a long time. What keeps you coming back?
To be honest, at some point last year I was not sure I was going to continue with DataCamp. Just when I thought I was going to stop for a few months, you came out with this collaboration with Anaconda which was great because I really needed to improve my knowledge of Python. It came right on time. And very often this is exactly what happens. I was also very happy to see your courses on Bayesian statistics. I’m more of a frequentist, I don’t know much about Bayesian stats. I was starting to get interested in that, I was reading books about the Bayesian side of statistics. And just then, you came out with this great course on Bayesian statistics. It is really funny, most of the time this is what happens with DataCamp—new courses are coming out every week.
What are some of your favorite DataCamp courses?
Oh, well, there are so many. In data visualization, I think you are really doing a great job of showing how to do stuff from different points of view. You can use Matplotlib with Python, or you can use ggplot2 and ggvis in R, and to do it all these different ways gives you really all the perspectives about how to visualize data. That’s really nice.
I really love the course by Hadley and Charlotte Wickham. I was really pleased to have a course from Hadley on the purrr library, which is a really great library, and I discovered it through this course. I am trying to get more and more accustomed to functional programming. I think it is really the future of data science. With distributed computation and big data, functional programming is even more important. So it was the first step into this world, and it got me on track with purrr, which really helped me. I really like this course by Hadley Wickham.
And the last thing I’ll mention that I really love is Statistical Thinking in Python, taught by Justin Bois from Caltech. This course is just amazing, it is really great. You do everything from scratch. It doesn’t suppose you have any knowledge. And even if it was something in a field I already knew really well, it was a really great way to connect the computer science part and the statistics part. In school, I studied statistics but without computers. In France, when you study statistics, it is very abstract. And you sometimes use functions without really knowing how to implement them. After taking the course, I had a kind of realization about a new way of doing things. I actually wrote an email to the instructor asking if he could do another course with DataCamp. I really hope you can push him to do more courses, because he has an incredible way of teaching.
What advice would you give someone just starting out on DataCamp?
Once you have completed a course with DataCamp, you might think it is over, but it’s not. Practice mode is great, because you have exercises not only to get the knowledge, but to assimilate it, to have it in your brain, to have the mechanics of using the good libraries, the good functions, etc. That is great.
What you are teaching on DataCamp are languages. When I was learning Japanese, I understood that to learn a language, you really have to practice, practice, practice. As my Japanese teacher once told me, to study a language is like sports. And we don’t realize that enough: when you try to get good at golf or basketball, you don’t just study the theory, you have to practice first. It is the same with languages: to understand is not enough, you have to practice and to practice a lot. You can study as much as you want to, but if you don’t try an exercise 1000 times, it won’t really get in your brain. Exercises that repeat and repeat the same stuff help you not just to understand, but to assimilate the knowledge. The Romans have a great saying for that: "Repetitio est mater studiorum," or "repetition is the mother of learning."
How did you first come across DataCamp?
I decided to subscribe because of the fact that you were the only ones giving courses on the new topics, like ggplot or dplyr. It definitely made me want to stay. If I had to give one reason why I love DataCamp, it is that you are really up to date with the latest developments.
How does DataCamp compare to other online learning platforms you’ve tried?
I’ve tried them all to be honest, you should see my LinkedIn. I spend a lot of time every night working on MOOCs when I have the time. I’ve tried Udacity, I’ve tried Coursera, I’ve tried French stuff (there is a French MOOC website that was nice), also Qwiklabs, and other stuff that I don’t remember. I also like the O’Reilly books and videos, but they’re very different from what you’re doing. All of that is good, but the only one that I am paying for is your courses, which says something.
Even if there are other tiny things that I like on other sites, you are always improving yours. I really like that. DataCamp is always improving. You have new courses, you have improved the online interface a lot, added autocompletion, stuff like that. Most of the time, when I have an idea of how you could improve, you do it a few weeks afterwards. It is really a positive experience because every time you are improving a lot on the things that I want.
What is the greatest benefit of learning with DataCamp?
The thing that is best is that you are really up-to-date. You are really at the frontier, the state-of-the-art. You have courses on the new libraries, from the best teachers like Hadley Wickham and Justin Bois. And actually, you have even removed courses that I didn’t like as much, like some of the older courses which were not as good. You are constantly improving.
How has DataCamp helped you professionally?
I joined Flylab this summer, and I discovered DataCamp at least two or three years ago. The data science skills I learned on DataCamp definitely help me with the job I have now. Three days ago, for example, I had to help an intern who was working with Python and scikit-learn. I had just followed the series of courses you’ve done on scikit-learn, so I was able to help him divide his data between the testing set and the training set. I helped him understand the different objects in scikit-learn and the use of the pipeline, and I learned most of this through DataCamp. And it is like this every month. Bayesian statistics, statistics in general, visualization, dplyr: these are all things I’ve learned how to implement with DataCamp and that are very useful in my job.
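For readers unfamiliar with the workflow mentioned here, the following is a minimal sketch of a train/test split followed by a scikit-learn pipeline. It is not the intern's actual code, which the interview does not show: the iris dataset, the scaler, the logistic regression model, and the 25% test size are all assumptions chosen only to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data: the built-in iris dataset (150 labelled samples).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows for testing; stratify keeps class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A pipeline chains preprocessing and the model into one estimator, so the
# scaler is fitted on the training data only and reused on the test data.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the model in one pipeline is a common way to avoid leaking information from the test set into preprocessing.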
Update from the DataCamp team: Since this story was initially published, Godefroy has been promoted to CTO of Flylab! Congrats, Godefroy!