Lise is a data scientist in a geography lab in France. She is using DataCamp to help her colleagues learn R and to sharpen her own data science skills.
Can you tell us about your background?
I did a Ph.D. in ecology and I needed to do some stats, programming, and modelling because I had to analyze lots of data on fish abundance. That's when I started to learn about R. Right afterwards, I was hired to do some statistics and data science in a geography lab. That was 6 years ago.
I am a research engineer, so besides developing methodological tools, I try to help people with their data analyses. I work with geomorphologists, who study landscapes (especially riverine ones), but I also work with geographers in a broader sense. Geography can be viewed as an environmental science, but as a social science as well. When I teach R to people in my lab, most of them aren't familiar at all with writing code and have always worked with Excel or "clickable" statistical software, so there is a lot of work to do going over the basics of a programming language and helping them change their workflow.
How do you use DataCamp to teach R?
I discovered recently that I could create my own interactive exercises using DataCamp. I taught for the first time using DataCamp last week. When people start using R it can be a really difficult and intimidating learning process, and playing with exercises on DataCamp really helped. I think that, besides the fun of it, one main reason is that you can point to the specific place in the code you are talking about, so people can focus on the point you are trying to make rather than having to write the whole command and (possibly) being overwhelmed by it. It was great to teach beginners with DataCamp, and I am really happy to think in the future I will be able to use DataCamp every time I want to teach R.
What did your students think about DataCamp?
I had very good feedback. Those who hadn't tried to learn R before thought it was quite fun. Those who had tried to learn R beforehand in a different way liked it too, as a less forbidding and more rewarding way to get a handle on R.
I think many people start to learn R through statistics commands, and without a learning tool like DataCamp, they have to write (or copy) whole commands focusing on the statistics but not really decomposing how the R command works. When I've taught, people thought it was really great to be able to understand what is behind these long commands, what the R code is actually doing. Although I understand the wish to quickly show them how R is going to be useful in their statistical analyses, I think students who haven't been walked through the basics of the language can end up confused and dispirited and, in the end, unable to run these analyses themselves in the real world.
This is especially true if students can't rely on sample code like what DataCamp's exercises offer, because then they are more likely to experience repeated failures (due to e.g. syntax errors) that don't have much to do with the actual analysis at hand. DataCamp's exercises help increase the "success rate" students have with R, which is, in my opinion, critical to students actually adopting R for their everyday tasks.
How did you decide on data science?
I have always been interested in biology and natural sciences, and went to an agronomy engineering school where I took a lot of different classes. Some of them were about the environment, evolution, ecosystems, and population dynamics, and involved some modelling (in Matlab at that time), and these were the courses I thought were the most fun and interesting. That's how I ended up doing a Master's in ecology. So when I look back, I can see that modelling and playing with data (environmental data, but maybe any data really) was actually what was interesting to me from the beginning. When I began my Ph.D. and had the freedom to set the direction of my project (which was originally supposed to be about modelling fish habitats), I gave it a more "stats and modelling" rather than "ecological" orientation. Afterwards I was hired in a geography lab for my stats and modelling (and R!) skills, and then started working as a data scientist, although I didn't know what to call myself at that time!
What about data science is so exciting?
Well, data science makes me feel powerful! There are basically two reasons for my saying that.
One is the fact that data science is a rapidly evolving area, so that a large part of a data scientist's work, at least as I envision it, is updating his or her methods, tools, and workflows. I really like that my work is evolving with the progress of data science, so that I never get bored.
Also, I like to think that with the growing amount of data available to analyze, using and designing new tools to gain efficiency is a crucial part of the job. And of course, I find learning about and using such tools particularly rewarding. I'm French, so I like to not work too much! Joking aside, and to give you an example, I discovered ggplot and dplyr not so long ago, and they are really helping me work more efficiently in the long run.
The second reason that data science makes me feel powerful is that the results you get with it are really tangible. When you do your own research, it is often hard to see the impact of the research or who it helps; it is hard to see the progress you are making. On the other hand, when you work as a data scientist you get to help other people analyze their data, which makes you feel really useful, and you can see the results right away! Of course, I realize that helping scientists is only useful if the results we obtain together are findings of interest for their particular corner of the scientific community, but in the meantime I get to feel some satisfaction from helping them!
What do you like about DataCamp?
Besides the fact that learning (and teaching) with DataCamp is fun, I really like the Premium Course content. It's often content that you can't find anywhere else, and the video courses are really well made. Because I know that I can rely on the quality of these courses, DataCamp has become my go-to site whenever I need to learn more about a particular package or method. And of course, the course catalogue is still growing!
Can you talk more about creating your own courses on DataCamp?
I was very excited when I discovered that anyone could create his or her own courses on DataCamp. With the examples provided it was quite straightforward to create my own exercises. At first I didn't know about the Teach editor, so I started by creating my own exercises locally and sending them to my GitHub repository through git commands. This part of the process was quite a pain because my understanding of git is very basic. At first I was creating exercises very slowly because I had to repeatedly commit the changes and then refresh the DataCamp course each time I needed to test a change.
But when I discovered that the Teach editor existed, it changed my life! I was able to make changes, save them and test them on the spot, without any git pushes, pulls or commits being involved! Being able to build my own course and test it directly in the editor, that was a revolution really.
Would you recommend using DataCamp to other people who need to help teach their colleagues? Why?
Yeah, definitely! It makes R less intimidating. R is difficult, statistics are difficult, so at least we should try and make the exercises fun and rewarding! I found that creating my own set of exercises on DataCamp was really useful when I ran training sessions. And if you lack the time to develop your own exercises, you can still use sample exercises from other DataCamp courses with the Teach editor.
How has DataCamp been useful?
It has helped me introduce the tidyverse into my routines! As I mentioned earlier, tidyverse packages are built according to quite a different, "data-scientific" kind of logic. I needed to be able to get into that logic to really use them in a day-to-day manner. For instance, I had been aware of the existence of ggplot for a while, but when you know how to do things a certain way, it is hard to really change and adopt new workflows, even if they are more efficient in the long run. And teaching myself through DataCamp, I was able to practice these packages in a simple way, and then was able to use them in my day-to-day work. Just being aware that these packages existed wasn't enough—I had to really practice in order to make a change.
What skills have you been able to put to use that you first learned on DataCamp?
I learned dplyr from scratch, and now I use it in my everyday work and show it to anyone who comes to me asking for help with their own data. More generally, I use a lot of what I learned from various courses about the tidyverse (especially ggplot2 and tidyr) and also use techniques for development, from the course about functions for instance. I also liked the course about RStudio by Garrett Grolemund, which I thought was quite original. I have been using RStudio for quite a while, but there were some things in that course that I didn't know, and they enriched my use of RStudio.
Can you give an example of a cool project you're working on?
I've been working with Shiny a lot, and I think it has really changed the way I can share my work with people. As I mentioned earlier, some of the people I work with are really not into coding, and although it would be more efficient to teach them how to fish than to give them a fish, I have to find means to help them anyway. Shiny apps have been a great help in doing this, especially since the conglomerate of labs I'm part of has purchased a Shiny Pro Server license, so that I have been able to deploy my apps on the web.
One of them, for instance, is an app that helps people do wavelet transforms. Some of my colleagues work on environmental time series or spatial series (along rivers, for instance), so they are interested in the ability to analyze them in a scale-related way. Wavelet transforms, as signal processing methods, provide results which are quite descriptive, so it is really helpful to be able to parameterize them through a Shiny app, to zoom in on some details and so on.
I also made an app that helps people produce graphics. They upload their datasets and then they can generate and parameterize ggplot commands through the Shiny app. Actually, the app provides not only the graphic but also the command lines that produce it, so I hope providing this app isn't too contradictory with the fact that I am also trying to teach people how to use R and ggplot2 themselves! Anyway, it can be useful to people who are not used to R and who have no special wish to learn it (and although I find this really unsettling, I have learned to accept and respect their choice!).
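The source code of this app is not part of the interview, so the following is only a minimal sketch of the general idea of returning both a plot and the command that produces it. It assumes the user's choices arrive as plain strings; the function name `build_ggplot_command` and its parameters are hypothetical, and the Shiny interface itself is left out.

```r
# Hypothetical helper: turn a user's choices into a ggplot2 command string.
# Illustrative only; this is not the code of the app described above.
build_ggplot_command <- function(data_name, x, y, geom = "point") {
  # Restrict to a small set of geoms (an assumption made for this example).
  geom <- match.arg(geom, c("point", "line", "col"))
  sprintf("ggplot(%s, aes(x = %s, y = %s)) + geom_%s()",
          data_name, x, y, geom)
}

# Build the command for a scatter plot of the built-in mtcars dataset.
cmd <- build_ggplot_command("mtcars", "wt", "mpg")
cat(cmd, "\n")

# The same string can be parsed and evaluated to build the plot object,
# provided ggplot2 is installed; print(p) would then display it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- eval(parse(text = cmd))
}
```

Keeping the command as text means the string shown to the user is exactly what gets evaluated, so someone curious about R can copy it and run it outside the app.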
What advice do you have for people just starting out?
Not to be afraid to spend some time on it. I usually don't teach R to students, but rather to my colleagues, and in a way they are the most difficult of course attendees because they want to be efficient right afterwards. But however you do it, it takes some time to really be able to do data science in an efficient way. My first piece of advice would be not to be too impatient, because the time you invest in learning and practicing your R skills is never lost!