What We Learned From Teaching 1M People Data Science
May 16th, 2017 in R Programming
Learning Data Science By Doing
DataCamp’s goal is to build the best learning platform for data science. Our philosophy is that you learn data science by doing: our high-quality videos, in-browser coding, and gamification provide learners with an engaging learning experience in their browser. At the same time, we want to take a data-driven approach to our product and share insights about learning data science with the world. This post will take a look behind the scenes of DataCamp and will share some insights into the data science learning process.
Qualitative Course Creation & Maintenance
The DataCamp team is always looking for the most exciting topics to build courses on, and with so many new advancements out there, deciding on a topic is rarely easy. That’s why we try to make this process as data-driven as possible: apart from looking at the latest trends in data science and the programming languages that are most in demand in the industry, DataCamp also reaches out to its network of instructors, who are industry experts and academics, to ask what should be part of the curriculum next and what such a course would look like.
But, in the end, the learner is central to DataCamp, and that’s why we also actively involve our users in the course topic selection process: we conduct interviews with our users and ask them to fill out surveys to discover their needs. This data is then combined with the experts’ opinions to determine not only the topic of the course but also what the course will cover, so that the material that is relevant to the learner is included.
Additionally, we gather the terms that users enter in the search bars of the site. With the help of R, these terms are analyzed, ranked, and visualized internally so that DataCamp’s course development team can use them as a starting point to map out the curriculum for the coming months.
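To make the ranking step concrete, here is a minimal sketch of counting and ranking search terms. It is written in Python purely for illustration (the analysis described above is done in R), and the function name, the normalization rules, and the sample queries are all hypothetical rather than DataCamp’s actual pipeline.

```python
from collections import Counter


def rank_search_terms(queries, top_n=3):
    """Normalize raw search queries and return the top_n most frequent terms.

    Assumption: lowercasing and trimming whitespace is enough normalization
    for this illustration; a real pipeline would likely do more.
    """
    normalized = (q.strip().lower() for q in queries if q.strip())
    return Counter(normalized).most_common(top_n)


# Hypothetical sample of logged search-bar queries (not real data).
sample_queries = [
    "ggplot2", "Machine Learning", "machine learning ", "pandas",
    "ggplot2", "machine learning", "", "SQL",
]

ranked = rank_search_terms(sample_queries)
for term, count in ranked:
    print(f"{term}: {count}")
```

A ranked list like this could then feed a simple bar chart, which is one plausible form for the internal visualization mentioned above.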
Check out DataCamp’s full course library here.
Learner Difficulties & Continuous Course Improvements
Before a course goes live, DataCamp first sends it to a smaller group of users and asks them to take notes and give feedback in the form of a survey with many open-ended questions. This allows us to make some final tweaks to the course before we send word to our entire user base that a new course has launched.
Hints & Solutions
Determining the course topics and getting a small group to beta test courses that are about to be launched is one thing, but once a course is live, DataCamp also pays attention to where learners experience difficulties within the course.
Did you know, for example, that whenever you ask for a hint or a solution, DataCamp records it to determine how often users ask for hints or solutions per exercise? This data, and the insights that we gather from it, are used to improve our courses!
We look at the numbers to see where our courses can be improved: whenever too many hints or solutions are requested for an exercise, we can easily locate the problem and improve the exercise. How an exercise is improved depends on the feedback that we get from our users. For example, one exercise that was troublesome for a lot of users was in the Kaggle course, where users were expected to build a decision tree with R. Initially, the exercise instructions only had a very generic example of how the decision tree should be built, which resulted in a high percentage of hints and solutions requested for the exercise. Using the users’ feedback, the DataCamp team expanded the instruction list and included three additional sub-bullet points to explain how each parameter should be defined. This made the task clearer and more practical and brought the hint rate down to 15%, which is much more reasonable given the difficulty level. Experiences and improvements like these, of course, help us to build the future courses that will appear in DataCamp’s data science curriculum.
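The idea of locating problem exercises from hint and solution counts can be sketched as follows. This is an illustrative Python example, not DataCamp’s implementation: the data layout, the exercise names, the counts, and the 50% threshold are all assumptions made for the sake of the sketch.

```python
def flag_hard_exercises(stats, threshold=0.5):
    """Return (exercise_id, rate) pairs whose help rate exceeds threshold.

    stats maps an exercise id to (attempts, help_requests), where
    help_requests counts attempts in which a hint or solution was requested.
    The threshold is a hypothetical cut-off, not a figure from the post.
    """
    flagged = []
    for exercise_id, (attempts, help_requests) in stats.items():
        if attempts == 0:
            continue  # no attempts yet, so no rate to compute
        rate = help_requests / attempts
        if rate > threshold:
            flagged.append((exercise_id, rate))
    # Highest help rate first, so the most troublesome exercise leads.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts for three exercises (not real data).
exercise_stats = {
    "decision_tree": (200, 140),
    "vectors_intro": (500, 40),
    "multiple_regression": (120, 66),
}

flagged = flag_hard_exercises(exercise_stats)
for exercise_id, rate in flagged:
    print(f"{exercise_id}: {rate:.0%} of attempts asked for help")
```

Rerunning the same calculation after an exercise has been rewritten shows whether the change actually lowered the rate, which is the kind of before-and-after comparison described for the decision tree exercise.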
On another note, we see that some courses, such as “Introduction to R”, clearly pose fewer problems to users than statistics courses like “Intro to Statistics with R: Multiple Regression”: users ask for fewer hints and solutions. This is partly because the introductory material is very comprehensive, while the more advanced courses give you even more freedom to discover the material yourself: learning by doing becomes even more apparent as you go through DataCamp’s courses! The course improvement process needs to keep the balance between learning by doing and giving learners the support that they need.
Besides the hints and solutions, DataCamp offers personalized feedback with the coding challenges in each and every one of its courses. The help is built up in such a way that you can first ask for hints and, as a last resort, go for the solution. The latter option is discouraged, as going for the solution affects the XP that users earn. This pushes students to get started on their own and develop an intuition for what needs to be coded to complete the task that the coding challenge poses. As an additional benefit, the feedback messages are written in such a way that users should immediately get an idea of what is wrong with their code, instead of having to look at error messages that might not make sense at first. This is also a way to stimulate our students to develop an understanding of the code, and especially of what they’re doing wrong, which eventually strengthens the data science learning process.
Delivering this personalized feedback was not an easy thing to put into place: when DataCamp started out, the feedback system was still in its infancy, which caused the feedback to be too generic. Over the years, it has become clear that our users come to DataCamp with different backgrounds and different objectives, which demands a personalized approach. As a result, the feedback system has already improved tremendously, and it continues to improve thanks to the input that we get from our users: all of the issues that users experience while going through a course are logged and used in the continuous improvement of our courses and the feedback system.
Time Spent on Exercise
Next to tracking the number of hints and solutions requested per course, DataCamp also tracks how learners go through the exercises on a more granular level: the amount of time that our students spend on a certain exercise is tracked and analyzed to discover problems in students’ learning experience on the platform. As DataCamp wants to ensure that learners advance through the content at a healthy rate, this information is also taken into account to continuously improve the quality of DataCamp’s courses.
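One simple way to spot exercises where learners get stuck is to compare each exercise’s median completion time against a course-wide baseline. The sketch below is a hypothetical Python illustration: the choice of the median, the factor of 2, and the sample timings are assumptions, not a description of how DataCamp analyzes this data.

```python
from statistics import median


def slow_exercises(durations, factor=2.0):
    """Flag exercises whose median time exceeds factor times the baseline.

    durations maps an exercise id to a list of completion times in seconds.
    The baseline is the median of the per-exercise medians; both the
    baseline and the factor are hypothetical choices for this sketch.
    """
    per_exercise = {ex: median(times) for ex, times in durations.items() if times}
    if not per_exercise:
        return []
    baseline = median(per_exercise.values())
    return sorted(ex for ex, m in per_exercise.items() if m > factor * baseline)


# Hypothetical completion times in seconds (not real data).
sample_durations = {
    "ex1": [60, 75, 90],
    "ex2": [50, 65, 70],
    "ex3": [300, 420, 360],
    "ex4": [80, 85, 95],
}

slow = slow_exercises(sample_durations)
print(slow)
```

Medians are used here rather than means so that a few learners who leave a browser tab open for hours do not distort the picture.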
Besides tracking the exercise difficulty and the time that users spend on exercises, and using that information to improve our course creation process, DataCamp also keeps tabs on which courses our users like so that we can make new courses that continue to interest them. Topics that are generally less popular with our users are statistics, data manipulation, and data visualization, while introductory programming courses, machine learning, and projects/case studies, on the other hand, do well with DataCamp users.
These ratings, in addition to the number of hints or solutions requested by our users, give us a very good idea of which courses might require some extra love.
As a professional or as somebody who’s just starting their career, your time is limited and comes at a price. That’s why our courses list the prerequisites that learners should ideally fulfill when they’re considering taking a specific course. Take, for example, DataCamp’s Python Data Science Toolbox course:
The prerequisites are clearly listed so that users know what knowledge and skills are expected of them before they start a course.
In addition, DataCamp has split up its courses into smaller chapters and even smaller chunks in which videos and interactive coding challenges with feedback alternate. Such a structure has proved to work for people learning data science: not only does it bring some variety into the curriculum, but it also lets learners make progress rapidly and at a healthy rate, which is one of the best ways to keep students motivated and engaged with the content. They are less likely to drop out and more inclined to continue with the courses.
The interface is one of the most important components when teaching data science: ever since the beginning, DataCamp has offered an interface that lets students code in the comfort of their own browser, without first installing RStudio or Python locally. This, together with the videos and the personalized feedback, makes the hurdle for beginners a lot smaller and the learning experience a lot more successful.
However, there have been some changes to the interface over the years to make it a lot more interactive than it initially was: in the current interface, learners can more easily navigate through the course and contact support when they find technical issues with the course, and it’s clearer how many XP they can earn by completing a certain exercise.
All data science learners and teachers know that it can be a challenge to find a data science course, or series of courses, that fits your needs in terms of quality, engagement, learning style, and so on. In all of this, it’s easy to lose track of your original goal and learning purpose. More specifically, most learners are professionals who are looking to move into the data science industry and therefore have specific career goals, or they already work in the industry and are looking for a way to become more proficient in a certain data science topic or skill.
That’s why DataCamp recently created tracks, which offer a structured approach to the DataCamp courses that you might already know: this way, you can focus on strengthening your data science skills or work towards that career goal that you’ve always been dreaming about.
Tying in with what you read before about our course creation process and DataCamp’s users, these tracks also help learners to address the data science topics that they find difficult in a more targeted way, so that they can quickly improve and tackle the learning curve.
Go and take a look at all DataCamp tracks here.
[Coming Soon] Daily Practice
Tackling the learning curve by making high-quality courses and designing tracks that keep our users focused is one thing, but learning data science still remains quite a challenge. At DataCamp, we want to make sure that our users can learn every day because, let’s face it, you need to be occupied with data science every day (even if it’s just a little bit!). One problem that learners often face is that it can be hard to keep up with all the knowledge that you’re building up. That’s why DataCamp offers a daily practice feature to help you keep tabs on what you already know well and what skills you might want to refresh.
The system is quite simple: when you go into the practice mode, you will get challenges that help you to practice the material that you have seen in the courses that you have completed. These challenges are five exercises in different forms (multiple choice, fill out, …) that help you to memorize the course content so that you better retain what you’ve learned. When you answer these five challenges correctly, you gain 250 XP. Of course, you get a different series of exercises each day.
DataCamp is reinventing data science learning and assessment, and this is just one of the many ways we combine research in data science education with the latest technologies and data science itself to push the state of the art.
If you’re interested in data science education, you should join us in building the best place to learn data science! Take a look at our job openings.