Using Analytics To Drive Content Quality

How does DataCamp use data-driven product development to ensure the quality of the content in its data science courses?

At DataCamp, our mission is to help individuals and companies become data fluent by building the smartest data science education platform out there. In pursuit of this mission, we rely heavily on data science to drive product development. This blog post outlines how our team does data-driven product development at DataCamp.

What is Data-Driven Product Development?

Data-driven product development is all about driving product decisions using insights derived from data. The key steps in this process are:

  1. Collect Data
  2. Analyze Data
  3. Develop Insights
  4. Decide Actions
  5. Test Ideas
  6. Integrate into Product
  7. Monitor Metrics

If you have used DataCamp, you know that our mantra is “Learning by Doing”. Interactive coding exercises make up an important part of a user’s learning experience, so we spend a lot of time trying to improve and optimize this experience. Below, you can see a flow diagram that sketches the path a user follows while attempting an interactive exercise.

Figure 1 | Exercise Flow

Users start an exercise by reading the assignment text. Subsequently, they read the instructions, read the sample code, and start writing code to complete the exercise. They run their code and submit it to check for correctness.

Figure 2 | Submission Correctness Tests (SCTs)

If they fail in their attempt, they are provided feedback by the submission correctness tests (SCTs). They read the SCT feedback and edit their code to complete the exercise. If they don’t succeed in solving the exercise with the help of SCT feedback, they ask for hints. If the hints don’t help, they ask for the solution.

Given this flow, one of the key questions our team wanted to understand was what drove engagement (completion rates) and content quality (hint usage, solution usage). An extensive analysis of the data revealed some interesting insights:

  1. User: A user’s preparedness for the course, measured by the percentage of prerequisites completed, played a significant role in engagement. As one would expect, users who were better prepared were more likely to complete exercises without asking for a hint or the solution.

  2. Content: The biggest driver of engagement and quality was the length of instructions. Every additional 100 characters of instructions resulted in a 3.7 percentage point reduction in completion rates. Essentially, exercises with longer instructions often required the user to solve multiple problems and write more code, increasing the scope for making mistakes and getting frustrated. Longer instructions also lengthened the loop between a learner trying something out and getting formative feedback, which affects the learning experience.
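To make the second insight concrete, here is a minimal sketch of what the reported effect implies for a single exercise. It assumes the effect is linear in instruction length, which is a simplification for illustration; the function name and the example lengths are hypothetical.

```python
# Reported effect: each additional 100 characters of instructions is
# associated with a 3.7 percentage point drop in completion rate.
PP_DROP_PER_100_CHARS = 3.7


def expected_completion_change(chars_before: int, chars_after: int) -> float:
    """Expected change in completion rate, in percentage points.

    Assumption: the effect is linear in the number of characters.
    """
    delta_chars = chars_after - chars_before
    return -PP_DROP_PER_100_CHARS * delta_chars / 100


# Hypothetical example: trimming instructions from 600 to 400 characters.
change = expected_completion_change(600, 400)
print(f"{change:+.1f} percentage points")
```

Under that linear assumption, shortening instructions by 200 characters corresponds to a gain of roughly 7.4 percentage points in completion rate.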


Armed with these insights, we set out to make product decisions that would help improve the learning experience.

Create Content Guidelines

First, we wanted a scalable mechanism to ensure that all our content satisfies a set of rules (e.g. a limit on the length of instructions) that lead to a consistently good learning experience. So we developed detailed content guidelines, covering things like the number of instructions, the length of instructions, and the length of the assignment text, which are communicated clearly to instructors and automatically enforced by the Teach App.
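As a rough illustration of what automatic enforcement could look like, the sketch below checks an exercise against a few guideline limits. It is not the Teach App's implementation: the `Guidelines` class, the `check_exercise` function, and all numeric limits are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Guidelines:
    # Hypothetical limits for illustration; not DataCamp's actual values.
    max_instructions: int = 4
    max_instruction_chars: int = 120
    max_assignment_chars: int = 800


def check_exercise(assignment: str, instructions: List[str],
                   limits: Guidelines = Guidelines()) -> List[str]:
    """Return a list of guideline violations (empty if the exercise passes)."""
    violations = []
    if len(assignment) > limits.max_assignment_chars:
        violations.append(
            f"assignment text has {len(assignment)} characters "
            f"(limit {limits.max_assignment_chars})"
        )
    if len(instructions) > limits.max_instructions:
        violations.append(
            f"exercise has {len(instructions)} instructions "
            f"(limit {limits.max_instructions})"
        )
    for i, text in enumerate(instructions, start=1):
        if len(text) > limits.max_instruction_chars:
            violations.append(
                f"instruction {i} has {len(text)} characters "
                f"(limit {limits.max_instruction_chars})"
            )
    return violations


# Hypothetical exercise whose second instruction is too long.
problems = check_exercise(
    assignment="Explore the dataset.",
    instructions=["Load the data.", "x" * 200],
)
for p in problems:
    print(p)
```

Returning a list of messages rather than raising on the first problem lets an authoring tool show an instructor everything that needs fixing at once.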

Figure 3 | Content Guidelines for Courses (Illustrative)

You can read more about our content guidelines here.

Develop New Exercise Types

Sometimes it is necessary to have users solve multiple problems in a single exercise to ensure learning. To support this, we developed two new exercise types, TabExercise and BulletExercise, which allow instructors to break down a large problem into multiple steps, letting the user tackle one step at a time.

Figure 4 | Bullet Exercise

This keeps the cognitive load low by letting the user focus on one thing at a time, and allows them to get feedback after each step, reducing the complexity associated with exercises with long instructions.

Initial data shows that these exercise types lower the percentage of users requesting a hint or a solution by 25 to 50%, receive better ratings, and provide an improved learning experience.

You can read more about these new exercise types in this blog post.

Expose Content Quality Metrics

Finally, given the importance of these content quality metrics, we wanted to expose them to our instructors so that they are constantly aware of how their course is performing and which exercises require attention. We expose these metrics directly through our Teach App as well as through a standalone dashboard, both of which allow instructors to understand the performance of their course at a very granular level.

Figure 5 | Content Quality Metrics in the Teach App
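The metrics named earlier (completion rate, hint usage, solution usage) can be computed per exercise along the following lines. This is a minimal sketch under assumptions: the record layout (`exercise_id`, `completed`, `used_hint`, `used_solution`) and the sample data are hypothetical, not DataCamp's actual schema.

```python
from collections import defaultdict


def exercise_metrics(attempts):
    """Aggregate per-exercise rates from an iterable of attempt records.

    Each record is assumed to be a dict with the hypothetical keys
    'exercise_id', 'completed', 'used_hint', and 'used_solution'.
    """
    totals = defaultdict(lambda: {"n": 0, "completed": 0, "hint": 0, "solution": 0})
    for a in attempts:
        t = totals[a["exercise_id"]]
        t["n"] += 1
        t["completed"] += bool(a["completed"])
        t["hint"] += bool(a["used_hint"])
        t["solution"] += bool(a["used_solution"])
    return {
        ex: {
            "completion_rate": t["completed"] / t["n"],
            "hint_rate": t["hint"] / t["n"],
            "solution_rate": t["solution"] / t["n"],
        }
        for ex, t in totals.items()
    }


# Made-up sample data for illustration.
attempts = [
    {"exercise_id": "ex1", "completed": True, "used_hint": False, "used_solution": False},
    {"exercise_id": "ex1", "completed": True, "used_hint": True, "used_solution": False},
    {"exercise_id": "ex1", "completed": False, "used_hint": True, "used_solution": True},
    {"exercise_id": "ex2", "completed": True, "used_hint": False, "used_solution": False},
]
metrics = exercise_metrics(attempts)
for ex, m in sorted(metrics.items()):
    print(ex, {k: round(v, 2) for k, v in m.items()})
```

Sorting exercises by hint or solution rate is one simple way to surface the ones that need attention first.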


A data-driven approach to product development is key to driving changes that move the needle in a positive direction. It has helped us make important changes that have improved user experience and learning. A key element of a data-driven approach is to quickly iterate through a cycle of analysis, action, testing, and rollout. We have recently started putting in place systems to help us conduct product and content tests to take advantage of this feedback loop. We will write about our experience with testing in a later blog post.

If you or your organization want to benefit from data-driven development and decision making, you can sign up for DataCamp and take advantage of our extensive set of courses and projects. If you are interested in sharing your expertise and building a course for us, please visit us here to get started.