Exploring the Evolution of Linux

Find out about the evolution of the Linux operating system by exploring its version control system.

9 Tasks, 1,500 XP


Project Description

Version control repositories like CVS, Subversion, or Git store rich information about how a software project evolves. In this project, you'll read in, clean up, and visualize a real-world Git repository dataset of the Linux kernel. With almost 700k commits and thousands of contributors (you'll find out the exact number in this project ;-)), you'll encounter a few small data cleaning and wrangling challenges. You'll also gain insights into the development activity over the last 13 years. For this project, you need to be familiar with Pandas `DataFrame`s, especially the `read_csv` and `groupby` functions, as well as with time series data.
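
As a rough warm-up for the prerequisites named above, the sketch below reads a tiny, made-up commit log with `read_csv`, converts the timestamps for time series work, and counts commits with `groupby`. The sample rows, the `#` separator, and the column names `timestamp` and `author` are assumptions for illustration only; the project's actual dataset may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a Git log export.
# The layout (Unix timestamp, '#', author name) is an assumption.
sample_log = io.StringIO(
    "1502826583#Alice\n"
    "1501749089#Bob\n"
    "1501749090#Alice\n"
)

# Read the headerless file and name the two columns ourselves
git_log = pd.read_csv(
    sample_log, sep="#", header=None, names=["timestamp", "author"]
)

# Convert Unix epoch seconds into datetime values
git_log["timestamp"] = pd.to_datetime(git_log["timestamp"], unit="s")

# Count the commits made by each author
commits_per_author = git_log.groupby("author")["timestamp"].count()
print(commits_per_author)
```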

Project Tasks

  1. Introduction
  2. Reading in the dataset
  3. Getting an overview
  4. Finding the TOP 10 contributors
  5. Wrangling the data
  6. Treating wrong timestamps
  7. Grouping commits per year
  8. Visualizing the history of Linux
  9. Conclusion
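
Several of the tasks above (finding top contributors, treating wrong timestamps, grouping commits per year) can be sketched on a small invented `DataFrame`. This is only one possible approach, not necessarily the one the project expects: the commits, the author names, and the cutoff dates `first_valid` and `last_valid` are all hypothetical.

```python
import pandas as pd

# Hypothetical commits; authors and dates are invented for illustration
git_log = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "1970-01-01 00:00:01",  # implausibly early timestamp
        "2016-03-01 10:00:00",
        "2016-07-15 12:30:00",
        "2017-02-20 08:45:00",
        "2037-04-25 08:08:26",  # timestamp in the future
    ]),
    "author": ["Alice", "Bob", "Alice", "Carol", "Bob"],
})

# Top contributors: count commits per author, keep at most ten
top_contributors = git_log["author"].value_counts().head(10)

# Treat wrong timestamps by keeping only an assumed plausible range
first_valid = pd.Timestamp("2005-01-01")  # assumed lower bound
last_valid = pd.Timestamp("2018-01-01")   # assumed upper bound
in_range = (git_log["timestamp"] >= first_valid) & (
    git_log["timestamp"] <= last_valid
)
corrected_log = git_log[in_range]

# Group the remaining commits by calendar year and count them
commits_per_year = corrected_log.groupby(
    corrected_log["timestamp"].dt.year
)["author"].count()
print(commits_per_year)
```

If matplotlib is installed, calling `commits_per_year.plot(kind="bar")` would be one simple way to visualize the yearly counts.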
Technologies
Python
Topics
Importing & Cleaning Data, Case Studies
Markus Harrer

Software Development Analyst
Markus Harrer is a software engineer who is passionate about improving the way we do software development. He specializes in analyzing software data to uncover the underlying problems behind the symptoms we see on the surface. Markus shares his thoughts and experiences as @feststelltaste on Twitter and on his blog at https://feststelltaste.de.
