Exploring the Evolution of Linux
Project Description
Version control repositories like CVS, Subversion or Git store rich information about the evolution of a software project. In this project, you'll be challenged to read in, clean up and visualize a real-world Git repository dataset of the Linux kernel. With almost 700k commits and thousands of contributors (find out the exact number in this project ;-) ), you'll encounter a few data cleaning and wrangling challenges along the way. But you'll also gain insights into the development activity of the last 13 years.
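As a rough illustration of the "read in" and "find out the exact number" steps, the sketch below loads a tiny, made-up Git log and counts commits and distinct authors. The `<unix timestamp>#<author>` layout, the `#` separator and the sample rows are assumptions for illustration only (such a file could be produced with something like `git log --format="%at#%aN"`); the real project dataset may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical stand-in for an exported Git log. The "<unix timestamp>#<author>"
# layout is an assumption, not the project's documented file format.
RAW_LOG = """1144713600#alice
1176249600#bob
1176249700#alice
1207785600#carol
1207785700#
"""

git_log = pd.read_csv(
    io.StringIO(RAW_LOG),  # for a real export, pass the file path instead
    sep="#",               # single-character field separator
    header=None,           # the export has no header row
    names=["timestamp", "author"],
)

# Every row is one commit.
number_of_commits = len(git_log)
# nunique() skips missing author names (NaN), such as the last sample row.
number_of_authors = git_log["author"].nunique()

print(f"{number_of_authors} authors committed {number_of_commits} code changes.")
```

On this sample the script reports 3 authors and 5 commits; the empty author field in the last row is one example of the kind of gap the cleanup steps have to deal with.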
For this project, you need to be familiar with pandas DataFrames, especially the read_csv and groupby functions, as well as working with time series data.
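The groupby function mentioned above is what drives a task like finding the top contributors. A minimal sketch, again assuming a hypothetical `timestamp#author` log with invented author names: group by author, count rows per group, and keep the largest counts.

```python
import io

import pandas as pd

# Hypothetical sample log; the layout and names are illustrative assumptions.
RAW_LOG = """1144713600#alice
1176249600#bob
1176249700#alice
1207785600#carol
1207785700#alice
1207785800#bob
"""

git_log = pd.read_csv(
    io.StringIO(RAW_LOG), sep="#", header=None, names=["timestamp", "author"]
)

# groupby(...).size() counts commits per author;
# nlargest(10) keeps the ten biggest counts, sorted in descending order.
top_contributors = git_log.groupby("author").size().nlargest(10)
print(top_contributors)
```

`git_log["author"].value_counts().head(10)` gives an equivalent ranking; the groupby form is shown because it generalizes to grouping by other keys, such as a commit's year.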
Project Tasks
1. Introduction
2. Reading in the dataset
3. Getting an overview
4. Finding the top 10 contributors
5. Wrangling the data
6. Treating wrong timestamps
7. Grouping commits per year
8. Visualizing the history of Linux
9. Conclusion
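Tasks 5 to 7 (wrangling, treating wrong timestamps, grouping per year) can be sketched as follows. The sample rows are invented and include two deliberately implausible timestamps (the Unix epoch in 1970 and a date in 2100). The plausibility window is an assumption: the lower bound is set to 16 April 2005, roughly when the kernel's Git history begins, and the upper bound is a hypothetical export date. Filtering by a fixed window is one simple way to treat wrong timestamps, not necessarily the approach the project expects.

```python
import io

import pandas as pd

# Hypothetical sample log with two bogus timestamps (0 and a year-2100 value).
RAW_LOG = """1144713600#alice
1176249600#bob
1176249700#alice
1207785600#carol
0#bob
4102444800#alice
"""

git_log = pd.read_csv(
    io.StringIO(RAW_LOG), sep="#", header=None, names=["timestamp", "author"]
)

# Convert Unix epoch seconds into pandas Timestamps (timezone-naive, UTC-based).
git_log["timestamp"] = pd.to_datetime(git_log["timestamp"], unit="s")

# Assumed plausibility window; rows outside it are treated as wrong timestamps.
FIRST_PLAUSIBLE = pd.Timestamp("2005-04-16")
LAST_PLAUSIBLE = pd.Timestamp("2018-01-01")  # hypothetical export date

corrected_log = git_log[
    git_log["timestamp"].between(FIRST_PLAUSIBLE, LAST_PLAUSIBLE)
]

# Group the remaining commits by calendar year and count them.
commits_per_year = corrected_log.groupby(corrected_log["timestamp"].dt.year).size()
commits_per_year.index.name = "year"
print(commits_per_year)
```

For the visualization step, `commits_per_year.plot(kind="bar")` is one option if matplotlib is installed. Grouping by `dt.year` skips years without commits; resampling on a datetime index would include them as zero counts instead.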
Technologies
Python
Software Development Analyst