
The GitHub History of the Scala Language

Find the true Scala experts by exploring its development history in Git and GitHub.

  • 10 tasks
  • 9,269 participants
  • 1,500 XP

Project Description

Open source projects contain entire development histories: who made changes, the changes themselves, and code reviews. In this project, you'll be challenged to read in, clean up, and visualize the real-world project repository of Scala, which spans data from a version control system (Git) as well as a project hosting site (GitHub). With almost 30,000 commits and a history spanning over ten years, Scala is a mature language. You will find out who has had the most influence on its development and who the experts are.

The dataset includes the project history of Scala retrieved from Git and GitHub as a set of CSV files.
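
As a rough sketch of what working with such files might look like, the snippet below loads two small CSV tables with pandas, parses the timestamps, and merges the tables. The column names (`pid`, `user`, `date`, `file`), the sample rows, and the variable names are assumptions made for illustration; the actual schema of the dataset may differ. In-memory strings stand in for files on disk so that the example runs by itself.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two of the CSV files; real column names may differ.
pulls_csv = io.StringIO(
    "pid,user,date\n"
    "1,alice,2017-01-05T10:00:00Z\n"
    "2,bob,2017-02-11T12:30:00Z\n"
)
pull_files_csv = io.StringIO(
    "pid,file\n"
    "1,src/Main.scala\n"
    "1,README.md\n"
    "2,src/Main.scala\n"
)

pulls = pd.read_csv(pulls_csv)
pull_files = pd.read_csv(pull_files_csv)

# Clean up: parse the timestamp strings into timezone-aware datetimes.
pulls["date"] = pd.to_datetime(pulls["date"], utc=True)

# Merge: one row per (pull request, changed file) pair.
data = pulls.merge(pull_files, on="pid")
```

With real files, the `io.StringIO` objects would be replaced by the paths of the CSV files.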

The skills required to complete this Project are covered in Data Manipulation with pandas and Merging DataFrames with pandas.

Project Tasks

  1. Scala's real-world project repository data
  2. Preparing and cleaning the data
  3. Merging the DataFrames
  4. Is the project still actively maintained?
  5. Is there camaraderie in the project?
  6. What files were changed in the last ten pull requests?
  7. Who made the most pull requests to a given file?
  8. Who made the last ten pull requests on a given file?
  9. The pull requests of two special developers
  10. Visualizing the contributions of each developer
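
To give a flavor of the kind of question these tasks ask, here is a minimal sketch of how one might count pull requests per author for a single file with pandas. The table, its column names (`pid`, `user`, `file`), and the target file name are hypothetical placeholders, not the project's actual data or solution.

```python
import pandas as pd

# Hypothetical merged table: one row per (pull request, changed file) pair.
data = pd.DataFrame(
    {
        "pid": [1, 1, 2, 3, 4],
        "user": ["alice", "alice", "bob", "alice", "carol"],
        "file": [
            "src/Main.scala",
            "README.md",
            "src/Main.scala",
            "src/Main.scala",
            "README.md",
        ],
    }
)

target = "src/Main.scala"  # placeholder file name

# Keep only the rows that touch the file of interest.
file_rows = data[data["file"] == target]

# Count distinct pull requests per author, most active authors first.
author_counts = (
    file_rows.groupby("user")["pid"].nunique().sort_values(ascending=False)
)
top_authors = author_counts.head(3)
```

Sorting the same filtered rows by a date column and taking the last few entries would be one way to approach the "last ten pull requests" question.
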
Anita Sarma

Associate Professor at Oregon State University

Anita Sarma joined Oregon State University in September 2015 where she is an associate professor in the School of Electrical Engineering and Computer Science. She holds a Ph.D. degree in information and computer science from the University of California, Irvine. Her research interests are at the intersection of software engineering and computer-supported cooperative work.

  • Python
  • Topics: Data Manipulation, Data Visualization, Importing & Cleaning Data