Everyone Can Learn Data Scholarship
Background
The second "Everyone Can Learn Data" Scholarship from DataCamp is now open for entries.
The challenges below test the coding skills you gained from beginner courses on Python, R, or SQL. Pair them with the help of AI and your creative thinking skills to win $5,000 for your future data science studies!
The scholarship is open to secondary and undergraduate students, and to other students preparing for graduate-level studies (those working toward a bachelor's degree). Postgraduate students (PhD) and students who have already graduated (master's degree) cannot apply.
The challenge consists of two parts. Make sure to complete both before submitting. Good luck!
Executive Summary
Dinosaur Data
- How many different dinosaur names are present in the data? 1042.
- Which was the largest dinosaur? Supersaurus, a herbivore that was 35 m long.
- What about missing data in the dataset? There are 5149 missing values in total, with the family column contributing the most.
- What dinosaur type has the most occurrences in this dataset? The ornithopod has the most occurrences, and the armored dinosaur has the fewest.
- Did dinosaurs get bigger over time? Yes, they did. Among the earliest dinosaurs, 80% were longer than 2.5 m; among the latest, 80% were longer than 10 m.
- Create an interactive map showing each record?
- Any other insights you found during your analysis?
- You are most likely to find a high concentration of longer dinosaurs in hot places, especially deserts.
Movie data
- How many movies are present in the database? There are 4968 rows.
- How many rows have missing data? 1076 rows are missing a value in the gross or budget column.
- What would you recommend your manager to do with these rows? Either switch to a data source with fewer missing values, or use an AI tool to look up the missing gross and budget figures when the data is received.
- How many different certifications or ratings are present in the database? There are 12 different ratings. However, several are outdated, leaving 6 ratings currently in use.
- What are the top five countries in terms of number of movies produced?
- 🇺🇸 USA: 3750 movies
- 🇬🇧 UK: 443 movies
- 🇫🇷 France: 153 movies
- 🇨🇦 Canada: 123 movies
- 🇩🇪 Germany: 97 movies
- What is the average duration of English versus French movies? English movies average 1 hour 51 minutes and French movies 1 hour 49 minutes, so English movies are about 2 minutes longer on average.
- Any other insights you found during your analysis?
- More movies were made in the 21st century than in the 20th century. One possible interpretation is that more people are employed in the movie industry as time goes on, so more movies get made.
- The average budget increases over time. This may reflect economic growth, which would mean there are now fewer limitations on making crazy movies.
Part 1 (Python) - Dinosaur data
Background
You're applying for a summer internship at a national museum for natural history. The museum recently created a database containing all dinosaur records of past field campaigns. Your job is to dive into the fossil records to find some interesting insights, and advise the museum on the quality of the data.
The data
You have access to a real dataset containing dinosaur records from the Paleobiology Database (source):
| Column name | Description |
|---|---|
| occurence_no | The original occurrence number from the Paleobiology Database. |
| name | The accepted name of the dinosaur (usually the genus name, or the name of the footprint/egg fossil). |
| diet | The main diet (omnivorous, carnivorous, herbivorous). |
| type | The dinosaur type (small theropod, large theropod, sauropod, ornithopod, ceratopsian, armored dinosaur). |
| length_m | The maximum length, from head to tail, in meters. |
| max_ma | The age at which the first fossil records of the dinosaur were found, in millions of years. |
| min_ma | The age at which the last fossil records of the dinosaur were found, in millions of years. |
| region | The current region where the fossil record was found. |
| lng | The longitude where the fossil record was found. |
| lat | The latitude where the fossil record was found. |
| class | The taxonomical class of the dinosaur (Saurischia or Ornithischia). |
| family | The taxonomical family of the dinosaur (if known). |
The data was enriched with data from Wikipedia.
Data preview:
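A preview like this is usually produced by loading the file into a table and printing its first rows. The sketch below assumes pandas is used and that the DataFrame is called `dinosaurs` (a hypothetical name). Because the real file path is not shown here, it reads a few made-up rows from an in-memory string instead; the dinosaur names and values are placeholders, not real records.

```python
import io

import pandas as pd

# Made-up sample rows for illustration only. In the real notebook this
# would be the museum's CSV file, whose path is not shown in this document.
sample_csv = io.StringIO(
    "occurence_no,name,diet,type,length_m,max_ma,min_ma,region,lng,lat,class,family\n"
    "1,Exampleosaurus,herbivorous,sauropod,30.0,150.0,145.0,Region A,-108.5,39.1,Saurischia,Examplidae\n"
    "2,Samplesaurus,carnivorous,large theropod,9.0,70.0,66.0,Region B,101.2,44.7,Saurischia,\n"
    "3,Exampleosaurus,herbivorous,sauropod,30.0,152.0,148.0,Region A,-107.9,38.6,Saurischia,Examplidae\n"
)

# read_csv turns empty fields (like the missing family in row 2) into NaN.
dinosaurs = pd.read_csv(sample_csv)

preview = dinosaurs.head()  # first rows of the table (at most five)
print(preview)
print(dinosaurs.shape)      # (number of rows, number of columns)
```

Checking `shape` next to the preview is a quick way to confirm that every column from the table above was actually read.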
Challenge I
Help your colleagues at the museum to gain insights on the fossil record data. Include:
- How many different dinosaur names are present in the data?
- Which was the largest dinosaur? What about missing data in the dataset?
- What dinosaur type has the most occurrences in this dataset? Create a visualization (table, bar chart, or equivalent) to display the number of dinosaurs per type. Use the AI assistant to tweak your visualization (colors, labels, title...).
- Did dinosaurs get bigger over time? Show the relation between the dinosaur length and their age to illustrate this.
- Use the AI assistant to create an interactive map showing each record.
- Any other insights you found during your analysis?
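Two of the questions above (occurrences per type, and size over time) can be approached with a few lines of pandas. The following is a minimal sketch under stated assumptions, not the notebook's actual code: the DataFrame name `dinosaurs`, the helper column `mid_ma`, and all sample values are invented for illustration. Using the midpoint of `max_ma` and `min_ma` as a single "age" per record is one possible choice among several.

```python
import pandas as pd

# Placeholder records; the values are invented and only mimic the columns.
dinosaurs = pd.DataFrame({
    "type": ["ornithopod", "sauropod", "ornithopod",
             "armored dinosaur", "sauropod", "ornithopod"],
    "length_m": [3.0, 20.0, 6.0, 5.0, 30.0, 9.0],
    "max_ma": [200.0, 160.0, 150.0, 120.0, 100.0, 70.0],
    "min_ma": [190.0, 150.0, 140.0, 110.0, 90.0, 66.0],
})

# Occurrences per type, sorted from most to least frequent.
type_counts = dinosaurs["type"].value_counts()
print(type_counts)

# Assumption: represent each record's age by the midpoint of its range.
dinosaurs["mid_ma"] = (dinosaurs["max_ma"] + dinosaurs["min_ma"]) / 2

# Pearson correlation between age and length. Age is measured in millions
# of years before present, so a negative value means that more recent
# records tend to be longer.
age_length_corr = dinosaurs["mid_ma"].corr(dinosaurs["length_m"])
print(round(age_length_corr, 2))
```

A correlation is only a rough summary; a scatter plot of `mid_ma` against `length_m` would show the relation the challenge asks for more directly.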
How many different dinosaur names are present in the data?
There are 1042 different dinosaur names in the data.
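A count like this can be obtained with the pandas method `nunique()`. The sketch below is only an illustration: it assumes the records are loaded into a DataFrame called `dinosaurs` (a hypothetical name) and uses a handful of placeholder names rather than the real data.

```python
import pandas as pd

# Placeholder names for illustration; not real records.
dinosaurs = pd.DataFrame({"name": ["Aaa", "Bbb", "Aaa", None, "Ccc"]})

# nunique() counts distinct values and ignores missing ones by default.
n_names = dinosaurs["name"].nunique()
print(f"There are {n_names} different dinosaur names in the sample.")
```

Note that the `name` column usually holds a genus name, or the name of a footprint or egg fossil, so the result counts names rather than species.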
Which was the largest dinosaur?
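One way to answer this, and the related question about missing data, is sketched below with pandas. It assumes a DataFrame named `dinosaurs` (a hypothetical name) with the columns described in the data table. The rows are invented placeholders, so the output does not reproduce the figures reported in the summary.

```python
import pandas as pd

# Placeholder records for illustration; not real data.
dinosaurs = pd.DataFrame({
    "name": ["Aaa", "Bbb", "Ccc", "Ddd"],
    "diet": ["herbivorous", "carnivorous", None, "herbivorous"],
    "length_m": [12.0, 35.0, None, 4.5],
    "family": [None, "Fam1", None, None],
})

# idxmax() skips missing lengths and returns the row label of the maximum.
# If several records tie for the maximum, only the first one is returned.
largest = dinosaurs.loc[dinosaurs["length_m"].idxmax()]
print(largest["name"], largest["length_m"], largest["diet"])

# Missing values per column, sorted so the worst column comes first.
missing_per_column = dinosaurs.isna().sum().sort_values(ascending=False)
total_missing = int(missing_per_column.sum())
print(missing_per_column)
print(total_missing)
```

Because records without a length are skipped, the "largest" dinosaur is only the largest among those with a known `length_m`.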