Introduction to Python
👋 Welcome to your new workspace! Here, you can experiment with the data you used in Introduction to Python and practice your newly learned skills with some challenges. You can find out more about DataCamp Workspace here.
On average, we expect users to take approximately 15 minutes to complete the content in this workspace. However, you are free to experiment and practice in it as long as you would like!
1. Get Started
Below is a code cell. It is used to execute Python code. There is already pre-written Python code to get you started that imports some packages, loads a file, and creates a visualization of the soccer data you used in the final exercise of Introduction to Python.
Here you can investigate the shooting and defending ratings of different soccer players, broken down by their position. You can hover over the data to learn more information about each player. You can even focus on different positions by clicking on the labels in the legend to the right!
Play around and dig into the data, and when you are ready, dive into the challenges below!
# Import packages
import pandas as pd
import plotly.express as px
# Read in the soccer data
soccer_df = pd.read_csv("datasets/soccer.csv")
# Rename positions
soccer_df["position"] = soccer_df["position"].replace(
{"M": "Midfielder", "A": "Attacker", "D": "Defender"}
)
# Create the scatter plot
fig = px.scatter(
soccer_df[soccer_df["position"] != "GK"], # The data to use, excluding goalkeepers
x="defending", # The column to put on the x-axis
y="shooting", # The column to put on the y-axis
color="position", # The column to color points by
hover_data=["name", "foot", "height"], # Additional data to show on hover
)
# Update the labels and theme of the plot
fig.update_layout(
template="plotly_dark", # The theme to use
title="Shooting and Defending Scores for Soccer Players<br><sup>Broken Down by Position</sup>", # The title
title_x=0.5, # Center the title
xaxis_title="Defending", # x-axis label
yaxis_title="Shooting", # y-axis label
legend_title="Position", # Title for the legend
)
# Show the plot
fig.show()
Nicely done! Feel free to customize the plot if you feel ready! For example, try out different plot themes by updating the template parameter from "plotly_dark" to one of: "ggplot2", "seaborn", "simple_white", "plotly", "plotly_white", "presentation", "xgridoff", "ygridoff", or "gridon"!
Note: If you update the plot, make sure to re-run the code cell by clicking inside it to select it and then clicking "Run" or the ► icon.
2. Load in the Baseball Data
Now that you have seen what Python and Workspace can do, it's time to start working with the data yourself! The code below uses pandas, a package introduced in Intermediate Python, to load the baseball data you used in Introduction to Python and convert each column into a NumPy array.
🏃 To execute the code, click inside the cell to select it and click "Run" or the ► icon. You can also use Shift-Enter to run a selected cell and automatically navigate to the next cell.
# Import packages
import numpy as np
import math
import pandas as pd
# Read in the file
baseball_df = pd.read_csv("datasets/baseball.csv")
# Separate into arrays
baseball_names = baseball_df["Name"].to_numpy()
baseball_heights = baseball_df["Height"].to_numpy()
baseball_weights = baseball_df["Weight"].to_numpy()
baseball_ages = baseball_df["Age"].to_numpy()
# Print out first array
baseball_names
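If you'd like to see what `.to_numpy()` gives you without touching the real file, here is a minimal sketch. The CSV text and the `sample_` names are made up for illustration; only the column names mirror the cell above.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for datasets/baseball.csv (these rows are invented)
csv_text = """Name,Height,Weight,Age
Player A,74,180,22.99
Player B,82,215,34.69
Player C,72,210,30.78
"""

# pd.read_csv accepts a file-like object as well as a path
sample_df = pd.read_csv(io.StringIO(csv_text))

# Selecting one column gives a pandas Series; .to_numpy() converts it to an array
sample_heights = sample_df["Height"].to_numpy()

print(type(sample_heights))   # the NumPy array type
print(sample_heights.shape)   # one entry per row in the CSV
```

Each array keeps the row order of the DataFrame, which is why the same index (or the same Boolean mask) can be used across `baseball_names`, `baseball_heights`, and the others.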
3. Challenge Yourself
After running the cell above, you have created four NumPy arrays: baseball_names, baseball_heights, baseball_weights, and baseball_ages.
Add code to the code cells below to try one (or more) of the following challenges:
- Print out the names of the first ten baseball players in baseball_names. If you're stuck, try reviewing this video.
# 1. Print out the names of the first ten baseball players
baseball_names[:10]
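Here is a small sketch of how that slice behaves, using a made-up four-element array (`toy_names` is hypothetical, not part of the dataset):

```python
import numpy as np

# Invented names, just to illustrate slicing
toy_names = np.array(["Player A", "Player B", "Player C", "Player D"])

# [:2] means "from the start up to, but not including, index 2"
first_two = toy_names[:2]
print(first_two)

# Asking for more elements than exist does not raise an error;
# the slice simply stops at the end of the array
everything = toy_names[:10]
print(len(everything))  # 4
```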
- What is the median weight of all baseball players in baseball_weights? If you're stuck, try reviewing this video.
# 2. Print out the median weight of all baseball players
np.median(baseball_weights)
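To get a feel for what `np.median()` computes, here is a tiny sketch on invented weights (`toy_weights` is a hypothetical array). With an even number of values, the median is the average of the two middle values after sorting.

```python
import numpy as np

# Invented weights; sorted they are 180, 188, 210, 215
toy_weights = np.array([180, 215, 210, 188])

# The two middle values are 188 and 210, so the median is their average
toy_median = np.median(toy_weights)
print(toy_median)  # 199.0
```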
- Print out the names of all players with a height greater than 80 (heights are in inches) using baseball_names and baseball_heights. If you're stuck, try reviewing this video.
# 3. Print out the names of all players with a height greater than 80
baseball_names[baseball_heights > 80]
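The filtering in this challenge relies on a Boolean mask: comparing an array to a number produces an array of True/False values, and indexing a second array with that mask keeps only the positions marked True. A minimal sketch on made-up data (the `toy_` arrays are hypothetical) also shows how "greater than" differs from "greater than or equal to":

```python
import numpy as np

# Invented players and heights (in inches), in matching order
toy_names = np.array(["Player A", "Player B", "Player C", "Player D"])
toy_heights = np.array([74, 80, 82, 79])

# The comparison runs element by element and returns a Boolean array
mask = toy_heights > 80
print(mask)

# Indexing with the mask keeps only the names where the mask is True
taller_than_80 = toy_names[mask]
print(taller_than_80)

# >= also keeps the player who is exactly 80 inches tall
at_least_80 = toy_names[toy_heights >= 80]
print(at_least_80)
```

This only works because both arrays have the same length and the same player order.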
Be sure to check out the Answer Key at the end to see one way to solve each problem. Did you try something similar?
4. Next Steps
Feeling confident about your skills? Continue on to Intermediate Python! This course will introduce you to some powerful libraries for visualizing and working with data: Matplotlib and pandas!
If you're still keen to practice, you can also use the code below to load in the soccer data shown in the visualization above.
# Read in the file
soccer_df = pd.read_csv("datasets/soccer.csv")
# Separate into NumPy arrays
soccer_names = soccer_df["name"].to_numpy()
soccer_heights = soccer_df["height"].to_numpy()
soccer_positions = soccer_df["position"].to_numpy()
soccer_foot = soccer_df["foot"].to_numpy()
soccer_shooting = soccer_df["shooting"].to_numpy()
soccer_defending = soccer_df["defending"].to_numpy()
# Print the first array
print(soccer_names)
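As one idea for practice, you could combine a Boolean mask with `.mean()` to summarize a rating for a single position. The sketch below uses invented arrays (`toy_positions`, `toy_shooting`) and assumes positions are stored as short codes such as "A" for attacker, as suggested by the rename step in the first cell; the values in the real file may differ.

```python
import numpy as np

# Invented position codes and shooting ratings, in matching order
toy_positions = np.array(["M", "A", "D", "A"])
toy_shooting = np.array([60, 80, 40, 70])

# Keep only the shooting ratings of attackers
attacker_shooting = toy_shooting[toy_positions == "A"]

# Average the remaining values: (80 + 70) / 2
attacker_mean = attacker_shooting.mean()
print(attacker_mean)  # 75.0
```

Swapping in `soccer_positions` and `soccer_shooting` should follow the same pattern, once you have checked which position labels the arrays actually contain.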