Napoleon-Christos Oikonomou has completed
Manipulating DataFrames with pandas
4 hours
6,300 XP
Course Description
In this course, you'll learn how to leverage pandas' extremely powerful data manipulation engine to get the most out of your data. You’ll learn how to drill into the data that really matters by extracting, filtering, and transforming data from DataFrames. The pandas library has many techniques that make this process efficient and intuitive. You will learn how to tidy, rearrange, and restructure your data by pivoting or melting and stacking or unstacking DataFrames. These are all fundamental next steps on the road to becoming a well-rounded data scientist, and you will have the chance to apply all the concepts you learn to real-world datasets.
1. Extracting and transforming data
This free chapter covers how to index, slice, filter, and transform DataFrames using a variety of datasets, ranging from 2012 US election data for the state of Pennsylvania to Pittsburgh weather data.
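As a rough illustration of the four operations named above (indexing, slicing, filtering, transforming), here is a minimal sketch. The DataFrame is a small invented weather-style table, not the course's Pittsburgh or Pennsylvania data, and the column names `temp_f`, `humidity`, and `temp_c` are assumptions made for this example.

```python
import pandas as pd

# Hypothetical weather-style data; the values are invented for illustration.
df = pd.DataFrame(
    {
        "temp_f": [32.0, 50.0, 68.0, 86.0],
        "humidity": [80, 65, 50, 40],
    },
    index=["Jan", "Apr", "Jul", "Oct"],
)

# Indexing: .loc uses labels, .iloc uses integer positions.
by_label = df.loc["Jul", "temp_f"]
by_position = df.iloc[2, 0]

# Slicing: label slices with .loc include both endpoints.
spring_to_summer = df.loc["Apr":"Jul", :]

# Filtering: a Boolean mask keeps the rows where the condition holds.
warm = df[df["temp_f"] > 60]

# Transforming: vectorized arithmetic converts the whole column at once.
df["temp_c"] = (df["temp_f"] - 32) * 5 / 9
```

Vectorized arithmetic is used for the transformation here because it avoids a Python-level loop; `.apply()` or `.map()` would be alternatives when the logic cannot be expressed as column arithmetic.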
Indexing DataFrames (50 xp)
Index ordering (50 xp)
Positional and labeled indexing (100 xp)
Indexing and column rearrangement (100 xp)
Slicing DataFrames (50 xp)
Slicing rows (100 xp)
Slicing columns (100 xp)
Subselecting DataFrames with lists (100 xp)
Filtering DataFrames (50 xp)
Thresholding data (100 xp)
Filtering columns using other columns (100 xp)
Filtering using NaNs (100 xp)
Transforming DataFrames (50 xp)
Using apply() to transform a column (100 xp)
Using .map() with a dictionary (100 xp)
Using vectorized functions (100 xp)
2. Advanced indexing
Having learned the fundamentals of working with DataFrames, you will now move on to more advanced indexing techniques. You will learn about MultiIndexes, or hierarchical indexes, and learn how to interact with and extract data from them.
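A minimal sketch of building and querying a hierarchical index might look like the following. The `sales` table and its `state`, `month`, and `units` columns are hypothetical and chosen only to keep the example small.

```python
import pandas as pd

# Hypothetical sales figures; names and numbers are invented for illustration.
sales = pd.DataFrame(
    {
        "state": ["CA", "CA", "NY", "NY"],
        "month": [1, 2, 1, 2],
        "units": [10, 12, 7, 9],
    }
)

# Build a two-level (hierarchical) index, then sort it so slicing is well defined.
indexed = sales.set_index(["state", "month"]).sort_index()

# An outer-level label returns every row for that state.
ca_rows = indexed.loc["CA"]

# A tuple addresses a single row across both levels.
ny_feb_units = indexed.loc[("NY", 2), "units"]

# slice(None) selects everything on a level: here, month 1 for all states.
month_one = indexed.loc[(slice(None), 1), :]
```

Sorting the index before slicing is a deliberate choice: pandas can raise an error or warn about performance when slicing an unsorted MultiIndex.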
Index objects and labeled data (50 xp)
Index values and names (50 xp)
Changing index of a DataFrame (100 xp)
Changing index name labels (100 xp)
Building an index, then a DataFrame (100 xp)
Hierarchical Indexing (50 xp)
Extracting data with a MultiIndex (100 xp)
Setting & sorting a MultiIndex (100 xp)
Using .loc[] with nonunique indexes (100 xp)
Indexing multiple levels of a MultiIndex (100 xp)
3. Rearranging and reshaping data
Here, you will learn how to reshape your DataFrames using techniques such as pivoting, melting, stacking, and unstacking. These are powerful techniques that allow you to tidy and rearrange your data into the optimal format for data analysis.
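The reshaping operations mentioned above can be sketched on a tiny table as follows. The `city`, `day`, and `visitors` columns and their values are invented for illustration and do not come from the course datasets.

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, day) observation.
long_df = pd.DataFrame(
    {
        "city": ["Austin", "Austin", "Dallas", "Dallas"],
        "day": ["Mon", "Tue", "Mon", "Tue"],
        "visitors": [100, 120, 80, 90],
    }
)

# Pivot: one row per city, one column per day.
wide = long_df.pivot(index="city", columns="day", values="visitors")

# Melt: turn the wide table back into long key-value form.
melted = wide.reset_index().melt(
    id_vars="city", var_name="day", value_name="visitors"
)

# Stack moves the column labels into an inner row-index level;
# unstack moves that level back out to the columns.
stacked = wide.stack()
round_trip = stacked.unstack()
```

Note that `pivot` requires each index/column pair to be unique; when pairs repeat, `pivot_table` with an aggregation function is the usual alternative.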
Pivoting DataFrames (50 xp)
Pivoting and the index (50 xp)
Pivoting a single variable (100 xp)
Pivoting all variables (100 xp)
Stacking & unstacking DataFrames (50 xp)
Stacking & unstacking I (100 xp)
Stacking & unstacking II (100 xp)
Restoring the index order (100 xp)
Melting DataFrames (50 xp)
Adding names for readability (100 xp)
Going from wide to long (100 xp)
Obtaining key-value pairs with melt() (100 xp)
Pivot tables (50 xp)
Setting up a pivot table (100 xp)
Using other aggregations in pivot tables (100 xp)
Using margins in pivot tables (100 xp)
4. Grouping data
In this chapter, you'll learn how to identify and split DataFrames by groups or categories for further aggregation or analysis. You'll also learn how to transform and filter your data, and how to detect outliers and impute missing values. Knowing how to effectively group data in pandas can be a seriously powerful addition to your data science toolbox.
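One way to picture the split, aggregate, transform, and filter steps described above is the sketch below. The `team` and `score` columns are hypothetical, the threshold of 5 is arbitrary, and the within-group z-score is just one possible basis for outlier detection.

```python
import pandas as pd

# Hypothetical scores with one missing value; invented for illustration.
df = pd.DataFrame(
    {
        "team": ["a", "a", "a", "b", "b", "b"],
        "score": [1.0, 2.0, None, 10.0, 20.0, 30.0],
    }
)

# Aggregation: one summary value per group (missing values are skipped).
mean_by_team = df.groupby("team")["score"].mean()

# Imputation by group: fill each missing score with its own group's mean.
df["score_filled"] = df.groupby("team")["score"].transform(
    lambda s: s.fillna(s.mean())
)

# Transformation: z-score of each value relative to its own group.
df["z"] = df.groupby("team")["score_filled"].transform(
    lambda s: (s - s.mean()) / s.std()
)

# Filtering: keep only the rows of groups whose mean exceeds the threshold.
high = df.groupby("team").filter(lambda g: g["score_filled"].mean() > 5)
```

`transform` returns a result aligned with the original rows, which is why it suits imputation and z-scores, while `filter` keeps or drops whole groups at once.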
Categoricals and groupby (50 xp)
Advantages of categorical data types (50 xp)
Grouping by multiple columns (100 xp)
Grouping by another series (100 xp)
Groupby and aggregation (50 xp)
Computing multiple aggregates of multiple columns (100 xp)
Aggregating on index levels/fields (100 xp)
Grouping on a function of the index (100 xp)
Groupby and transformation (50 xp)
Detecting outliers with Z-Scores (100 xp)
Filling missing data (imputation) by group (100 xp)
Other transformations with .apply (100 xp)
Groupby and filtering (50 xp)
Grouping and filtering with .apply() (100 xp)
Grouping and filtering with .filter() (100 xp)
Filtering and grouping with .map() (100 xp)
5. Bringing it all together
We’ll bring together everything you have learned in this course while working with data recorded from the Summer Olympic games that goes as far back as 1896! This is a rich dataset that will allow you to fully apply the data manipulation techniques you have learned. You will pivot, unstack, group, slice, and reshape your data as you explore this dataset and uncover some truly fascinating insights.
Case study: Olympic medals (50 xp)
Grouping and aggregating (50 xp)
Using .value_counts() for ranking (100 xp)
Using .pivot_table() to count medals by type (100 xp)
Understanding the column labels (50 xp)
Applying .drop_duplicates() (100 xp)
Finding possible errors with .groupby() (100 xp)
Locating suspicious data (100 xp)
Constructing alternative country rankings (50 xp)
Using .nunique() to rank by distinct sports (100 xp)
Counting USA vs. USSR Cold War Olympic Sports (100 xp)
Counting USA vs. USSR Cold War Olympic Medals (100 xp)
Reshaping DataFrames for visualization (50 xp)
Visualizing USA Medal Counts by Edition: Line Plot (100 xp)
Visualizing USA Medal Counts by Edition: Area Plot (100 xp)
Visualizing USA Medal Counts by Edition: Area Plot with Ordered Medals (100 xp)
Congratulations! (50 xp)