Joining two or more datasets is necessary for almost any real-world analysis. You’ve done it before in spreadsheets using VLOOKUP and related functions. Can you build on this experience as you transition to the world of Python? Yes! In this course, you will learn the ins and outs of bringing datasets together with pandas, Python’s gold standard for manipulating tabular data. You’ll apply pandas functions to combine data from the National Football League (NFL), framed in a familiar spreadsheet environment. Armed with these skills, you will be able to harness the power of pandas and integrate larger, more complex datasets into any analysis.
Introduction to joining data
In this chapter, we'll build a foundation for using pandas to join data. You'll learn about the types of joins and how pandas can improve your effectiveness and productivity.
Joining data: a real-world necessity (50 xp)
The need for joining data (50 xp)
Working with split data (100 xp)
Working with complementary data (100 xp)
Concatenation (50 xp)
Concatenating rows (100 xp)
Concatenating rows with duplicated indexes (100 xp)
Concatenating columns (100 xp)
Power and flexibility (50 xp)
Advantages of pandas joins (100 xp)
Simple coding for complex merges (100 xp)
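The concatenation topics in this chapter can be pictured with a minimal sketch like the one below. The table names, column names, and values are hypothetical and are not taken from the course's NFL data; only `pd.concat` and its `ignore_index` and `axis` parameters are standard pandas.

```python
import pandas as pd

# Hypothetical data: the same kind of table split across two weeks.
week1 = pd.DataFrame({"team": ["A", "B"], "points": [17, 24]})
week2 = pd.DataFrame({"team": ["A", "B"], "points": [31, 10]})

# Concatenate rows and keep the original labels.
# Both inputs are labelled 0 and 1, so the result has duplicated index values.
stacked_dup = pd.concat([week1, week2])

# Concatenate rows and rebuild a clean 0..n-1 index instead.
stacked = pd.concat([week1, week2], ignore_index=True)

# Concatenate columns side by side; rows are aligned on the index.
yards = pd.DataFrame({"yards": [310, 402]})  # hypothetical complementary data
wide = pd.concat([week1, yards], axis=1)

print(stacked_dup)
print(stacked)
print(wide)
```

Whether to keep or reset the index depends on whether the original row labels carry meaning; for plain row counters, `ignore_index=True` is usually the simpler choice.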
In this chapter, you'll learn how to use pandas for joining data in a way similar to using VLOOKUP formulas in a spreadsheet. You'll learn about three types of joins and then focus on the first type, one-to-one joins.
Types of joins (50 xp)
One-to-one joins (100 xp)
One-to-many joins (100 xp)
Many-to-many joins (100 xp)
A closer look at one-to-one joins (50 xp)
Unscrambling the framework (100 xp)
Replicating VLOOKUP (100 xp)
Merging on two or more keys (100 xp)
Combining common data with inner joins (50 xp)
Object-oriented merges (100 xp)
Basic inner joins (100 xp)
Dealing with different names (100 xp)
Choosing the correct join method (100 xp)
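As a rough illustration of the one-to-one joins described in this chapter, the sketch below mimics a VLOOKUP with a left merge, contrasts it with an inner join, and merges on two keys. All table names, column names, and values are made up for the example; it assumes the key columns are named differently in the two tables, which is why `left_on` and `right_on` are used.

```python
import pandas as pd

# Hypothetical lookup tables with differently named key columns.
teams = pd.DataFrame({"team_id": [1, 2, 3], "team": ["A", "B", "C"]})
coaches = pd.DataFrame({"id": [1, 2, 4], "coach": ["Smith", "Jones", "Lee"]})

# VLOOKUP-style: keep every row of `teams` and look up the coach where the
# keys match. validate="one_to_one" raises an error if either key repeats.
lookup = teams.merge(
    coaches, how="left", left_on="team_id", right_on="id", validate="one_to_one"
)

# Inner join: keep only the keys present in both tables.
inner = teams.merge(coaches, how="inner", left_on="team_id", right_on="id")

# Merging on two keys: a row matches only if season AND week both agree.
scores = pd.DataFrame({"season": [2018, 2018], "week": [1, 2], "points": [17, 31]})
crowd = pd.DataFrame({"season": [2018, 2018], "week": [1, 2], "fans": [60000, 61000]})
both = scores.merge(crowd, on=["season", "week"])

print(lookup)
print(inner)
print(both)
```

In the left merge, the team with no matching key keeps its row and gets a missing value for `coach`, much as VLOOKUP returns `#N/A`; the inner join drops that row instead.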
In this chapter, we'll focus on one-to-many relationships. You'll practice identifying the relationship of key columns and joining data frames by column. You'll also learn how to join two or more data frames based on their indices.
"Out of many, one" (50 xp)
Framework part 2: one-to-many merges (100 xp)
Identifying one-to-many relationships (100 xp)
Joining on key columns (50 xp)
Checking for duplicate keys (100 xp)
Completing a one-to-many merge (100 xp)
Index-based joins (50 xp)
Joining on index (100 xp)
Joining multiple tables (100 xp)
Reviewing the one-to-many join (50 xp)
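One possible way to picture the one-to-many and index-based joins from this chapter is sketched below. The data is invented for illustration; the example assumes a `team` key that is unique in one table and repeated in the other, and small tables that share a team-name index.

```python
import pandas as pd

# Hypothetical tables: one row per team, many rows per team in `players`.
teams = pd.DataFrame({"team": ["A", "B"], "division": ["North", "South"]})
players = pd.DataFrame({"player": ["p1", "p2", "p3"], "team": ["A", "A", "B"]})

# Check for duplicate keys to identify the relationship.
teams_unique = teams["team"].is_unique      # True: the "one" side
players_unique = players["team"].is_unique  # False: the "many" side

# One-to-many merge: each team row is repeated for each of its players.
# validate="one_to_many" raises an error if the left key is not unique.
roster = teams.merge(players, on="team", validate="one_to_many")

# Index-based join: DataFrame.join aligns on the index and accepts a list
# of frames, which allows joining several tables in one call.
wins = pd.DataFrame({"wins": [10, 6]}, index=["A", "B"])
losses = pd.DataFrame({"losses": [6, 10]}, index=["A", "B"])
ties = pd.DataFrame({"ties": [0, 0]}, index=["A", "B"])
record = wins.join([losses, ties])

print(roster)
print(record)
```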
In the final chapter, you'll learn advanced joining techniques to use when faced with challenging data. You'll be presented with a challenge of your own in the form of a case study that tests your skills.
Joining data in real life (50 xp)
Mixing indexes and columns (100 xp)
Suffixes and indicators (100 xp)
Working with time data (50 xp)
Combining time series (100 xp)
Matching to the nearest time (100 xp)
Recap and case study (50 xp)
Case study challenge - part 1 (100 xp)
Case study challenge - part 2 (100 xp)
Case study challenge - part 3 (100 xp)
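The advanced techniques named in this chapter might look roughly like the sketch below: a merge that mixes a column key with an index key, an outer merge with `suffixes` and `indicator`, and a nearest-time match with `pd.merge_asof`. Every table, column, and timestamp here is hypothetical; the course's own case study data is not reproduced.

```python
import pandas as pd

# Hypothetical tables that share a `points` column name.
home = pd.DataFrame({"team": ["A", "B"], "points": [21, 14]})
away = pd.DataFrame({"team": ["B", "C"], "points": [7, 28]})

# Suffixes disambiguate the overlapping `points` columns; indicator=True adds
# a `_merge` column saying which table(s) each row came from.
flagged = home.merge(
    away, on="team", how="outer", suffixes=("_home", "_away"), indicator=True
)

# Mixing indexes and columns: the key is a column on the left
# and the index on the right.
divisions = pd.DataFrame({"division": ["North", "South"]}, index=["A", "B"])
with_div = home.merge(divisions, left_on="team", right_index=True)

# Matching to the nearest time: both frames must be sorted by the key.
plays = pd.DataFrame({
    "time": pd.to_datetime(["2019-01-01 13:00:05", "2019-01-01 13:00:40"]),
    "play": ["run", "pass"],
})
clock = pd.DataFrame({
    "time": pd.to_datetime(
        ["2019-01-01 13:00:00", "2019-01-01 13:00:30", "2019-01-01 13:01:00"]
    ),
    "game_clock": ["15:00", "14:30", "14:00"],
})
nearest = pd.merge_asof(plays, clock, on="time", direction="nearest")

print(flagged)
print(with_div)
print(nearest)
```

`merge_asof` defaults to `direction="backward"` (the last earlier or equal timestamp); `"nearest"` is chosen here only because it matches the "Matching to the nearest time" topic.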
Prerequisites: Python for Spreadsheet Users
John Miller is a senior data scientist who helps companies use machine learning to improve operations. His favorite work involves building predictive models that provide insights into solving difficult problems. John also works as an expert witness and actively participates in the global AI community as a speaker and writer. He holds Master's degrees in business and engineering from MIT and a Bachelor's in engineering from the US Military Academy.