Dhavide Aruliah is Director of Training at Continuum Analytics, the creator and driving force behind Anaconda, the leading Open Data Science platform powered by Python. Dhavide was previously an Associate Professor at the University of Ontario Institute of Technology (UOIT), where he served as Program Director for various undergraduate and postgraduate programs. His research interests include computational inverse problems, numerical linear algebra, and high-performance computing. The materials for this course were produced by the Continuum training team.
Many real-world data sets contain strings, integers, timestamps, and unstructured data. How do you store data like this so that you can manipulate it and easily retrieve important information? The answer is a pandas DataFrame! In this course, you'll learn how to use the industry-standard pandas library to import, build, and manipulate DataFrames. With pandas, you'll be able to convert your data into a form that makes it easy to analyze. You'll also learn more about NumPy, how it stores data, and how it relates to the pandas DataFrame.
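As a rough illustration of how NumPy and the DataFrame relate, the sketch below wraps a NumPy array in a DataFrame and converts it back. The column names and values are invented for this example and are not taken from the course materials.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data, invented for illustration only.
arr = np.array([[1.0, 2.0], [3.0, 4.0]])

# A DataFrame can be built from a 2-D NumPy array plus column labels.
df = pd.DataFrame(arr, columns=["x", "y"])

# Converting back yields a NumPy array with a single dtype.
back = df.to_numpy()
print(back.dtype)

# With mixed column types, the combined array falls back to the
# generic object dtype, while each column keeps its own dtype.
mixed = pd.DataFrame({"name": ["a", "b"], "score": [1.5, 2.5]})
print(mixed.to_numpy().dtype)
print(mixed["score"].to_numpy().dtype)
```

The point of the sketch is that a NumPy array holds one data type, whereas a DataFrame keeps a separate type for each column.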
In this chapter, you will become acquainted with the power tool of pandas: the DataFrame. You will learn how to use pandas to import and then inspect a variety of datasets, ranging from population data obtained from The World Bank to monthly stock data obtained via Yahoo! Finance. You will practice building DataFrames from scratch and become familiar with pandas' data visualization capabilities.
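For a sense of what building a DataFrame from scratch and inspecting it can look like, here is a minimal sketch. The data is hypothetical and is not one of the course datasets.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
data = {
    "city": ["Austin", "Boston", "Chicago"],
    "visitors": [139, 237, 326],
    "weekday": ["Sun", "Sun", "Mon"],
}

# Build the DataFrame from a dictionary of equal-length lists;
# the dictionary keys become the column labels.
df = pd.DataFrame(data)

# Inspect the result: dimensions, column labels, and the first rows.
print(df.shape)
print(df.columns.tolist())
print(df.head(2))
```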
Having learned how to ingest and inspect your data, the next step is to explore it visually as well as quantitatively. This process, known as exploratory data analysis (EDA), is a crucial component of any data science project, and pandas has powerful methods that help with statistical and visual EDA. In this chapter, you will learn how and when to apply these techniques.
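The statistical side of EDA might start with something like the sketch below, which uses a small made-up table. The column names `species` and `length` are assumptions for this example only.

```python
import pandas as pd

# Hypothetical measurements, invented for illustration only.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "b"],
    "length": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Summary statistics (count, mean, std, quartiles) for a numeric column.
summary = df["length"].describe()
print(summary)

# Frequency of each category in a non-numeric column.
counts = df["species"].value_counts()
print(counts)

# Compare a statistic across groups.
group_means = df.groupby("species")["length"].mean()
print(group_means)
```

Visual EDA typically builds on the same objects, for example through the DataFrame's `plot` method, which requires matplotlib to be installed.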
In this chapter, you will learn how to manipulate and visualize time series data using pandas. You will become familiar with concepts such as upsampling, downsampling, and interpolation. You will practice using pandas' method chaining to efficiently filter your data and perform time series analyses. From stock prices to flight timings, time series data are found in a wide variety of domains, and being able to work with such data effectively can be an invaluable skill.
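To make downsampling, upsampling, and interpolation concrete, here is a minimal sketch on a made-up daily series. The dates and values are assumptions for illustration, not course data.

```python
import pandas as pd

# Hypothetical daily series, invented for illustration only.
index = pd.date_range("2020-01-01", periods=6, freq="D")
ts = pd.Series([0.0, 2.0, 4.0, 6.0, 8.0, 10.0], index=index)

# Downsampling: aggregate to a coarser frequency (two-day means).
down = ts.resample("2D").mean()
print(down)

# Upsampling with method chaining: move back to a daily frequency,
# which introduces missing values, then fill them by linear interpolation.
up = down.resample("D").asfreq().interpolate(method="linear")
print(up)
```

Downsampling needs an aggregation (here the mean) because several observations collapse into one, while upsampling needs a fill rule (here linear interpolation) because new timestamps have no observed value.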
Working with real-world weather and climate data, in this chapter you will bring together and apply all of the skills you have acquired in this course. You will use pandas to manipulate the data into a form usable for analysis, and then systematically explore it using the techniques you learned in the prior chapters. Enjoy!