Great data visualization is the cornerstone of impactful data science. Visualization helps you both find insights in your data and share those insights with your audience. Everyone learns how to make a basic scatter plot or bar chart on their journey to becoming a data scientist, but the true potential of data visualization is realized when you take a step back and think about what, why, and how you are visualizing your data. In this course you will learn how to construct compelling and attractive visualizations that help you communicate the results of your analyses efficiently and effectively. We will cover comparing data, the ins and outs of color, showing uncertainty, and how to build the right visualization for your given audience by investigating datasets on air pollution around the US and on farmers markets. We will finish the course by examining open-access farmers market data to build a polished and impactful visual report.
Highlighting your data
How do you show all of your data while making sure that viewers don't miss an important point or points? Here we discuss how to guide your viewer through the data with color-based highlights and text. We also introduce a dataset on common pollutant values across the United States.
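As a rough illustration of color-based highlighting, the sketch below keeps every point on the chart but mutes all except one group. The helper names (`highlight_colors`, `plot_highlighted`), the city labels, and the color choices are assumptions made for this example, not material taken from the course.

```python
def highlight_colors(labels, highlight, highlight_color="orangered", base_color="lightgray"):
    """Return one color per label: highlight_color for the chosen label(s), base_color otherwise."""
    # Accept either a single label or a collection of labels to highlight.
    targets = {highlight} if isinstance(highlight, str) else set(highlight)
    return [highlight_color if label in targets else base_color for label in labels]


def plot_highlighted(x, y, labels, highlight):
    """Scatter all points, drawing the highlighted group in a standout color."""
    # Imported lazily so highlight_colors() can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y, c=highlight_colors(labels, highlight))
    ax.set_title(f"Highlighted: {highlight}")
    return fig, ax


# Hypothetical labels, used only to show the color assignment.
cities = ["Houston", "Denver", "Houston", "Long Beach"]
colors = highlight_colors(cities, "Houston")
```

Calling `plot_highlighted(x, y, cities, "Houston")` with matching `x` and `y` values would draw the full dataset while letting only the Houston points stand out.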
Using color in your visualizations
Color is a powerful tool for encoding values in data visualization. However, with this power comes danger. In this chapter, we talk about how to choose an appropriate color palette for your visualization based on the type of data it is showing.
Lessons and exercises: Color in visualizations; Getting rid of unnecessary color; Fixing Seaborn's bar charts; Continuous color palettes; Making a custom continuous palette; Customizing a diverging palette heatmap; Adjusting your palette according to context; Categorical palettes; Using a custom categorical palette; Dealing with too many categories; Coloring ordinal categories; Choosing the right variable to encode with color.
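One way to think about matching a palette to the type of data is sketched below. The function `suggest_palette_kind`, its rules, and the palette names in `EXAMPLE_PALETTES` are illustrative assumptions, not the course's own method: non-numeric values get a categorical palette, numeric values that straddle a meaningful midpoint get a diverging palette, and other numeric values get a sequential palette.

```python
def suggest_palette_kind(values, midpoint=None):
    """Suggest a palette family from the kind of data being encoded with color."""
    # Booleans are excluded because Python treats them as integers.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not numeric:
        return "categorical"
    # A diverging palette only makes sense when values fall on both sides of the midpoint.
    if midpoint is not None and min(values) < midpoint < max(values):
        return "diverging"
    return "sequential"


# Example palette names per family, chosen for illustration;
# each can be passed to seaborn.color_palette().
EXAMPLE_PALETTES = {
    "categorical": "colorblind",
    "diverging": "RdBu",
    "sequential": "viridis",
}

kind = suggest_palette_kind([-2.0, 0.5, 3.1], midpoint=0)
palette_name = EXAMPLE_PALETTES[kind]
```

In practice the decision also depends on context, such as the number of categories and whether the categories are ordered, so treat this as a starting point rather than a rule.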
Showing uncertainty
Uncertainty occurs everywhere in data science, but it's frequently left out of visualizations where it should be included. Here, we review what confidence intervals are and how to visualize them for both single estimates and continuous functions. Additionally, we discuss the bootstrap resampling technique for assessing uncertainty and how to visualize it properly.
Lessons and exercises: Point estimate intervals; Basic confidence intervals; Annotating confidence intervals; Confidence bands; Making a confidence band; Separating a lot of bands; Cleaning up bands for overlaps; Beyond 95%; 90, 95, and 99% intervals; 90 and 95% bands; Using band thickness instead of coloring; Visualizing the bootstrap; The bootstrap histogram; Bootstrapped regressions; Lots of bootstraps with beeswarms.
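To make the bootstrap idea concrete, here is a minimal sketch using only the Python standard library. It resamples a dataset with replacement, records the mean of each resample, and reads a percentile interval off the sorted results. The function names, the sample values, and the simple nearest-index percentile rule are assumptions made for this example.

```python
import random
import statistics


def bootstrap_means(sample, n_boot=1000, seed=0):
    """Resample with replacement n_boot times and return the mean of each resample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)]


def percentile_interval(estimates, level=0.95):
    """Return an approximate (lower, upper) percentile interval at the given level."""
    ordered = sorted(estimates)
    tail = (1 - level) / 2
    last = len(ordered) - 1
    # Pick the nearest sorted positions to the lower and upper tail cut points.
    return ordered[round(tail * last)], ordered[round((1 - tail) * last)]


# Made-up pollutant readings, for illustration only.
readings = [0.8, 1.1, 0.9, 1.4, 1.0, 1.3, 0.7, 1.2, 1.5, 0.95]
means = bootstrap_means(readings)
low_90, high_90 = percentile_interval(means, level=0.90)
low_95, high_95 = percentile_interval(means, level=0.95)
```

The list `means` is what a bootstrap histogram would display, and the two intervals could be drawn as nested bars to show more than one confidence level at once.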
Visualization in the data science workflow
Often visualization is taught in isolation, with best practices only discussed in a general way. In reality, you will need to bend the rules for different scenarios. In this chapter, we dive into how to optimize your visualizations at each step of a data science workflow, from messy exploratory visualizations to polishing the font sizes of your final product.
Lessons and exercises: First explorations; Looking at the farmers market data; Scatter matrix of numeric columns; Digging in with basic transforms; Exploring the patterns; Is latitude related to months open?; What state is the most market-friendly?; Popularity of goods sold by state; Making your visualizations efficient; Stacking to find trends; Using a plot as a legend; Tweaking your plots; Cleaning up the background; Remixing a plot; Enhancing legibility; Congrats!
In the following tracks: Data Visualization with Python
Nicholas Strayer
Biostatistician at Vanderbilt
I am currently a biostatistician and data scientist at Vanderbilt University. My research focuses on the fusion of machine learning and data visualization to explore and explain electronic health records data. I have worked as a data journalist in the graphics department at the New York Times, a data scientist at the Johns Hopkins University Data Science Lab, and a 'Data Artist in Residence' at the data visualization startup Conduce. I am active on Twitter @nicholasstrayer and blog about data science and visualization with fellow DataCamp instructor Lucy D'Agostino McGowan at Live Free or Dichotomize.