
Python Exploratory Data Analysis Tutorial

March 15th, 2017 in Python

As you probably know by now, Pandas is Python’s data manipulation library. For those who are just starting out, this might imply that the package is only handy for preprocessing data, but nothing could be further from the truth: Pandas is also great for exploring your data and for storing it once you’re done preprocessing.

Additionally, for those who have been following DataCamp’s Python tutorials or who have already been introduced to the basics of SciPy, NumPy, Matplotlib and Pandas, it might be a good idea to recap some of the knowledge you have built up.

Today’s tutorial will introduce you to some ways to explore your data efficiently with all of the above packages so that you can start modeling your data.
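To give a flavor of what a first look at a dataset with Pandas can involve, here is a minimal sketch. The DataFrame, its column names and its values are made up for illustration and are not taken from the tutorial's data; swap in your own DataFrame to try it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, generated only for this illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "temperature": rng.normal(20, 5, size=100),
    "humidity": rng.uniform(30, 90, size=100),
    "label": rng.choice(["a", "b"], size=100),
})

# Blank out every tenth humidity value so there is something to find
df.loc[df.index[::10], "humidity"] = np.nan

# First rows: a quick check that the data looks the way you expect
print(df.head())

# Summary statistics for the numeric columns
print(df.describe())

# Number of missing values per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# How often each category occurs
label_counts = df["label"].value_counts()
print(label_counts)
```

These few calls are often enough to spot missing values, odd ranges or unbalanced categories before any modeling starts.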



I have a quick question about cleaning and pre-processing real datasets. I have real SCADA data from wind turbines and I want to apply PCA, which requires the features to be scaled.

The dataset has features that give the min, max, mean and standard deviation of each 10-minute block. My question is about the standard deviation features: do those have to be scaled too, or should I leave the original values untouched?

04/05/17 10:03 AM |
Hi there! Thanks for your comment :) I would try scaling the data and then splitting it into train and test sets, and also try it without scaling. Then, once you've modeled your data, you can assess the performance and see what works best. Good luck!
05/26/17 8:12 PM |
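For readers who want to try the comparison suggested in the reply, here is a rough sketch of how scaling changes what PCA picks up. Everything in it is an assumption made for illustration: the data is synthetic, the column names (`power_mean`, `power_std`, `wind_mean`, `wind_std`) are hypothetical stand-ins for 10-minute SCADA features, and PCA is computed by hand with NumPy's SVD rather than with a dedicated library. It does not settle whether the standard deviation features should be scaled; it only shows how to compare the two options.

```python
import numpy as np
import pandas as pd


def explained_variance_ratio(X):
    """Fraction of the total variance captured by each principal component."""
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Synthetic, independent features on very different scales (hypothetical names)
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "power_mean": rng.normal(1500, 400, n),
    "power_std": rng.normal(50, 15, n),
    "wind_mean": rng.normal(8, 2, n),
    "wind_std": rng.normal(1, 0.3, n),
})

X_raw = df.to_numpy()

# Standardize every column, the *_std features included:
# subtract the column mean and divide by the column standard deviation
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

ratio_raw = explained_variance_ratio(X_raw)
ratio_scaled = explained_variance_ratio(X_scaled)

print("Unscaled:", np.round(ratio_raw, 3))
print("Scaled:  ", np.round(ratio_scaled, 3))
```

Without scaling, the first component is dominated by whichever column has the largest numeric range (here `power_mean`), regardless of how informative it is. After standardizing, every column contributes on an equal footing. If a model is evaluated on a separate test set, one common choice is to compute the scaling statistics on the training rows only and reuse them for the test rows.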