Learn Data Science - Resources for Python & R
September 21st, 2016 in Learning Data Science
The Meaning of "Sexy": No Real Answers (Yet)
Even though it’s still hard to agree on a precise definition of data science or the role of a data scientist, interest in the field keeps rising: numerous blogs prescribe how to “really” learn data science, and hot topics on forums such as Quora revolve around “becoming a data scientist”. Naturally, these recommendations and discussions boil down to two essential questions: what exactly is data science, and how can one learn it?
Leaving the first question for what it is at the moment, DataCamp wanted to focus on the second one in this post.
Because maybe right now, you don’t have the need to hear yet another definition of what data science is and what it can mean to you.
Maybe you want to learn about it and get your first job or to switch your career.
You also don’t want just another guide that lists 50+ resources to check out.
You want a list of resources you possibly haven’t considered yet!
Learn Data Science With The Mystic Square of Resources
With the popularity of the field comes a whole variety of recommendations from all sides: beginners as well as experts, all with different backgrounds, give their view on what it means to actually learn data science.
In the end, considering all these resources and how they might fit your learning style is the key to learning data science. It’s about puzzling together the existing resources and making them fit for you.
That’s why DataCamp presents to you the mystic square of resources to learn data science: we already hand you some pieces of the puzzle that you can use to make your learning complete.
The best thing about this mystic square is that it contains resources that you might not have considered.
That means that the mystic square includes resources that are all complementary to the ones that you have already encountered and registered for, as learning data science doesn’t limit itself to just one resource.
Even though search interest in projects was already high to begin with, the demand for data science projects has been particularly high this year. Many users are looking to put their knowledge into practice or to advance their skills even further.
Secondly, GitHub is also finding its way into the list of resources that every beginner should know. The best GitHub projects that you can work on as an aspiring data scientist are:
The Data Science IPython Notebooks: this repository is one of the highest-quality resources that an aspiring data scientist can encounter. As its name suggests, this repository is filled with IPython notebooks that cover different topics, ranging from Kaggle competitions to big data and deep learning.
The Pattern Classification repository is ideal for those of you who are looking for tutorials and examples to solve and understand machine learning and pattern classification tasks.
For Deep Learning In Python, this repository is the way to go!
DrivenData hosts challenges where data scientists compete to come up with the best statistical model for difficult predictive problems that make a difference. Can’t wait to get started? Then click here.
You can also apply to become a volunteer at DataKind to boost your project experience: you can pick the timespan of your adventure, ranging from networking and quick consultation to long-term projects. Through DataKind, you have the opportunity to tackle unexplored data and huge social issues like poverty, global warming, and public health at the same time.
For projects that have already been finished, consult the high-quality reports of the Master of Information and Data Science graduates’ capstone projects. Pay attention to the way each project reports its findings and constructs its narrative to passively strengthen your storytelling skills.
If you’re looking for people that have some real life experience working on projects, try joining one of your local Meetup groups. These meetings not only bring you into contact with people from the industry, but you also get to build up your knowledge through the presentations that are given at those events or share knowledge yourself.
Note that, maybe contrary to what you might believe from the previous paragraph, Meetup groups are not only perfect for those who already have some experience, but also for those who are just starting with data science!
Some of the Meetup groups also organize boot camps, workshops, hackathons, extra social events, and much more. Meetup groups attract those who are looking to expand their knowledge or professional network, or to deepen their skills in certain data science topics. And let's not forget that these types of events are an awesome way to perfect your soft skills!
You can subscribe to receive newsletters with the newest events or install the app to stay up to date every moment of the day.
The news is maybe not the first thing that beginning data science learners are aware of, but it is certainly worth taking into account…
As a beginner, subscribing to one of the newsletters can give you certain advantages: newsletters offer you the possibility to stay up to date with the latest news, the newest case studies, and projects or job offerings.
And, if you’re also a big believer in language immersion as a way to learn a language, you will understand that really “bathing” yourself in the data science world is necessary for you to learn quickly and to make your learning as effective as possible.
Besides the newsletters that you might already know and receive on a regular basis, such as the bi-monthly KDnuggets newsletter or the weekly Data Elixir newsletter, we have listed some others for you to keep an eye out for:
Data Science Weekly: this weekly newsletter brings you up to date with the latest news, articles, and jobs.
Data Science Central is a handy resource for those who are interested in big data. The site tries to give you an all-round community experience, which includes, among other things, webinars, links to job offers, blog posts, an editorial platform and the latest news, trends, and much more.
For more language-specific newsletters, you can check out:
Python Weekly is a free weekly newsletter that features the latest news, articles, new releases, jobs, and much more. But, as the name suggests, all of these things are related to Python, of course.
For your daily dose of Python tips that won’t let you down, you should subscribe to Python Tips.
For R, you might consider subscribing to the R-bloggers daily update to find out what’s going on and what articles have been published.
There are also some blogs that give you regular updates (and some extras):
Make sure to also check out the Center for Data Innovation blog, in which you can find data visualizations, weekly updates, and datasets!
FiveThirtyEight provides all types of content, ranging from light-hearted and interactive to in-depth, and is famous for offering examples of how data can be made accessible and applicable to everyday life.
The Yhat blog is a good source for those who are looking for the most interesting blogs on machine learning, data science, and engineering.
You still haven’t found what you’re looking for? Consider checking out this GitHub repository, which contains a huge list of data science blogs.
Just like with the other types of learning resources, there has been a huge increase in the number of books that have been published over the past years. Besides the O’Reilly books, which do well with most readers, there are also some other books out there that you should consider:
Hadley Wickham’s books are a no-brainer when you’re looking for good books on R. On the one hand, “R For Data Science”, in collaboration with Garrett Grolemund, and “R Packages” have both been published by O’Reilly and are absolute recommendations. On the other hand, Hadley’s book “ggplot2: Elegant Graphics For Data Analysis” is a must-read if you want to understand how to use ggplot2 to create graphics to understand your data. Also, “Advanced R”, published by Chapman and Hall/CRC, is excellent for those intermediate to advanced R users who really want to master R.
Tip: Also make sure to read Garrett Grolemund’s “Hands-On Programming With R - Write Your Own Functions and Simulations”.
The OpenIntro books are perfect for those who are looking for an introduction to statistics: right now, there are three books available, namely, “OpenIntro Statistics”, “OpenIntro Statistics With Randomization and Simulation” and “Advanced High School Statistics”. These books are all free to download, but can also be bought in print and offer supporting teaching tools. Mine Çetinkaya-Rundel is a co-author of the textbook and also teaches the Data Analysis and Statistical Inference open course on DataCamp.
Daniel Kaplan’s “Introduction to Scientific Computation and Programming” will teach you the modern skills and concepts you need to use the computer expressively in scientific work, while “Statistical Modeling: A Fresh Approach” is an introduction to statistics that embraces a modeling approach and employs resampling methods.
Talks are great to listen to: they can give you a lot of inspiration, and they are convenient because you can start listening anytime you have a free moment. They are a great resource for learning data science because they can inspire you to do your data storytelling better or, if you’re new to data science, give you tips on how to approach the topic.
Our selection is listed below:
This talk, given by Mar Cabra, an investigator of The International Consortium of Investigative Journalists, explains how her team used data science to unravel the Panama Papers story.
It might be an all-time classic, but Hans Rosling’s TED talk, in which he discusses his famous bubble chart of life expectancy against income for every country, is a solid recommendation for those who haven’t seen it. Also, check out his 4-minute video where he uses augmented reality to animate his chart. It is a great resource to motivate yourself to learn statistics or to get serious about storytelling.
Also consider checking out DataCamp's video series DataChats for interesting talks with key people from the data science industry!
The three talks listed above are just a selection from the vast number of talks that are out there! If you’ve already listened to the three that I listed, and you’re desperate to find a (good) talk quickly, just head over to the TED website and search for anything related to data, statistics, machine learning, …
R For Data Science Talks
For talks on more specific topics, you can also head over to the R User Conference 2016 page and look into their searchable video archive.
Top videos that you should watch include:
Arun Srinivasan’s talk on “Efficient in-memory non-equi joins using data.table”, for those who are eager to get deeper into the
Garrett Grolemund’s talk on “Shiny Gadgets: Interactive tools for Programming and Data Analysis” to learn how you can enhance your R programming experience.
Hadley Wickham’s talk to get up to date with the plans for the future for
Mine Çetinkaya-Rundel’s talk on an R-based first-year undergraduate data science course taught at Duke University for an audience of students with little to no computing or statistical background.
Tip: The useR! conferences are an essential source to stay in touch with the R community, the latest advancements and much more. Besides the 2016 conference, you can watch the videos from the useR!2014 conference here.
Python For Data Science Talks
One of the resources that you can use to find data science talks that deal with Python is PyVideo, where you can select videos from the latest events, the most active speakers, and the most active tags.
Note that this site is a general one and that it is not specifically oriented towards data science. You will need to look to find the talks that you want to hear.
Some of the talks that we have selected for you are:
Bryan Van de Ven’s talk on data visualization in the browser with Bokeh is ideal for those who may have missed the recent developments and some of the newest capabilities of Bokeh.
Jason Myers’ talk “SQLAlchemy ORM For Beginners” explores how to get started with SQLAlchemy Object-Relational Mapping (ORM).
Eric Ma’s talk offers a tutorial on graph analytics.
Big Data Talks
You cannot miss the videos from the Strata + Hadoop World conferences! Go here to watch the full keynotes of this year.
You can also go to the YouTube channel to watch some of the previews of this year’s and previous years’ presentations. They are all put into playlists to make it easier for you to listen to all of them :).
The top videos from the Strata + Hadoop World conferences include:
“The Future Of Data Visualization”, a talk given by Jeffrey Heer on the importance of design in data visualization.
“Data Science: Where Are We Going” by DJ Patil, on data science and the impact we can expect from it.
“Thinking Like A Bayesian” by Julia Galef outlines the most important principles of Bayesian thinking.
For those of you who are fans of talks, we have also listed some interesting podcasts that you can listen to:
The O’Reilly Data Show is a great podcast hosted by Ben Lorica which will provide you with useful technical information, great speakers, and the latest news.
For those with a keen interest in data visualization and storytelling, we also recommend listening to the Data Stories podcast, hosted by Enrico Bertini and Moritz Stefaner.
Another, maybe somewhat unusual, recommendation would be the Freakonomics podcast, hosted by Steven Levitt and Stephen Dubner. With exciting and unexpected topics, a fun style of presenting, and a good dose of critical thinking, this podcast has everything to sharpen your data science skills!
Not So Standard Deviations is a podcast in which Roger Peng and Hilary Parker talk about the latest news in data science and data analysis in academia and industry.
Becoming a Data Scientist is a podcast that is ideal for those who are intrigued by everything related to a career that moves from "SQL Data Analyst pursuing an Engineering Master's Degree" to "Data Scientist". Renee M. P. Teate interviews data scientists or people who are on their way to becoming data scientists. With a focus on the path to becoming a data scientist and on learning, rather than on the latest news in data science, this is an excellent addition to your podcast list.
Data Skeptic is a podcast that combines short mini-episodes, in which the host Kyle Polich explains concepts from data science to his wife Linhda, with longer interviews featuring practitioners and experts on interesting topics related to data.
Partially Derivative is a podcast about "the data of everything". More concretely, Jonathon Morgan, Vidya Spandana and Chris Albon make sure that you get interviews and stories about the data science in the world around us.
Talking Machines, hosted by Katherine Gorman and Ryan Adams, offers you clear conversations with experts in the field, insightful discussions of industry news, and useful answers to your machine learning questions.
RStudio offers webinars on a variety of topics for those who want to learn data science with R.
You can visit this page to register for upcoming live webinars, to see the latest webinar, but also to watch other existing webinars that are conveniently grouped into learning tracks.
Topics that are covered in the webinars are RStudio, Shiny, and data science. They are a great resource not only for those who are just starting with R and data science, but also for those who have been working with R for quite some time already.
Tip: For a calendar with the upcoming webinars, check KDnuggets’ Webcasts and webinars page. Note that the data science newsletters can also inform you of upcoming webinars.
Lastly, one of the most popular topics that people look for when they start to learn data science is tutorials. It seems that many users want to be guided through a case and learn at the same time.
Below, we list some of the resources that you might not have considered using to get access to the best tutorials.
Note that some of these tutorials are language-specific.
If you’re looking instead for a Python tutorial that covers importing data, scikit-learn basics, aggregation and grouping, feature engineering, model evaluation, and deployment, this Data Science in Python tutorial is what you’re looking for!
Kaggle also offers general data science and R tutorials.
Tip: There are many other Python for data science tutorials on GitHub! Consider running a query on “data science” to find more of them!
KDnuggets has a separate section for tutorials. This one is worth checking out as it is frequently updated with high-quality content.
To complete your learning experience, you should consider the following:
Reddit - If you haven’t already registered on Reddit and don’t regularly check out subreddits such as /r/rstats, /r/python, /r/datascience, /r/datasciencenews, /r/MachineLearning or one of the many others, you should definitely consider following the ones I just mentioned. For other subreddits that might interest you, just run a query and see what else exists!
DataTau - For those of you who are familiar with Hacker News, DataTau is like a Hacker News for data scientists. It is meant to engage data scientists in conversations about the hottest content on the web on a daily basis.
Twitter - Twitter is an indispensable tool for those who want to be kept in the loop of everything that moves in the world of data science. You can already start by following:
- DJ Patil, Chief Data Scientist with the White House.
- Gregory Piatetsky, KDnuggets President, #Analytics, #BigData, #DataMining, #DataScience expert, KDD & SIGKDD co-founder, was Chief Scientist at 2 startups, part-time philosopher.
- Ben Lorica, Chief Data Scientist @OReillyMedia, Program Director of @strataconf & @OReillyAI. He is the host of the O’Reilly Data Show podcast.
- Andrew Ng, Chief Scientist of Baidu; Chairman and Co-Founder of Coursera; Stanford CS faculty.
- As a top Big Data influencer, Kirk Borne, the Principal Data Scientist at @BoozAllen, Ph.D. Astrophysicist, ♡ Data Science, is definitely worth following!
Note that there are still many other data scientists out there on Twitter! You'll have to discover the rest for yourself...
Cross Validated is a question and answer site for people interested in statistics, machine learning, data analysis, data mining, and data visualization. Likewise, you can also consider the more general forum Stack Overflow if you have language-specific questions for which you can’t find the answer on Cross Validated.
- You can also join some of the active Slack groups. It’s a great way to get in touch with other professionals. We can recommend the following:
- Don’t forget to join LinkedIn or Facebook groups! LinkedIn is great if you’re looking for high-quality content, while Facebook is for those who are eager to get in touch with other data scientists, programmers, text mining addicts, and many other experts.
- LinkedIn groups that might interest you are Python Community and Python Data Science, Python Professionals and Machine Learning.
- Facebook groups that you can join are: Beginning Data Science, Analytics, Machine Learning, Data Mining, R, Python, Learn Python (www.learnpython.org), Users of R Statistical Package, Data Science With R and Data Science and Analytics.
A great way of keeping in touch with other programmers (maybe co-workers or people you have met through the community) is by starting a WhatsApp group. You can usually expect fast responses, and the atmosphere is usually relaxed and helpful. There have been initiatives to do this for large groups, but up until now, only the Analytics Vidhya group seems to have drawn much of a response. You have to register first to get into this group.
There are a lot of options when it comes to courses. In this case, however, your personal learning style comes into play alongside the requirements that learning data science entails.
Now, what does that mean? Well, “learning” data science can sometimes give the impression that it’s a passive occupation, but in reality, you learn only by doing it. And by doing it a lot.
The courses that really let you do data science hands-on are:
Coursera offers, apart from its data science courses, an entire (but paid) data science specialization track with a certificate for those who want to get certified. The program is created by Johns Hopkins University and works with industry partners Yelp and SwiftKey. The projects in the specialization track include reading material, videos, and quizzes.
The best-known EdX courses on data science are created by Microsoft: the courses are free, but you will have to pay if you want to get the certificate. The learning material consists mostly of videos and interactive exercises.
DataCamp courses provide videos with the best instructors from academia and industry, combined with exercises with personalized feedback to kickstart your data science learning. You can start some of our courses for free, but others are premium content and will require you to pay. The difference from the two resources mentioned above is that you will also find a vibrant community section with open courses, tutorials, and blogs to support your learning.
The Key To Really Learn Data Science
In the end, the number of resources will still be overwhelming, but the mystic square can definitely offer you a great place to start. The key to learning data science is then to keep on sliding the puzzle pieces of the mystic square until you find the combination that’s right for you.
In discovering the mystic square, you will see that resources sometimes overlap and may give you more than you would expect. Other times, you discover new resources that complete your learning and make it a little broader than before. All of this will help you understand how vast the data science field is and how you can make your learning as wide as possible.
This way, you’ll stay motivated, and you’ll bring fun to the lifelong journey of data science learning. Because the key to data science is to Keep Educating Yourself.
What does your learning data science mystic square look like and what is your key to learning data science?