
Set Up a Data Science Environment on Your Computer

Learn about the various options you have to set up a data science environment with Python, R, Git, and Unix Shell on your local computer.
Jun 2018  · 8 min read

After learning on an online interactive training and education platform like DataCamp, one of the next steps is taking the skills you gained in Python, R, Git, or Unix Shell and using them on your local computer. It is not always easy to know what you need to install for the projects you have in mind. This tutorial explains which packages and software you need to install to get started with each of these technologies. With that, let's get started!

Python

To use Python on your local computer, you first need to install it. There are many different Python distributions, but for data science, the Anaconda Python Distribution is the most popular.

Benefits of Anaconda

Anaconda is a Python distribution that contains a collection of many open source packages, together with conda, its package and environment manager. An installation of Anaconda comes with many packages such as numpy, scikit-learn, scipy, and pandas preinstalled, and it is also the recommended way to install Jupyter Notebooks. The image below shows a Jupyter Notebook in action. Jupyter Notebooks contain both code and rich text elements, such as figures, links, and equations. You can learn more about Jupyter Notebooks here.

[Image: a pandas DataFrame tutorial in a Jupyter Notebook (animated GIF)]

Some other benefits of Anaconda include:

  • If you need additional packages after installing Anaconda, you can use Anaconda's package manager, conda, or pip to install them. This is highly advantageous because conda manages the dependencies between packages for you, so you don't have to. Conda also makes it easy to switch between Python 2 and 3 by keeping them in separate environments (you can learn more here).

  • Anaconda comes with Spyder, a Python Integrated Development Environment (IDE). An IDE is a coding tool that lets you write, test, and debug your code in one place; IDEs typically offer code completion, syntax highlighting, resource management, and debugging tools, among many other features. It is also possible to integrate Anaconda with other Python IDEs, including PyCharm and Atom. You can learn more about different Python IDEs here.
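If you keep separate conda environments (for example, one for Python 2 and one for Python 3), it helps to confirm which one a script is actually running in. The sketch below is a minimal illustration, assuming conda's behavior of setting the `CONDA_DEFAULT_ENV` environment variable when an environment is activated; the function name `describe_environment` is hypothetical.

```python
import os
import platform
import sys


def describe_environment():
    """Summarize which Python interpreter and conda environment are active.

    Assumes conda sets CONDA_DEFAULT_ENV on activation; outside an
    activated conda environment the value is simply None.
    """
    return {
        # Interpreter version as a string, e.g. "3.11.4"
        "python_version": platform.python_version(),
        # Root directory of the active Python installation or environment
        "prefix": sys.prefix,
        # Name of the activated conda environment, if any
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

The environments themselves are created and activated with conda in a terminal, for example `conda create -n py27 python=2.7` followed by `conda activate py27` (assuming Python 2.7 packages are still available for your platform). Running the sketch inside each environment should report a different version and prefix.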

How to Install Anaconda (Python)

Below are links to guides on how to install Anaconda on your operating system.

Install Anaconda on Mac

Install Anaconda on Windows
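Once Anaconda is installed, one way to confirm that the preinstalled packages mentioned above are available is to ask Python whether it can find them. This is a minimal sketch using the standard-library `importlib.util.find_spec`; the helper name `check_packages` is illustrative, and the package list simply mirrors the one in this tutorial.

```python
import importlib.util

# Packages this tutorial lists as preinstalled with Anaconda.
# Note: scikit-learn is imported under the module name "sklearn".
EXPECTED_PACKAGES = ["numpy", "scipy", "pandas", "sklearn"]


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    # find_spec locates a module without importing it; it returns None if missing
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(EXPECTED_PACKAGES).items():
        print(f"{name:8} {'installed' if found else 'MISSING'}")
```

Run it with the Python that ships with Anaconda (for example, from the Anaconda Prompt on Windows or a terminal on Mac); any package reported as `MISSING` can then be installed with conda or pip.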

R Programming Language

Most people install RStudio alongside the R programming language. The RStudio integrated development environment (IDE) is generally considered the easiest and best way to work with R.

Benefits of RStudio

An installation of the R programming language gives you the functions and objects of the R language, plus an R interpreter that lets you build and run commands. RStudio adds an integrated development environment that works alongside the R interpreter.

[Image: the RStudio screen]

When you open RStudio, a screen like the one above appears. Among other features, the four RStudio panes contain: (A) a text editor, (B) a dashboard of your work environment, (C) the R interpreter, and (D) a help window and package management system. Together, these features mean that RStudio is all you really need after installing R.

How to Install R and RStudio

Below are links to guides on how to install R and RStudio on your operating system.

Install R and RStudio on Mac

Install R and RStudio on Windows

Unix Shell

Navigating directories, copying files, using virtual machines, and more are a regular part of a data scientist's job. The Unix Shell is often used to accomplish these tasks.

Some Uses of a Unix Shell

1 - Many cloud computing platforms are Linux-based, and Linux is a Unix-like operating system. For instance, if you want to Setup a Data Science Environment on Google Cloud, or do Deep Learning With Jupyter Notebooks In The Cloud (AWS EC2), you will need some Unix Shell knowledge. There are times when you may have a use for a Windows virtual machine, but it is less common.

2 - Unix Shell provides a number of useful commands, such as wc, which counts the number of lines or words in a file; cat, which concatenates (merges) files; and head and tail, which show the beginning or end of a file and help you subset large files. You can learn more about this in 8 Useful Shell Commands for Data Science. Also, check out DataCamp's course Data Processing in Shell.

3 - You will often find Unix Shell integrated with other technologies as you will see throughout the rest of the article.
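To make the behavior of `wc`, `head`, and `tail` concrete, here is a rough pure-Python equivalent of each. It is only an illustration of what the shell commands do, and the function names are hypothetical; in a shell, `wc -l file.csv`, `head -n 5 file.csv`, and `tail -n 5 file.csv` do the same jobs.

```python
from collections import deque
from itertools import islice


def wc_lines(path):
    """Count the lines in a file, similar to `wc -l`."""
    # `wc -l` counts newline characters, so it skips a final line that has
    # no trailing newline; this version counts that line too.
    with open(path) as f:
        return sum(1 for _ in f)


def head(path, n=10):
    """Return the first n lines of a file, like `head -n N`."""
    with open(path) as f:
        # islice stops reading after n lines, so the rest of the file is never read
        return list(islice(f, n))


def tail(path, n=10):
    """Return the last n lines of a file, like `tail -n N`."""
    with open(path) as f:
        # A deque with maxlen keeps only the n most recent lines while streaming
        return list(deque(f, maxlen=n))
```

Like the shell versions, these read the file line by line, so they also work on files too large to load into memory at once.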

Integration with Other Technologies

You will often find Unix Shell commands integrated into other technologies. For example, it is common to find shell commands in Jupyter Notebooks alongside Python code. In a Jupyter Notebook, you can run a shell command by prefixing it with an exclamation mark (!). In the code below, the output of the shell command ls (which lists the files in the current directory) is assigned to the Python variable myfiles.

myfiles = !ls
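The `!` prefix is IPython/Jupyter syntax, so `myfiles = !ls` is not valid in a regular `.py` script. A minimal sketch of a rough equivalent using the standard-library `subprocess` module (Python 3.7+ for `capture_output`); `run_shell` is a hypothetical helper name.

```python
import subprocess


def run_shell(command):
    """Run a command and return its standard output as a list of lines.

    A plain-script stand-in for IPython's `myfiles = !ls`. Raises
    subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output bytes to str
        check=True,           # raise if the command fails
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    myfiles = run_shell(["ls"])
    print(myfiles)
```

Note that `ls` is only available where Unix commands are installed (see the Windows section below); for a portable listing of a directory, Python's own `os.listdir` avoids the shell entirely.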

The image below shows some Python code integrated in a workflow to combine multiple datasets. Notice the Unix Shell command (enclosed in the red rectangle) integrated in the Jupyter Notebook.

[Image: a Unix Shell command in a Jupyter Notebook]

Keep in mind that the code in the image above isn't the only way to do this task; it is just a small example of how you may see Unix Shell commands used. If you want to learn how to use the shell for data science, DataCamp has a free course, Introduction to Shell for Data Science, which I highly recommend. It is a skill that many aspiring data scientists overlook, but it is very important in the workplace.

Unix Shell on Mac

Mac comes with a Unix shell, so you usually don't need to install anything! One important point is that different Unix systems ship with different commands, so sometimes a command you had on another Unix system (like wget) is missing. Just as Anaconda and RStudio give you package management, you can install Homebrew, a package manager for Mac, to add missing commands. The link below goes over how to install and use Homebrew.

How to Install and Use Homebrew

Unix Shell Commands on Windows

Windows does not come with a Unix Shell. Keep in mind that what a Unix Shell mainly gives you for data science is a set of useful commands, and there are many different ways to get these commands on Windows. You can Install Git on Windows with the optional Unix tools so that Unix commands are available from your Command Prompt. Alternatively, you could install GNU on Windows (GOW) (10 MB) or Cygwin (100 MB minimum), among many other options.
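Whether you are on a Mac or Windows, a quick way to see which of these commands are actually available is to look them up on your PATH. This is a minimal sketch using the standard-library `shutil.which`; the command list and the helper name `missing_commands` are illustrative assumptions.

```python
import shutil

# Commands mentioned in this tutorial; adjust to the ones your projects need.
COMMANDS = ["ls", "cat", "head", "tail", "wc", "wget", "git"]


def missing_commands(names):
    """Return the commands that cannot be found on the PATH."""
    # shutil.which returns None when no executable with that name is on PATH
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands(COMMANDS)
    print("All commands found" if not missing else f"Missing: {', '.join(missing)}")
```

On a Mac, a missing `wget` is the kind of gap Homebrew fills (`brew install wget`); on Windows, the same check tells you whether Git's Unix tools, GOW, or Cygwin have put the commands on your PATH.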

Git

Git is the most widely used version control system. A version control system records changes to a file or set of files over time so that you can recall specific versions later. Git is an important technology because it makes it much easier to work with others, and you will find it in a lot of workplaces. Some of the benefits of learning Git include:

  • Nothing that has been committed to Git is ever lost, so you can always go back to see previous versions of your programs.

  • Git notifies you when your work conflicts with someone else's, so it's harder (but not impossible) to accidentally overwrite work.

  • Git can synchronize work done by different people on different machines, so it scales as your team does.

  • Knowing Git makes it easier to contribute to open source development of packages in R and Python.
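Under the hood, Git stores each version of a file as a snapshot identified by a hash of its contents, which is why an earlier version can be recalled exactly. A small sketch of how Git derives that identifier for a file's contents (a "blob"), assuming a repository using Git's default SHA-1 object format; for a file Git stores unchanged (no line-ending conversion) it should match what `git hash-object <file>` prints. The name `git_blob_id` is hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object id Git assigns to a file's contents (a "blob").

    Git prepends the header "blob <size in bytes>" plus a NUL byte and hashes
    the result with SHA-1, so identical contents always get the same id.
    """
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the id depends only on the contents, changing a single byte produces a completely different id, which is how Git notices that a file has changed.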

Integration with Other Technologies

One of the useful things about Git is that it is often integrated with other technologies. Earlier I mentioned that the RStudio integrated development environment (IDE) is generally considered the best way to work with the R programming language. RStudio offers version control support, and so do many Python Integrated Development Environments (IDEs) (learn more here).

If you want to learn how to use Git for Data Science, DataCamp has a free course Introduction to Git for Data Science which I highly recommend.

How to Install Git

Below are links to guides on how to install Git on your operating system.

Install Git on Mac

Install Git on Windows
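After following either guide, you can confirm that Git is on your PATH by running `git --version` in a terminal. The same check from Python, as a minimal sketch (the helper names are hypothetical, and the regular expression assumes the usual `git version X.Y.Z` output format):

```python
import re
import subprocess

# Matches e.g. "git version 2.39.2", "git version 2.39.2 (Apple Git-143)",
# or "git version 2.41.0.windows.1"
VERSION_PATTERN = re.compile(r"git version (\d+)\.(\d+)\.(\d+)")


def parse_git_version(output):
    """Extract (major, minor, patch) from `git --version` output, or None."""
    match = VERSION_PATTERN.search(output)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_git_version():
    """Return the installed Git version as a tuple, or None if Git is not found."""
    try:
        result = subprocess.run(
            ["git", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # FileNotFoundError: no `git` executable on the PATH
        return None
    return parse_git_version(result.stdout)


if __name__ == "__main__":
    version = installed_git_version()
    print("Git not found" if version is None else f"Git {'.'.join(map(str, version))} is installed")
```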

Conclusion

This tutorial provides a way to set up a data science environment on your local computer. An important point to emphasize is that these technologies can be, and often are, integrated with one another. If you have any questions or thoughts on the tutorial, feel free to reach out in the comments below or through Twitter. Also, feel free to check out my other installation-based tutorials on my GitHub or my Medium blog.
