Skip to main content
HomeTutorialsData Science

AWS EC2 Tutorial For Beginners

Discover why you should use Amazon Web Services Elastic Compute Cloud (EC2) and how you can set up a basic data science environment on a Windows instance.
Dec 2017  · 7 min read

Learn about some of the advantages of using Amazon Web Services Elastic Compute Cloud (EC2). Then, the first part of the tutorial covers how to launch and connect to Windows virtual machines or instances on EC2. The next part goes over how to setup a basic data science environment (install R, RStudio, and Python) on the instance.

Amazon Web Services Elastic Compute Cloud (EC2): A Brief Case

There are times when one is limited by the capabilities of a desktop or laptop. Suppose a data scientist has a large dataset that they would like to do some analysis on. The scientist proceeds to try and load the entire dataset into memory and an error like the one below occurs.

error

The error resulted because the available RAM was exhausted. The operating system couldn't allocate another 500Mb of RAM. While there are many different solutions to this type of problem, one possible solution could be to upgrade the RAM of the computer. Besides having to make an investment in more RAM, there are limits to how far some computers can be upgraded. The potential solution explored in this tutorial is to use a virtual machine in the cloud (AWS) with more RAM and CPU.

Virtual machines on AWS EC2, also called instances, have many advantages. A few of the advantages include being highly scalable (one can choose instances with more RAM, CPU etc), they are easy to start and stop (outside the free tier, customers pay for what they use), and they allow for the selection of different platforms (operating systems). An important point thing to emphasize is that although this tutorial covers how to launch a Windows based virtual machine, there are many different types of virtual machines for many different purposes.

With that, let's get started.

Create an AWS Account and Sign into AWS.

1.On the Amazon Web Services site (here's the link), click on "Sign In to the Console". Sign in if you have account. If you don't, you will need to make one.

create aws account

2.On the EC2 Dashboard, click on EC2.

ec2 dashboard

Create an Instance

3.On the Amazon EC2 console, click on Launch Instance.

launch instance

4.Click on the "Select" button in the row with Microsoft Windows Server 2016 Base. Please note that this will create a Windows based instance instead of a typical Linux based instance. This effects how you will connect to the instance.

choose ami

5.Make sure t2 micro (free instance type) is selected.

choose instance type

and click on "Review and Launch"

choose instance type 2

6.Click on Launch.

review instance launch

7.Select "Create a new key pair". In the box below ("Key pair name"), fill in a key pair name. I named my key DataCampTutorial, but you can name it whatever you like. Click on "Download Key Pair". This will download the key. Keep it somewhere safe.

download key pair

Next, click on "Launch Instances"

launch instances

8.The instance is now launched. Go back to the Amazon EC2 console. I would recommend that you click on what is enclosed in the red rectangle as it will bring you back to the console.

amazon ec2 console

9.Wait till you see that "Instance State" is running before you proceed to the next step. This can take a few minutes.

instance state

Connect to your Instance

10.Click on connect.

connect to instance

11.Click on "Download Remote Desktop File". Save the remote desktop file (rdp) file somewhere safe.

download remote desktop file

12.Click on "Get Password". Keep in mind that you have to wait at least 4 minutes after you launch an instance before trying to retrieve your password.

get password

13.Choose the pem file you downloaded from step 7 and then click "Decrypt Password"

decrypt password

14.After you decrypt your password, save it somewhere safe. You will need it to log into your instance.

instance password

15.Open your rdp file. Click on continue. If your local computer is a Mac, you will need to download "Microsoft Remote Desktop" from the App Store to be able to open your rdp file.

download microsoft remote desktop

16.Enter your password you got from step 14

enter password

After you enter your password, you should see a screen like this

screen

Download Firefox

To be able to install R and/or Python, it really helps to have a browser. The instance comes preinstalled with Internet Explorer with Enhanced Security Configuration enabled which can be difficult to work with. Download firefox as an alternative browser to avoid the enhanced security from Internet Explorer.

  1. Type the following into Internet Explorer https://www.mozilla.org/firefox/new/?scene=2

download firefox

  1. Click on "Add" when you see the popup below.

add

Click on "Add" again.

add

3.When you get to the Firefox page, you may have to click on add a couple times (similar to steps 1 and 2) until the Firefox download starts. If the download doesn't start automatically, then click on "click here".

download

Now that Firefox is installed, be sure to use Firefox as your browser. It will make it a lot simpler than continuously dealing with security issues from Internet Explorer.

Install R and Python

Now that firefox is installed, you can install R and Python as you would on a normal windows machine. If you need help installing, here are some links to guides below.

Stop or Terminate an Instance (Important)

After finishing use of an instance, it is a good idea to stop or terminate the instance. To do this, go to the Amazon EC2 console and click on "Actions" then "Instance State" and you will have the option of either stopping or terminating the instance.

If you plan on using the instance again, stop the instance. If you don't plan on using the instance again, terminate the instance.

While the instance in this tutorial was in the "free tier", I would recommend terminating the instance so you don't forget about it.

Stop or Terminate an Instance

Conclusion

This tutorial provided a quick guide to launching and connecting to EC2 instances as well as how you would go about setting up a basic data science environment. If you would like to continue your EC2 learning, I suggest you check out the tutorial, "Deep Learning with Jupyter Notebooks in the Cloud" which covers how to setup a linux based EC2 GPU instance for deep learning applications. If you any questions or thoughts on the tutorial, feel free to reach out in the comments below or through Twitter.

Topics

Related courses

Course

Introduction to AWS Boto in Python

4 hr
14.3K
Learn about AWS Boto and harnessing cloud technology to optimize your data workflow.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related

Data Sets and Where to Find Them: Navigating the Landscape of Information

Are you struggling to find interesting data sets to analyze? Do you have a plan for what to do with a sample data set once you’ve found it? If you have data set questions, this tutorial is for you! We’ll go over the basics of what a data set is, where to find one, how to clean and explore it, and where to showcase your data story.
Amberle McKee's photo

Amberle McKee

11 min

You’re invited! Join us for Radar: The Analytics Edition

Join us for a full day of events sharing best practices from thought leaders in the analytics space
DataCamp Team's photo

DataCamp Team

4 min

10 Top Data Analytics Conferences for 2024

Discover the most popular analytics conferences and events scheduled for 2024.
Javier Canales Luna's photo

Javier Canales Luna

7 min

A Data Science Roadmap for 2024

Do you want to start or grow in the field of data science? This data science roadmap helps you understand and get started in the data science landscape.
Mark Graus's photo

Mark Graus

10 min

A Complete Guide to Alteryx Certifications

Advance your career with our Alteryx certification guide. Learn key strategies, tips, and resources to excel in data science.
Matt Crabtree's photo

Matt Crabtree

9 min

Mastering Bayesian Optimization in Data Science

Unlock the power of Bayesian Optimization for hyperparameter tuning in Machine Learning. Master theoretical foundations and practical applications with Python to enhance model accuracy.
Zoumana Keita 's photo

Zoumana Keita

11 min

See MoreSee More