Learn about some of the advantages of using Amazon Web Services Elastic Compute Cloud (EC2). The first part of this tutorial covers how to launch and connect to Windows virtual machines, or instances, on EC2. The next part goes over how to set up a basic data science environment (install R, RStudio, and Python) on the instance.
Amazon Web Services Elastic Compute Cloud (EC2): A Brief Case
There are times when one is limited by the capabilities of a desktop or laptop. Suppose a data scientist has a large dataset that they would like to analyze. The scientist tries to load the entire dataset into memory, and an error like the one below occurs.
The error occurred because the available RAM was exhausted: the operating system couldn't allocate another 500 MB of RAM. While there are many different solutions to this type of problem, one is to upgrade the RAM of the computer. Besides requiring an investment in more RAM, there are limits to how far some computers can be upgraded. The solution explored in this tutorial is to use a virtual machine in the cloud (AWS) with more RAM and CPU.
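Before moving to a bigger machine, it can help to estimate how much memory a dataset needs. The following is a minimal sketch, assuming a dense numeric table stored at 8 bytes per value; the function names, the 50% headroom, and the example dataset dimensions are illustrative assumptions, not measurements from the tutorial.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def estimate_table_bytes(n_rows, n_cols, bytes_per_value=8):
    """Rough in-memory size of a dense numeric table (overhead not counted)."""
    return n_rows * n_cols * bytes_per_value


def fits_in_ram(needed_bytes, available_bytes, headroom=0.5):
    """True if the data fits while keeping a fraction of RAM free.

    The headroom is an assumption: analysis code often makes temporary
    copies, so filling RAM completely is rarely workable.
    """
    return needed_bytes <= available_bytes * (1 - headroom)


# Hypothetical dataset: 50 million rows, 20 numeric columns.
needed = estimate_table_bytes(50_000_000, 20)
print(f"Estimated size: {needed / GIB:.1f} GiB")
print("Fits on a machine with 8 GiB:", fits_in_ram(needed, 8 * GIB))
print("Fits on a machine with 32 GiB:", fits_in_ram(needed, 32 * GIB))
```

Real datasets with text columns or missing values can take noticeably more space than this estimate suggests, so treat the result as a lower bound.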
Virtual machines on AWS EC2, also called instances, have many advantages. They are highly scalable (one can choose instances with more RAM, CPU, etc.), they are easy to start and stop (outside the free tier, customers pay for what they use), and they allow for the selection of different platforms (operating systems). An important point to emphasize is that although this tutorial covers how to launch a Windows-based virtual machine, there are many different types of virtual machines for many different purposes.
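To make the scalability point concrete, here is a small sketch that picks the smallest instance size meeting a memory requirement. The table lists a subset of the t2 family with the vCPU and memory figures AWS has published for it; check the current EC2 documentation before relying on them. The helper name `smallest_t2_for` is hypothetical.

```python
# (instance type, vCPUs, memory in GiB) for part of the t2 family,
# ordered from smallest to largest.
T2_SIZES = [
    ("t2.micro", 1, 1),
    ("t2.small", 1, 2),
    ("t2.medium", 2, 4),
    ("t2.large", 2, 8),
    ("t2.xlarge", 4, 16),
    ("t2.2xlarge", 8, 32),
]


def smallest_t2_for(min_memory_gib, min_vcpus=1):
    """Return the first (smallest) listed type that meets both minimums."""
    for name, vcpus, memory_gib in T2_SIZES:
        if memory_gib >= min_memory_gib and vcpus >= min_vcpus:
            return name
    return None  # nothing in this table is large enough


print(smallest_t2_for(1))    # the size used in this tutorial
print(smallest_t2_for(16))   # a larger analysis machine
print(smallest_t2_for(64))   # beyond what this table covers
```

Only t2.micro is covered by the free tier, so anything larger is billed for the time it runs.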
With that, let's get started.
Create an AWS Account and Sign in to AWS
1. On the Amazon Web Services site (here's the link), click on "Sign In to the Console". Sign in if you have an account. If you don't, you will need to make one.
2. On the EC2 Dashboard, click on EC2.
Create an Instance
3. On the Amazon EC2 console, click on "Launch Instance".
4. Click on the "Select" button in the row with Microsoft Windows Server 2016 Base. Please note that this will create a Windows-based instance instead of a typical Linux-based instance. This affects how you will connect to the instance.
5. Make sure t2.micro (the free tier instance type) is selected, and click on "Review and Launch".
6. Click on "Launch".
7. Select "Create a new key pair". In the box below ("Key pair name"), fill in a key pair name. I named my key DataCampTutorial, but you can name it whatever you like. Click on "Download Key Pair". This will download the key. Keep it somewhere safe. Next, click on "Launch Instances".
8. The instance is now launched. Go back to the Amazon EC2 console. I would recommend that you click on what is enclosed in the red rectangle, as it will bring you back to the console.
9. Wait until you see that "Instance State" is "running" before you proceed to the next step. This can take a few minutes.
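The console steps above can also be scripted. The sketch below is one way to do it with boto3, the AWS SDK for Python, which this tutorial does not otherwise use. It assumes boto3 is installed, AWS credentials are configured, and the key pair already exists in your account; the AMI ID in the usage comment is a placeholder, and the function names are hypothetical.

```python
def build_launch_request(ami_id, key_name, instance_type="t2.micro"):
    """Keyword arguments for the EC2 RunInstances call, mirroring steps 4 to 7."""
    return {
        "ImageId": ami_id,              # the Windows Server image picked in step 4
        "InstanceType": instance_type,  # step 5
        "KeyName": key_name,            # an existing key pair, as in step 7
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch_and_wait(ec2_client, request):
    """Launch one instance and block until its state is 'running' (step 9)."""
    response = ec2_client.run_instances(**request)
    instance_id = response["Instances"][0]["InstanceId"]
    ec2_client.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return instance_id


# Usage (placeholder AMI ID; look up a current Windows Server AMI for your region):
#   import boto3
#   ec2 = boto3.client("ec2")
#   request = build_launch_request("ami-xxxxxxxx", "DataCampTutorial")
#   print(launch_and_wait(ec2, request))
```

One difference from the console: this request does not name a security group, so the instance gets the default one, which may not allow inbound Remote Desktop traffic. You would need to open port 3389 yourself before connecting.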
Connect to your Instance
10. Click on "Connect".
11. Click on "Download Remote Desktop File". Save the remote desktop (rdp) file somewhere safe.
12. Click on "Get Password". Keep in mind that you have to wait at least 4 minutes after you launch an instance before trying to retrieve your password.
13. Choose the pem file you downloaded in step 7 and then click "Decrypt Password".
14. After you decrypt your password, save it somewhere safe. You will need it to log into your instance.
15. Open your rdp file. Click on "Continue". If your local computer is a Mac, you will need to download "Microsoft Remote Desktop" from the App Store to be able to open your rdp file.
16. Enter the password you got from step 14.
After you enter your password, you should see a screen like this.
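If the connection fails, it can be useful to check which host and user the rdp file from step 11 points at. An rdp file is plain text made of `name:type:value` lines, so a short parser is enough. This is a minimal sketch: the sample contents and the hostname in it are made up for illustration, and a real file may contain more settings or use a different text encoding.

```python
def parse_rdp_settings(text):
    """Parse 'name:type:value' lines from the contents of an .rdp file."""
    settings = {}
    for line in text.splitlines():
        # Split on the first two colons only, since values may contain colons.
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, _kind, value = parts
        settings[name] = value
    return settings


# Hypothetical file contents; the hostname is not a real instance.
sample = """auto connect:i:1
full address:s:ec2-203-0-113-25.compute-1.amazonaws.com
username:s:Administrator
"""

settings = parse_rdp_settings(sample)
print("Host:", settings.get("full address"))
print("User:", settings.get("username"))
```

With a real file, read its contents first, for example with `open(path).read()`, and pass the resulting string to the function.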
To be able to install R and/or Python, it really helps to have a browser. The instance comes preinstalled with Internet Explorer with Enhanced Security Configuration enabled, which can be difficult to work with. Download Firefox as an alternative browser to avoid the enhanced security from Internet Explorer.
1. Type the following into Internet Explorer: https://www.mozilla.org/firefox/new/?scene=2
2. Click on "Add" when you see the popup below. Click on "Add" again.
3. When you get to the Firefox page, you may have to click on "Add" a couple of times (similar to steps 1 and 2) until the Firefox download starts. If the download doesn't start automatically, then click on "click here".
Now that Firefox is installed, be sure to use Firefox as your browser. This is a lot simpler than continuously dealing with security issues from Internet Explorer.
Install R and Python
Now that Firefox is installed, you can install R and Python as you would on a normal Windows machine. If you need help installing, here are some links to guides below.
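Once the installers have finished, a quick check is to see whether the executables can be found on the PATH. The sketch below uses only the Python standard library; the tool names `python` and `Rscript` are assumptions about how the programs are exposed on your instance. A tool reported as not found may still be installed, because some installers do not add their programs to the PATH by default.

```python
import shutil


def find_tools(names=("python", "Rscript")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name}: {path or 'not found on PATH'}")
```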
Stop or Terminate an Instance (Important)
When you are finished using an instance, it is a good idea to stop or terminate it. To do this, go to the Amazon EC2 console, click on "Actions", then "Instance State", and you will have the option of either stopping or terminating the instance.
If you plan on using the instance again, stop the instance. If you don't plan on using the instance again, terminate the instance.
While the instance in this tutorial was in the "free tier", I would recommend terminating the instance so you don't forget about it.
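The same choice can be expressed in code. This is a sketch using boto3's `stop_instances` and `terminate_instances` calls, assuming boto3 is installed and credentials are configured; the function name and the instance ID in the usage comment are hypothetical.

```python
def release_instance(ec2_client, instance_id, use_again):
    """Stop the instance if it will be reused, otherwise terminate it.

    Returns the action that was requested. Terminating is permanent:
    the instance cannot be started again afterwards.
    """
    if use_again:
        ec2_client.stop_instances(InstanceIds=[instance_id])
        return "stop"
    ec2_client.terminate_instances(InstanceIds=[instance_id])
    return "terminate"


# Usage (placeholder instance ID):
#   import boto3
#   ec2 = boto3.client("ec2")
#   release_instance(ec2, "i-0123456789abcdef0", use_again=False)
```

Keep in mind that a stopped instance still has its storage volume attached, which can incur charges outside the free tier.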
This tutorial provided a quick guide to launching and connecting to EC2 instances, as well as how you would go about setting up a basic data science environment. If you would like to continue your EC2 learning, I suggest you check out the tutorial "Deep Learning with Jupyter Notebooks in the Cloud", which covers how to set up a Linux-based EC2 GPU instance for deep learning applications. If you have any questions or thoughts on the tutorial, feel free to reach out in the comments below or through Twitter.