Tutorials
r programming
+3

How to Execute Python/R in SQL

After reading this tutorial, you'll know how to embed R & Python scripts in T-SQL statements & know what data types are used to pass data between SQL & Python/R.

Introduction

Did you know that you can write R and Python code within your T-SQL statements? Machine Learning Services in SQL Server eliminates the need for data movement. Instead of transferring large and sensitive data over the network or losing accuracy with sample csv files, you can have your R/Python code execute within your database. Easily deploy your R/Python code with SQL stored procedures making them accessible in your ETL processes or to any application. Train and store machine learning models in your database bring intelligence to where your data lives.

You can install and run any of the latest open source R/Python packages to build Deep Learning and AI applications on large amounts of data in SQL Server. There are also leading edge, high-performance algorithms in Microsoft's RevoScaleR and RevoScalePy APIs. Using these with the latest innovations in the open source world allows you to bring unparalleled selection, performance, and scale to your applications.

If you are excited to try out SQL Server Machine Learning Services, check out the hands-on tutorial below. If you do not have Machine Learning Services installed in SQL Server, you will first want to follow the getting started tutorial published here: https://blogs.msdn.microsoft.com/mlserver/2018/05/18/getting-started-with-machine-learning-services-in-sql-server/

How-To Tutorial

What you will learn

This tutorial will cover the basics of how to Execute R and Python in T-SQL statements. After completing the tutorial, you will know:

  1. How to embed R and Python scripts in T-SQL statements.
  2. How to use basic parameters to specify input and output data to R/Python scripts.
  3. What data types are used to pass data between SQL and Python/R processes.

Setup Prerequisites

If you do not have Machine Learning Services installed on SQL Server, you will first want to read this getting started tutorial or watch these videos to get set up: https://aka.ms/InstallMLServices https://aka.ms/EnableMLServices

Video companion

If you like learning through videos, this tutorial is also available on YouTube: @https://aka.ms/ExecuteMLServices https://youtu.be/ACejZ9optCQ

Basics

Open up SQL Server Management Studio and make a connection to your server. Open a new query and paste this basic example: (While Python is used in these samples, you can do everything with R as well)

EXEC sp_execute_external_script @language = N'Python',
@script = N'print(3+4)'

Sp_execute_external_script is a special system stored procedure that enables R and Python execution in SQL Server. There is a language parameter that allows you to choose between Python and R. There is a script parameter where you can paste R or Python code. If you do not see an output print 7, go back and review the setup steps in this article.

Parameter Introduction

After discussing a basic example, let's start adding more pieces:

EXEC sp_execute_external_script @language =N'Python',
@script = N'
OutputDataSet = InputDataSet;
',
@input_data_1 =N'SELECT 1 AS Col1';

Machine Learning Services provides more natural communications between SQL and R/Python with an input data parameter that accepts any SQL query. The input parameter name is called input_data_1.

You can see in the python code that there are default variables defined to pass data between Python and SQL. The default variable names are OutputDataSet and InputDataSet You can change these default names like this example:

EXEC sp_execute_external_script @language =N'Python',
@script = N'
MyOutput = MyInput;
',
@input_data_1_name = N'MyInput',
@input_data_1 =N'SELECT 1 AS foo',
@output_data_1_name =N'MyOutput';

As you executed these examples, you might have noticed that they each return a result with (No column name)? You can specify a name for the columns that are returned by adding the WITH RESULT SETS clause to the end of the statement which is a comma-separated list of columns and their datatypes.

EXEC sp_execute_external_script  @language =N'Python',
@script=N'
MyOutput = MyInput;
',
@input_data_1_name = N'MyInput',
@input_data_1 =N'SELECT 1 AS foo,2 AS bar',
@output_data_1_name =N'MyOutput'
WITH RESULT SETS ((MyColName int, MyColName2 int));

Input/Output Data Types

Alright, let's discuss a little more about the input/output data types used between SQL and Python. Your input SQL SELECT statement passes a "DataFrame" to python relying on the Python Pandas package. Your output from Python back to SQL also needs to be in a Pandas DataFrame object. If you need to convert scalar values into a DataFrame here is an example:

EXEC sp_execute_external_script @language =N'Python',
@script=N'
import pandas as pd
c = 1/2
d = 1*2
s = pd.Series([c,d])
df = pd.DataFrame(s)
OutputDataSet = df
'

Variables c and d are both scalar values, which you can add to a pandas Series if you like, and then convert them to a pandas DataFrame. This one shows a little bit more complicated example, go read up on the python pandas package documentation for more details and examples:

EXEC sp_execute_external_script @language =N'Python',
@script=N'
import pandas as pd
s = {"col1": [1, 2], "col2": [3, 4]}
df = pd.DataFrame(s)
OutputDataSet = df
'

You now know the basics of how to execute R and Python in T-SQL! It is very simple to call EXEC sp_execute_external_script, copy/paste your R or Python code, and copy/paste your SQL input select. Maybe you need to do a data transformation or extraction that is simple in R/Python but is too complex to write in SQL. Maybe you are training a deep neural network on huge amounts of data too large to send over a network. Maybe you want to automatically score/predict an outcome on new data entering into your OLTP system. The use cases are broad, and you are now ready to increase security, performance, and scale with Machine Learning Services in SQL Server.

Further Reading

Did you know you can also write your R and Python code in your favorite IDE like RStudio and Jupyter Notebooks and then remotely send the execution of that code to SQL Server? Check out these documentation links to learn more: Documentation: https://aka.ms/R-RemoteSQLExecution Documentation: https://aka.ms/R-RemoteSQLExecution Github Jupyter Notebook example: https://aka.ms/RemoteExecJupyter

If you would like to learn more about SQL, check out our Intro to SQL for Data Science course.

Want to leave a comment?