Interactive Data Analysis with SQL Server using Jupyter Notebooks


In this post “Interactive Data Analysis with SQL Server using Jupyter Notebooks“, we will demonstrate how we can use Jupyter Notebooks for interactive data analysis with SQL Server. Jupyter notebooks are one of the most useful tools for any Data Scientist/Data Analyst. It supports 40+ programming languages and facilitates web-based interactive programming IDE. We can put comments, headings, codes, and output in one single document. This document maintains the context to the original data source which means we can re-execute the code whenever we need it. This feature facilitates Data scientists/Data analysts to play with the code during the presentations. Also, these notebooks are very handy in sharing and can be shared easily across the teams.

What is Jupyter Lab

Jupyter Lab is the next-generation web-based tool for Jupyter notebooks. It enables tab based programming model which is highly extensible. We can arrange multiple windows side by side. The Jupyter Lab brings all the features of the Jupyter notebook along with some additional features. In this tutorial, we will be using the SQL magic commands in Jupyter Lab to interact with SQL Server directly from Jupyter notebooks. This can help you to execute SQL queries/procedures directly from the notebook.

Prerequisites for the setup

We need to install below packages to enable interactive data analysis with a relational database such as SQL Server using Jupyter lab:

Python – We need to download and install Python (consider the latest version of Python 3) from Python’s official website. Once installed, we can connect Python to SQL Server using pyodbc also. However, in this post, we will be using the ipython’s magic sql commands from Jupyter notebooks.

jupyterlab module – This module is required to setup Jupyter notebooks on our machine. This will automatically install the notebooks and other required dependencies. We can execute the below pip command to install jupyterlab from the command prompt.

pip install jupyterlab

pyodbc module – This module is required to create a connection from Python to SQL Server. To install this package, we can use below pip command at the command prompt:

pip install pyodbc

ipython-sql module – This module is required to enable SQL magic commands from Jupyter notebooks. We can use %sql followed by the actual query. We can use this pip command at the command prompt to install it:

pip install ipython-sql

Note: In case you want to set up the startup folder for Jupyter notebook on Windows or Mac machine, you can follow this link.

Querying SQL Server from Jupyter Notebook

To start the Jupyter lab in the web browser, we can simply open a command prompt window and type “Jupyter lab” and hit the Enter key. This will open the Jupyter lab web interface in the default browser.

Let’s use the Jupyter lab to query SQL Server tables directly from Jupyter notebook. To do that we need to follow these steps:

  1. Load the ipython’s sql module
  2. Create a connection
  3. Write queries prefixed with magic command %sql and execute it

Load the ipython’s sql module

To load the ipython’s sql module which enables the magic command for SQL Server, we need to use below code:

%load_ext sql

Create a connection

Now, we need to set the connection string so that the notebook magic commands can execute the SQL queries as like any other SQL Server IDEs. Below is the syntax for the connection string:

Connection string syntax for SQL Server authentication:

mssql+pyodbc://user:password@server:port/DatabaseName?driver=DriverName

Considering that we are using SQL Server 2017 with ODBC driver 2017, we can use a connection string as below:

%sql mssql+pyodbc://admin:test_12345@localhost:1433/TestDB?driver=ODBC+Driver+17+for+SQL+Server

Connection string syntax for Windows authentication:

mssql+pyodbc://user@server:port/DatabaseName?trusted_connection=yes&driver=DriverName

We need to set trusted_connection = yes if we are using windows authentication rather than the SQL Server authentication. Assuming that we are using SQL Server 2017 with ODBC driver 2017, we can use a connection string as below:

%sql mssql+pyodbc://admin@localhost:1433/TestDB?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server

When we execute the above command, it should return a message like “‘Connected: admin@TestDB‘”. Once, we get that message, we can execute the SQL queries from the Jupyter notebook directly.

Connect SQL Server from Jupyter notebook - Connection string

Connect SQL Server from Jupyter notebook – Connection string

Write queries prefixed with magic command %sql and execute it:

Now that we have loaded the SQL module and set the connection string, next, we can write our SQL queries and execute it like below:

%sql SELECT * FROM SYS.OBJECTS

The above command will return the list of all the objects from sys.objects system view.

Writing multiline SQL queries:

To write multiline SQL queries as we do with SQL Server Management Studio IDE, we can simply prefix our query with %%sql. Below is an example:

%%sql DROP TABLE IF EXISTS [dbo].[tbl_Sales];

CREATE TABLE [dbo].[tbl_Sales]
(
	[Seq] [bigint] NULL,
	[EmpName] [nvarchar](256) NULL,
	[Product] [nvarchar](256) NULL,
	[Amount] DECIMAL(18,2) NULL
);

INSERT INTO [dbo].[tbl_Sales] 
([Seq], [EmpName], [Product], [Amount]) 
VALUES (1, N'John', N'Cleaners', 62704.1),
(2, N'John', N'Cleaners', 212.9),
(3, N'Adam', N'Chains', 21814.2),
(4, N'Adam', N'Chains', 33044.65),
(5, N'Michal', N'Caps', 27582.42),
(6, N'Michal', N'Caps', 40892.41),
(7, N'Smith', N'Brakes', 44787.54),
(8, N'Smith', N'Brakes', 12692.64),
(9, N'David', N'Bottom Brackets', 49318.12),
(10, N'David', N'Bottom Brackets', 74022.22);

SELECT * FROM [dbo].[tbl_Sales];

Below is the output of the above SQL queries:

Jupyter Notebooks - Query Output

Jupyter Notebooks – Query Output

The %%sql magic command allows us to write the SQL query in multiple lines. This is very handy if we are working with a complex data analysis task.

Thanks for the reading. Please share your inputs in the comment section.

Rate This
[Total: 2   Average: 5/5]

Gopal Krishna Ranjan

About Gopal Krishna Ranjan

Gopal is a passionate Data Engineer and Data Analyst. He has implemented many end to end solutions using Big Data, Machine Learning, OLAP, OLTP, and cloud technologies. He loves to share his experience at https://www.sqlrelease.com/. Connect with Gopal on LinkedIn at https://www.linkedin.com/in/ergkranjan/.

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.