Python use case – Dynamic UNPIVOT using pandas – SQL Server 2017

In this post, we are going to learn how we can leverage the power of Python’s pandas module in SQL Server 2017. pandas is an open-source Python library that provides the DataFrame, a data structure similar to a SQL table, with vectorized operation support for high performance. To know more about pandas, you can refer to the official pandas documentation.

Let’s discuss the problem we face while using the SQL UNPIVOT clause, especially when we have a large number of columns. We can use the UNPIVOT clause in SQL Server to convert columns into row values and normalize the output result set. To use UNPIVOT, we need to specify each column name as a fixed value while writing the T-SQL query. However, this becomes tedious if we need to specify a large number of columns in the UNPIVOT clause. Also, if the column names are not fixed (dynamic in nature), we cannot hard-code them while designing the query.
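For reference, a minimal sketch of this static approach against the demo table dbo.tbl_unpivot_demo (created later in this post) could look like the following; the alias unpvt and the output column names ProductType and SaleAmount are only illustrative choices, and the key point is that every product column has to be listed by hand.

--Static UNPIVOT: every column name must be hard-coded in the IN list
SELECT EmpName, ProductType, SaleAmount
FROM dbo.tbl_unpivot_demo
UNPIVOT
(
	SaleAmount FOR ProductType IN
	(
		[Bib Shorts], [Bike Racks], [Bike Stands], [Bottles and Cages],
		[Bottom Brackets], [Brakes], [Caps], [Chains], [Cleaners], [Cranksets]
	)
) AS unpvt
GO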

To overcome this issue, we can use the power of dynamic SQL to build the T-SQL query with all the columns at runtime and then execute it with EXEC or sp_executesql. We already have a blog post on “Dynamic PIVOT query in SQL Server”. However, creating a dynamic T-SQL query is not that simple and needs extensive testing, as the query is created and executed at runtime, where unseen issues might occur. Also, dynamic SQL is tedious to maintain compared to static SQL.
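To make the maintenance concern concrete, here is a hedged sketch of what such a dynamic UNPIVOT might look like against the demo table created later in this post. It builds the column list from sys.columns using STRING_AGG (available from SQL Server 2017 onwards) and executes the generated query with sp_executesql; this is a sketch for illustration, not a production-ready script.

--Dynamic UNPIVOT sketch: build the column list at runtime and execute it
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX)

--Collect all columns except the identifier column EmpName
SELECT @cols = STRING_AGG(CAST(QUOTENAME(c.name) AS NVARCHAR(MAX)), ', ')
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID('dbo.tbl_unpivot_demo')
	AND c.name <> 'EmpName'

--Build and execute the UNPIVOT query
SET @sql = N'SELECT EmpName, ProductType, SaleAmount
FROM dbo.tbl_unpivot_demo
UNPIVOT (SaleAmount FOR ProductType IN (' + @cols + N')) AS unpvt'

EXEC sp_executesql @sql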

With the integration of Python in SQL Server 2017, Microsoft has opened up possibilities of using Python’s libraries, which are very powerful, efficient, and can accomplish the most complex tasks in just a few lines of code. In this post, we will be using the melt method from the pandas library to dynamically unpivot the table data.

We can use the code below to create a dummy table with sample data for demonstration purposes.

IF OBJECT_ID('dbo.tbl_unpivot_demo') IS NOT NULL
	DROP TABLE dbo.tbl_unpivot_demo
GO

--Create dummy table
CREATE TABLE dbo.tbl_unpivot_demo
(
	EmpName NVARCHAR(512),
	[Bib Shorts] float,
	[Bike Racks] float,
	[Bike Stands] float,
	[Bottles and Cages] float,
	[Bottom Brackets] float,
	[Brakes] float,
	[Caps] float,
	[Chains] float,
	[Cleaners] float,
	[Cranksets] float
)
GO

--Insert sample data in dummy table
INSERT INTO dbo.tbl_unpivot_demo
(EmpName, [Bib Shorts], [Bike Racks], [Bike Stands], [Bottles and Cages], [Bottom Brackets], Brakes, Caps, Chains, Cleaners, Cranksets)
VALUES
('John', 98583.63, 50060.56, 27877.11, 44454.45, 48582.3 , 55460.04, 92488.6 , 61331.34, 62704.1 ,   212.9 ),
('Adam', 80061.15, 60420.61,  3208.4 , 69232.6 , 91672.85, 80702.13, 18307.98, 21814.2 , 91096.07, 33044.65),
('Michal', 40932.12, 78804.44, 20560.96, 28836.26, 16754.98,   601.77, 27582.42, 70861.79, 80285.37, 40892.41),
('Smith', 65244.43, 76834.61, 45700.16, 56240.19, 97608.88, 44787.54, 8769.13, 47577.39, 27627.93, 12692.64),
('David', 86557.44, 31916.65, 57052.24, 15513.05, 49318.12, 95908.1, 13421.06, 22238.98, 83941.94, 74022.22)
GO

Let’s have a look at the table data.

Sample Table Data

Above, we have a denormalized table that contains the details of the employees along with their total sale amount for each product type. To keep it simple, we are using only a subset of columns from the actual table. Now, we need to normalize this table by transposing the product type names into rows with their corresponding values for each employee. The output that we need is shown below.

Required Output

To achieve the above output in SQL Server 2017, we can use the power of the pandas melt method. The Python script we are using to transpose the product types (currently placed as columns) into rows is shown below.

EXEC sp_execute_external_script
@language = N'Python',
@script =
N'
#Line 1: assign the input table data (SalesData) to a data frame named df
df = SalesData

#Line 2: import the pandas module under the alias pd
import pandas as pd

#Line 3: unpivot the product type columns into rows using the melt method
OutputDataSet = pd.melt(df, id_vars = "EmpName", var_name = "Product Type", value_name = "SaleAmount")
'
,@input_data_1 = N'SELECT * FROM dbo.tbl_unpivot_demo'
,@input_data_1_name = N'SalesData'
WITH RESULT SETS(("EmployeeName" varchar(512), "Product Type" nvarchar(256), "SalesAmount" numeric(18, 2)))

How it works

We already have a blog post which explains the parameters of the system stored procedure “sp_execute_external_script”. Let’s discuss the Python code used above in detail.

Python Script – #Line 1

In the first line, “df = SalesData“, we assign the input table data (passed in through @input_data_1_name as SalesData) to a data frame named df.

Python Script – #Line 2

In the second line, “import pandas as pd“, we are importing the pandas module under the alias pd.

Python Script – #Line 3

The third and final line of code is “OutputDataSet = pd.melt(df, id_vars = "EmpName", var_name = "Product Type", value_name = "SaleAmount")“.

Here, we have used the pd.melt method to dynamically transpose the columns into row values. The id_vars argument keeps EmpName as the identifier column, while var_name and value_name define the names of the new columns that hold the original column names and their corresponding values. Finally, we have assigned the output of the melt method to the OutputDataSet data frame, which SQL Server returns as the result set.

Quick performance summary

We have compared the performance of the pandas melt method with the static SQL UNPIVOT query and found that the Python script runs faster than the SQL UNPIVOT query when the number of input rows is large. The performance summary is shown below, followed by a simple sketch of how you can reproduce the comparison on your own data.

Performance summary
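The exact numbers will depend on your hardware and data volume, so the summary above should be treated as indicative rather than definitive. As a rough way to reproduce the comparison yourself, you can wrap both versions with SET STATISTICS TIME and compare the elapsed times reported in the Messages tab; this is only a simple timing sketch, not a rigorous benchmark.

--Simple timing sketch: run each version and compare the reported elapsed time
SET STATISTICS TIME ON

--Static UNPIVOT version (column list hard-coded, as shown earlier in this post)
SELECT EmpName, ProductType, SaleAmount
FROM dbo.tbl_unpivot_demo
UNPIVOT (SaleAmount FOR ProductType IN ([Bib Shorts], [Bike Racks], [Bike Stands],
	[Bottles and Cages], [Bottom Brackets], [Brakes], [Caps], [Chains],
	[Cleaners], [Cranksets])) AS unpvt

--pandas melt version: re-run the sp_execute_external_script call from above here

SET STATISTICS TIME OFF
GO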

Conclusion

The Python script has reduced the number of lines of code drastically and has also improved performance when the number of input rows is large. It is easy to understand and maintain.

Thanks for reading. Please share your inputs in the comments.
