Python use case – Get employees for given skill set – SQL Server 2017

In the previous blog post “Python use case – Map unequal comma separated values from two columns – SQL Server 2017“, we demonstrated a use case example of Python in SQL Server 2017. Now, in this post, “Python use case – Get employees for given skill set – SQL Server 2017”, we are going to explore another use case example of Python in SQL Server 2017.

As Microsoft has integrated Python in SQL Server 2017 for advanced data analytics and machine learning purpose, it can also be used to ease the complex data transformation and analysis which might be tedious and a bit complex while doing the same using T-SQL.

Let’s create the required table with the sample data to demonstrate the use case example.

Python use case – Get employees for given skill set

Assume that, we have a table employee master named as tbl_EmpMaster which has two columns; EmpName, and CommaSepSkills. Here, EmpName column contains the name of the employees and the second column CommaSepSkills contains the comma-separated list of skills of the employees.

Below is the employee master table with sample data:

Employee Master Table
Employee Master Table

We can use below script to create the above table along with sample data.

CREATE TABLE tbl_EmpMaster
(
	EmpName VARCHAR(256) NOT NULL,
	CommaSepSkills NVARCHAR(MAX)
)
GO

--Add some sample data in the table
INSERT INTO tbl_EmpMaster
(EmpName, CommaSepSkills)
VALUES
('John','C#,SQL Server,Python,R,Machine Learning'),
('David','PHP,Java,CSS,Angular'),
('Adam','Big Data,Hadoop,AWS,Python,Data Warehousing'),
('Michal','Azure,Hadoop,Spark,Machine Learning,Python,R,C#,U-SQL'),
('Albert', 'Azure,Python,R,Big Data,Data Lake,Data Warehousing')
GO

As an input, the end user will input a list of skills as comma-separated values and we need to find the employees with matching skill set. The order of input skills and employee’s skill should not be considered while matching the skills. So, if we input “Azure,BigData” as input skills, all the employees who have mentioned Azure and Big Data (in any order) in their skillset should be extracted with their skills in the final output. In case of above sample data, the only employee who has both these skills is “Albert“. Let’s have a look at the input and output data.

Input – Required Skillset – Azure,BigData

Output Data:

Output Data
Output Data

Though, we can achieve this output in a set based way using T-SQL statement. However, in this post, we will be using an in-database Python script to achieve this output. Here, we will be using “set” data type, a Python core data type, to compare and identify the matching skills of the employees with the given list of values.

Python’s set data type is an unordered collection of unique elements which provides many methods to compare the elements of different sets like issubset, issuperset, and etc. To read more about Python’s sets, click here.

Let’s have a look at the Python code which we are using to achieve the above output and then discuss it line by line in detail in “How it is working” section of the post.

EXEC sp_execute_external_script
@language = N'Python',
@script =
N'
#Line 1
tblData = EmpMasterData #Reads the input data in a data frame received from SQL Server

#Line 2
import pandas as pd

#Line 3
RequiredSkillsSet = set(RequiredSkills.split(","))

#Line 4
lst = [[empname, empskills] for empname, empskills in zip(tblData.EmpName, tblData.CommaSepSkills) if RequiredSkillsSet.issubset(set(empskills.split(",")))]

#Line 5
OutputDataSet = pd.DataFrame(lst)
'
,@input_data_1 = N'SELECT EmpName, CommaSepSkills FROM dbo.tbl_EmpMaster'
,@input_data_1_name = N'EmpMasterData'
,@params = N'@RequiredSkills VARCHAR(512)'
,@RequiredSkills = N'Azure,Big Data'
WITH RESULT SETS(("EmpName" varchar(256), "CommaSepSkills" nvarchar(max)))

How it is working

Let’s discuss the steps being performed to achieve the above output. In above T-SQL code, we are using the external stored procedure “sp_execute_external_script” to execute an external Python script. We are passing the values for the below parameters to the stored procedure:

@language – Python

@script – Actual Python script to execute

@input_data_1 – The input data set which is the list of employees with their skills

@input_data_1_name – Custom name of the input data set (if not provided, by default InputDataSet)

@params – List of parameters being supplied to the procedure “sp_execute_external_script”

@RequiredSkills – Value for the parameter @RequiredSkills

Let’s discuss the Python code which we are using in @script parameter in detail.

Python Script – Line 1

tblData = EmpMasterData

In the above line of code, we are reading the input data set to a variable tblData.

Python Script – Line 2

import pandas as pd

In this line, we are importing pandas module with an alias pd. We will use this module to create a data frame and then we will return this data frame in the final output.

Python Script – Line 3

RequiredSkillsSet = set(RequiredSkills.split(","))

In this line, we are reading the parameter value of @RequiredSkills param which is being passed to the procedure “sp_execute_external_script” and then we are splitting the values based on comma with the Python’s split method. Finally, we are creating a set of skills using the Python’s set method from the generated list of values and assigning this to a variable named as RequiredSkillsSet.

Python Script – Line 4

lst = [[empname, empskills] for empname, empskills in zip(tblData.EmpName, tblData.CommaSepSkills) if RequiredSkillsSet.issubset(set(empskills.split(",")))]

In the above line of code, we are using a zip function to combine EmpName and CommaSepSkills columns together using “zip(tblData.EmpName, tblData.CommaSepSkills)“.

Then, for each row, we are assigning the values of both these columns in two different variables using a for loop as “for empname, empskills in zip(tblData.EmpName, tblData.CommaSepSkills)“.

Next, we are using a list comprehension with an if filter to get only those employees whose skillset is a superset of the required skillset, or we can say that the required skillset is a subset of the employee’s skillset using “lst = [[empname, empskills] for empname, empskills in zip(tblData.EmpName, tblData.CommaSepSkills) if RequiredSkillsSet.issubset(set(empskills.split(“,”)))]“.

We have used “RequiredSkillsSet.issubset(set(empskills.split(“,”)))” to find that whether the given required skillset is a subset of the employee’s skillset or not.

Python Script – Line 5

OutputDataSet = pd.DataFrame(lst)

In above line of code, we are creating a data frame from the list of employees and their skillset to return as final output.

Thanks for the reading. Please share your input in the comment section of the post.

Rate This
[Total: 3 Average: 4.7]

Leave a Comment

Your email address will not be published. Required fields are marked *


The reCAPTCHA verification period has expired. Please reload the page.

This site uses Akismet to reduce spam. Learn how your comment data is processed.