Most Data Science / Machine Learning projects follow the Pareto principle: we spend roughly 80% of the time on data preparation and the remaining 20% on choosing and training an appropriate ML model. The datasets we receive for building Machine Learning models are usually messy and cannot be fed into a model directly. We need to perform some data cleaning steps to get a dataset that a model can be fitted on, and we need to make sure the data we feed into the model is tidy. Indeed, this is the first step in a Machine Learning / Data Science project. We may need to repeat the data cleaning process many times as we run into new challenges and problems along the way. Data cleaning is one of the most important and time-consuming processes a Data Scientist performs before … More
If we need to import data from an Excel file into SQL Server, we can use any of these methods:
- SQL Server Import and Export Wizard
- Create an SSIS package that reads the Excel file and loads the data into a SQL Server table
- Use a T-SQL OPENROWSET query
- Use the read_excel method of Python’s pandas library (only available from SQL Server 2017 onwards)
In this post “Python use case – Import data from excel to sql server table – SQL Server 2017”, we are going to learn how we can use the power of Python in SQL Server 2017 to read a given Excel file directly into a SQL table. With Python integrated into SQL Server 2017, we can use the pandas read_excel method, which offers many customization options, to read the Excel file from within SQL Server.
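As a rough illustration of the read_excel approach, here is a minimal sketch. The function names `read_sheet` and `to_rows` are hypothetical helpers made up for this example, and reading a real workbook assumes an Excel engine such as openpyxl is installed alongside pandas.

```python
import pandas as pd


def read_sheet(path, sheet_name=0, skiprows=0, usecols=None):
    """Read one worksheet into a DataFrame.

    sheet_name, skiprows and usecols are standard pandas.read_excel options;
    the defaults here read the first sheet in full.
    """
    return pd.read_excel(path, sheet_name=sheet_name, skiprows=skiprows, usecols=usecols)


def to_rows(df):
    """Turn a DataFrame into plain tuples, one per row, without the index."""
    return list(df.itertuples(index=False, name=None))
```

When the script runs inside SQL Server 2017, the data frame itself would normally be handed back as the script output; the tuple conversion is only an assumption about how a client-side load might prepare rows for a parameterized INSERT.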
Import zipped CSV file without unzipping it in SSIS using SQL Server 2017
SQL Server Integration Services (SSIS) is one of the most popular ETL tools. It has many built-in components that can be used to automate enterprise ETL (Extract, Transform, and Load) processes. Also, if we need a customized component that is not available in SSIS, we can create it by writing our own C# code in a Script Task or Script Component.
In this post, we are going to explore how we can read and load a zipped CSV file into SQL Server without unzipping it, using SSIS along with SQL Server 2017. Reading a zipped file directly (without unzipping it) saves the time otherwise needed to write the text file to the physical disk and then read it back from there. As of now, we don’t have any built-in component in … More
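To make the idea of reading a zipped CSV without extracting it concrete, here is a small sketch using only Python's standard library. The excerpt does not show the post's actual code, so the function name `read_zipped_csv` and the UTF-8 encoding are assumptions made for illustration.

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_source, member=None, encoding="utf-8"):
    """Return the rows of a CSV stored inside a zip archive as dictionaries.

    zip_source can be a file path or a file-like object. Nothing is
    extracted to disk; the member is streamed straight from the archive.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Fall back to the first file in the archive when no name is given.
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            # archive.open yields bytes, so wrap it for the text-based csv module.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.DictReader(text))
```

The first line of the CSV is treated as the header row, and every value comes back as a string, so any type conversion would still have to happen before or during the load.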
In this post, we are going to learn how we can leverage Python in SQL Server to generate comma-separated values.
If we want to combine all values of a single column, it is fairly easy, as we can use the COALESCE function to do that. Here is a reference to the existing post. But have you ever thought about what would happen if we needed a comma-separated value in one column along with other columns? In that scenario, this approach would not work.
We can get comma-separated values in a column along with other columns using a FOR XML PATH query wrapped inside a sub-query, but even then we would need to take care of XML-encoded characters such as &lt; and &gt;.
In this post, we are going to learn how we can leverage the power of Python’s pandas module in SQL Server 2017. pandas is an open source Python library that provides the data frame, a data structure similar to a SQL table, with support for vectorized operations for high performance. To learn more about pandas, you can click here.
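A minimal sketch of the pandas approach could look like the following. The helper name `collapse_to_csv` and its parameters are hypothetical; the excerpt does not show the code the post actually uses.

```python
import pandas as pd


def collapse_to_csv(df, key_cols, value_col, sep=","):
    """Return one row per key combination with value_col joined into a single string."""
    return (
        df.groupby(key_cols, sort=False)[value_col]
        # Cast to str so numeric values can be joined as well.
        .agg(lambda values: sep.join(values.astype(str)))
        .reset_index()
    )
```

For example, a data frame with columns dept and emp, grouped on dept, comes back with one row per department and the employee names joined by commas. Because the join is a plain string operation, characters such as < and > pass through unchanged.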
Let’s discuss the problem we face with the SQL UNPIVOT clause, especially when we have a large number of columns. We can use the UNPIVOT clause in SQL Server to convert columns into row values and normalize the output result set. To use UNPIVOT, we need to specify each column name as a fixed value while writing the T-SQL query. However, this becomes tedious if we need to list a large number of columns in the UNPIVOT clause. Also, if the column names are not fixed (dynamic in nature), … More
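The excerpt stops before the solution, so the following is only one possible way around the fixed column list, assuming the data is already available as a pandas data frame. It uses `DataFrame.melt`; the helper name `unpivot` and the output column names are made up for this sketch.

```python
import pandas as pd


def unpivot(df, id_cols, var_name="column_name", value_name="column_value"):
    """Turn every column that is not in id_cols into (name, value) rows."""
    # The column list is computed at run time, so it never has to be hard-coded.
    value_cols = [col for col in df.columns if col not in id_cols]
    return df.melt(
        id_vars=id_cols,
        value_vars=value_cols,
        var_name=var_name,
        value_name=value_name,
    )
```

One difference worth keeping in mind: T-SQL UNPIVOT leaves out rows whose value is NULL, whereas melt keeps missing values, so a `dropna` on the value column would be needed to mimic that behaviour.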
In this post “Connecting Python 3 to SQL Server 2017 using pyodbc”, we are going to learn how we can connect Python 3 to SQL Server 2017 to execute SQL queries. We can change the settings accordingly to connect to other versions of SQL Server as well. If you are interested in knowing more about Python and why you should learn it, visit our post “Why Python and how to use it in SQL Server 2017”.
What is pyodbc?
pyodbc is an open source Python module that implements the DB API 2.0 specification. It provides a convenient interface for connecting to any database that accepts an ODBC connection. To use the pyodbc module, we first need to install it. Click here for more information on pyodbc.
pip install pyodbc module
We can use the pip install command to install the pyodbc module in Python 3 on a Windows machine. Before executing the … More
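Once pyodbc is installed, a connection could be sketched roughly as below. The helper names are hypothetical, and the driver name "ODBC Driver 17 for SQL Server" and the use of Windows authentication are assumptions; both have to match what is actually installed and configured on the machine.

```python
def build_connection_string(server, database,
                            driver="ODBC Driver 17 for SQL Server",
                            user=None, password=None):
    """Assemble an ODBC connection string for SQL Server.

    Without a user name, Windows authentication (Trusted_Connection) is used.
    """
    parts = [f"DRIVER={{{driver}}}", f"SERVER={server}", f"DATABASE={database}"]
    if user is None:
        parts.append("Trusted_Connection=yes")
    else:
        parts.extend([f"UID={user}", f"PWD={password}"])
    return ";".join(parts)


def fetch_server_version(connection_string):
    """Open a connection, run SELECT @@VERSION and return the version text."""
    # Imported here so the string builder above works even without pyodbc installed.
    import pyodbc

    connection = pyodbc.connect(connection_string)
    try:
        cursor = connection.cursor()
        cursor.execute("SELECT @@VERSION")
        return cursor.fetchone()[0]
    finally:
        connection.close()
```

Calling `fetch_server_version(build_connection_string("localhost", "master"))` would then return the version banner of a local instance, provided the server is reachable and the driver is present.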
In the previous blog post “Python use case – Map unequal comma separated values from two columns – SQL Server 2017”, we demonstrated a use case of Python in SQL Server 2017. In this post, “Python use case – Get employees for given skill set – SQL Server 2017”, we are going to explore another use case of Python in SQL Server 2017.
Microsoft has integrated Python in SQL Server 2017 for advanced data analytics and machine learning, but it can also be used to simplify complex data transformation and analysis that would be tedious to implement in T-SQL.
Let’s create the required table with the sample data to demonstrate the use case example.
Python use case – Get employees for given skill set
Assume that we have an employee master table named tbl_EmpMaster which has … More
We know that Microsoft has integrated Python in SQL Server 2017 to enable rich data analytics capabilities within the database itself. Python is one of the most powerful languages and offers a large set of libraries for advanced data analytics and transformations. We can use Python for almost everything, from website development to robotics and Data Science. In SQL Server 2017, Python is used primarily for machine learning, but it is not limited to that. We can also use it for complex data transformations and analysis that would be tedious to write in T-SQL.
In this post, we will explore a use case of Python for data transformation in SQL Server 2017. If you want to read more about Python and how to use it in SQL Server, you can visit my previous blog post “Why … More
Microsoft has integrated Python in SQL Server 2017, where it can be used for in-database analysis. In this post, “Why Python and how to use it in SQL Server 2017”, we are going to look at why Python is worth learning and then at how we can use it in SQL Server 2017.
Python is a general-purpose, object-oriented programming language that can be used to develop applications for a variety of domains. We can use Python for almost everything, from desktop and website development, gaming, robotics, and scientific and numeric computing to spacecraft control and much more. Python is a high-level, interpreted language (code is executed line by line) rather than a compiled one. Python has gained popularity because of its user friendliness. Developers fall in love with Python because it is easy to learn but still very powerful. Technology giants like Google, YouTube, Dropbox, … More