Category : Big Data


Skip header and footer rows in Hive 1

In this post “Skip header and footer rows in Hive“, we are going to learn that how we can ignore few header and footer records in Hive without loading or reading these records in another table or in a view temporarily. If you want to read more about Hive, visit my post “Preserve Hive metastore in Azure HDInsight” which explains Hive QL in detail.

Skip header and footer records in Hive

We can ignore N number of rows from top and bottom from a text file without loading that file in Hive using TBLPROPERTIES clause. The TBLPROPERTIES clause provides various features which can be set as per our need. It can be used in this scenario to handle the files which are being generated with additional header and footer records. Let’s have a look at the below sample file:

Sample text file

Sample text file

Now assume that we … More


Preserve Hive metastore in Azure HDInsight 1

In this blog “Preserve Hive metastore in Azure HDInsight“, we are going to learn how we can preserve the hive metadata while working with the Azure HDInsight services. Microsoft Azure HDInsight is an on-demand managed Open source Big Data analytics service for the enterprises. We can provision clusters as per the demand in few minutes, perform the computations, and then we can shut it down to avoid charges. We pay as per the usage only. You can visit this link to know more about Azure HDInsight.

What is Hive?

Apache Hive is a SQL like Big Data query language which is used as an abstraction for the map reduce jobs. The Hive query seamlessly converts into an equivalent map reduce job without the need to write low-level code. This increases the productivity of a developer to a great extent. If you want to read more about Hive … More


Connecting Python 3 to SQL Server 2017 using pyodbc

In this post “Connecting Python 3 to SQL Server 2017 using pyodbc”, we are going to learn that how we can connect Python 3 to SQL Server 2017 to execute SQL queries. We can change the settings accordingly to connect to other versions of SQL Server also. If you are interested to know more about Python and why you should learn it, visit our post “Why Python and how to use it in SQL Server 2017“.

What is pyodbc?

pyodbc is an open source DB API 2 Python module. It provides a convenient interface to connect a database which accepts an ODBC connection. In order to use pyodbc module, firstly, we need to install it. Click here for more information on pyodbc.

pip install pyodbc module

We can use pip install command to install the pyodbc module in Python 3 on a Windows machine. Before executing the … More


Python use case – Get employees for given skill set – SQL Server 2017

In the previous blog post “Python use case – Map unequal comma separated values from two columns – SQL Server 2017“, we demonstrated a use case example of Python in SQL Server 2017. Now, in this post, “Python use case – Get employees for given skill set – SQL Server 2017”, we are going to explore another use case example of Python in SQL Server 2017.

As Microsoft has integrated Python in SQL Server 2017 for advanced data analytics and machine learning purpose, it can also be used to ease the complex data transformation and analysis which might be tedious and a bit complex while doing the same using T-SQL.

Let’s create the required table with the sample data to demonstrate the use case example.

Python use case – Get employees for given skill set

Assume that, we have a table employee master named as tbl_EmpMaster which has … More


Python use case – Map unequal comma separated values from two columns – SQL Server 2017 2

We know that Microsoft has integrated Python in SQL Server 2017 to enable rich data analytics capabilities within the database itself. Python is one of the most powerful languages which provides lots of built-in libraries for advanced data analytics and transformations. We can use Python for almost everything from website development to robotics and Data Science. In SQL Server 2017, Python can be used primarily for Machine learning purposes but it is not limited to that only. We can also use Python for complex data transformations and analysis which might be a bit tedious and complex while doing the same using T-SQL in SQL Server.

In this post, we will be exploring an use case example of Python for data transformation in SQL Server 2017. If you want to read more about Python and how to use it in SQL Server, you can visit my previous blog post “Why More


Why Python and how to use it in SQL Server 2017 1

Microsoft has integrated Python in SQL Server 2017 which can be used for in-database analysis purpose. In this post, we are going to explore “Why Python and how to use it in SQL Server 2017”, and then we will explore that how we can use it in SQL Server 2017.

Why Python

Python is a general purpose object oriented programming language which can be used to develop applications for a variety of domains. We can use Python for almost everything from desktop and website development, gaming, robotics, scientific and numeric computing to spacecraft control and much more. Python is a high-level programming language which is an interpreted language (execute line by line) instead of compiled language. The Python has gained popularity because of its user friendliness. The developers fall in love with Python because it is easy to learn, but still very powerful. The technology giants like Google, YouTube, Dropbox, … More