Author : Rewat Sharma



About Rewat Sharma

Rewat has 6+ years of experience in software development. He has worked on data science, database, data warehouse, big data, and cloud technologies, and has implemented various solutions in these areas from start to end. He has worked extensively with SQL Server, Python, Hadoop, Hive, Azure, Machine Learning, and MSBI (SSAS and SSIS).

Partitioning and Bucketing in Hive

In this article, we will discuss two important concepts in Hive: partitioning and bucketing. Both are used to improve query performance, and it is important to understand them so that you can apply them efficiently. So let’s start with partitioning.

Partitioning in Hive

Partitioning is a technique used to enhance query performance in Hive. It works by restructuring a table's data into subdirectories, one per value of the partition column. Let us understand this concept with an example.

Suppose we have a large 10 GB file containing geographical data for a customer, and we want to extract the record for a particular country and a particular employeeId. To do so, Hive will perform a full table scan, reading all the rows and then picking only those records that satisfy the given predicate.

Now if we partition that table by country and run the query, it will not scan the … More
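The idea can be illustrated outside Hive with a small Python sketch. It is not Hive code: it only mimics the `country=<value>` subdirectory layout that Hive uses for partitioned tables, and the sample rows, file name `data.csv`, and helper names are hypothetical. The point it shows is that a query filtering on the partition column needs to read only one subdirectory.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample rows: (employee_id, name, country)
ROWS = [
    (1, "Asha", "IN"),
    (2, "Bob", "US"),
    (3, "Chen", "CN"),
    (4, "Dev", "IN"),
]

def write_partitioned(root: Path, rows):
    """Write each row into country=<value>/data.csv, mimicking Hive's layout."""
    for emp_id, name, country in rows:
        part_dir = root / f"country={country}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with open(part_dir / "data.csv", "a", newline="") as f:
            csv.writer(f).writerow([emp_id, name])

def query(root: Path, country: str, emp_id: int):
    """Read only the matching partition directory, then filter by employee id."""
    part_file = root / f"country={country}" / "data.csv"
    if not part_file.exists():
        return [], 0
    with open(part_file, newline="") as f:
        scanned = list(csv.reader(f))
    matches = [row for row in scanned if int(row[0]) == emp_id]
    return matches, len(scanned)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_partitioned(root, ROWS)
    matches, rows_scanned = query(root, "IN", 4)
    print(matches, rows_scanned)
```

Here only the two rows stored under `country=IN` are read, instead of all four rows in the data set.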


What is Machine learning and why is it gaining so much popularity?

Nowadays everyone seems to be talking about machine learning and its applications, but have we ever thought about how ML became so popular all of a sudden? If work on AI started way back in 1950 and machine learning started to grow rapidly in 1990, what has suddenly given machine learning a boost?

In this blog, I will give you answers to these questions, but let us first have a look at what machine learning is.

We will start from the basics and understand what a program is. In simple terms, a program is a predefined set of rules or instructions. When data is fed to the computer, it processes the data using these rules. That sounds pretty cool, but then came the question: can’t a computer just be fed the data, decide the rules, and give us the answers itself? This would make … More


Python use case – Convert rows into comma separated values in a column – SQL Server 2017

In this post, we are going to learn how we can leverage Python in SQL Server to generate comma-separated values.

If we want to combine all values of a single column, it is fairly easy, as we can use the COALESCE function to do that. Here is a reference to the already existing post. But have you ever thought about what would happen if we needed a comma-separated value in a column along with other columns? In that scenario, this approach would not work.

We can get comma-separated values in a column along with other columns using a FOR XML PATH query wrapped inside a subquery, but there we would also need to take care of XML-encoded characters such as < and >.

Now, with Python’s integration with SQL Server 2017, this can be achieved easily and efficiently, as we do not have to rely on subqueries and … More
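As a rough sketch of the grouping logic only, the snippet below uses plain Python so it can run anywhere. The input rows, column meanings, and the `combine` helper are hypothetical and are not taken from the post. Inside SQL Server 2017, a script like this would typically be passed to `sp_execute_external_script` and would work on the pandas DataFrame that SQL Server supplies, which is not shown here.

```python
from collections import defaultdict

# Hypothetical input rows: (department, employee)
rows = [
    ("Sales", "Asha"),
    ("IT", "Bob"),
    ("Sales", "Chen"),
    ("IT", "Dev <lead>"),
]

def combine(rows):
    """Group values by key and join each group into one comma-separated string."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    # Dictionaries keep insertion order, so groups appear in first-seen order.
    return [(key, ",".join(values)) for key, values in grouped.items()]

result = combine(rows)
print(result)
```

Because the values are joined as ordinary strings, characters such as < and > come through unchanged, with no decoding step needed.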