Category : Hive


Handling special characters in Hive (using encoding properties) 1

In case we are reading a text file in a Hive table which contains non-English characters and we are not using the appropriate text encoding, these non-English characters might be loaded as junk symbols (like boxes – �). To get these characters in their original form, we need to use the correct character encoding. In this post “Handling special characters in Hive (using encoding properties)“, we are going to learn that how we can read special characters in Hive using encoding properties available with TBLPROPERTIES clause.

To demonstrate it, we will be using a dummy text file which is in ANSI text encoding format and contains Spanish characters. Also, we will be using Microsoft Azure cloud platform to instantiate an on-demand HDInsight cluster that makes it easy to write Hive queries. We will upload the dummy text file to an Azure Data Lake Storage and then we will … More


Skip header and footer rows in Hive 1

In this post “Skip header and footer rows in Hive“, we are going to learn that how we can ignore few header and footer records in Hive without loading or reading these records in another table or in a view temporarily. If you want to read more about Hive, visit my post “Preserve Hive metastore in Azure HDInsight” which explains Hive QL in detail.

Skip header and footer records in Hive

We can ignore N number of rows from top and bottom from a text file without loading that file in Hive using TBLPROPERTIES clause. The TBLPROPERTIES clause provides various features which can be set as per our need. It can be used in this scenario to handle the files which are being generated with additional header and footer records. Let’s have a look at the below sample file:

Now assume that we are dealing with … More


Preserve Hive metastore in Azure HDInsight 2

In this blog “Preserve Hive metastore in Azure HDInsight“, we are going to learn how we can preserve the hive metadata while working with the Azure HDInsight services. Microsoft Azure HDInsight is an on-demand managed Open source Big Data analytics service for the enterprises. We can provision clusters as per the demand in few minutes, perform the computations, and then we can shut it down to avoid charges. We pay as per the usage only. You can visit this link to know more about Azure HDInsight.

What is Hive?

Apache Hive is a SQL like Big Data query language which is used as an abstraction for the map reduce jobs. The Hive query seamlessly converts into an equivalent map reduce job without the need to write low-level code. This increases the productivity of a developer to a great extent. If you want to read more about Hive … More