Apache Hive is a big data query language which is used to read, transform and write large datasets in a distributed environment. It has a SQL like syntax which gets translated into a MapReduce job in order to execute on Hadoop clusters. In Hadoop ecosystem, we use Hive for batch processing to extract, transform and load the data into a data warehouse system or in a file system which can be HDFS, Amazon S3, Azure Blob or Azure DataLake. However, Hive is not meant for OLTP tasks as it has high latency. In this post, we are going to learn Map Join which can be used to improve the performance of a hive query. We will also discuss the parameters required in order to enable the Map join along with its limitations.
What is Map join in Hive
Join clause in hive is used to combine records from two tables … More