Tag : step by step

Handling special characters in Hive (using encoding properties)

When a Hive table reads a text file that contains non-English characters and the wrong text encoding is used, those characters may be loaded as junk symbols (such as boxes, �). To get the characters in their original form, we need to use the correct character encoding. In this post, “Handling special characters in Hive (using encoding properties)”, we are going to learn how to read special characters in Hive using the encoding properties available with the TBLPROPERTIES clause.
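As a rough sketch of what declaring an encoding can look like, the table definition below tells Hive to decode the file as Windows-1252 through the `serialization.encoding` property of LazySimpleSerDe. This is an assumption for illustration: the excerpt refers to the TBLPROPERTIES clause and does not show the exact property names the full post uses. The table name, columns, delimiter and location are hypothetical too.

```sql
-- Hypothetical external table over a text file saved in an ANSI
-- (assumed here to mean Windows-1252) encoding.
CREATE EXTERNAL TABLE customer_es (
    customer_id   INT,
    customer_name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
    'field.delim'            = ',',            -- assumed delimiter
    'serialization.encoding' = 'windows-1252'  -- charset used to decode the file
)
STORED AS TEXTFILE
LOCATION '/example/data/customer_es';          -- hypothetical path
```

If the declared charset matches the one the file was written in, characters such as ñ and á should come back intact instead of as �.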

To demonstrate this, we will use a dummy text file that is in ANSI text encoding and contains Spanish characters. We will also use the Microsoft Azure cloud platform to instantiate an on-demand HDInsight cluster, which makes it easy to write Hive queries. We will upload the dummy text file to Azure Data Lake Storage and then we will … More

Skip header and footer rows in Hive

In this post, “Skip header and footer rows in Hive”, we are going to learn how to ignore a few header and footer records in Hive without first loading or reading these records into another table or a temporary view. If you want to read more about Hive, visit my post “Preserve Hive metastore in Azure HDInsight”, which explains Hive QL in detail.

Skip header and footer records in Hive

We can ignore N rows from the top and bottom of a text file, without loading that file into Hive, by using the TBLPROPERTIES clause. The TBLPROPERTIES clause provides various properties that can be set as needed. Here it can be used to handle files that are generated with additional header and footer records. Let’s have a look at the sample file below:

Sample text file
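For a file like this, a table definition along the following lines is one way to skip the extra rows. It is only a sketch: the table name, columns, delimiter, location and row counts are assumptions, since the sample file itself is not reproduced here.

```sql
-- Hypothetical external table over a delimited text file that carries
-- 2 extra header rows and 1 extra footer row.
CREATE EXTERNAL TABLE sales_feed (
    id     INT,
    item   STRING,
    amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/example/data/sales_feed'      -- hypothetical path
TBLPROPERTIES (
    'skip.header.line.count' = '2',      -- rows ignored at the top of each file
    'skip.footer.line.count' = '1'       -- rows ignored at the bottom of each file
);
```

The file on disk is not modified. The rows are only skipped when the table is read.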

Now assume that we … More

Preserve Hive metastore in Azure HDInsight

In this blog, “Preserve Hive metastore in Azure HDInsight”, we are going to learn how to preserve the Hive metadata while working with Azure HDInsight services. Microsoft Azure HDInsight is an on-demand, managed, open-source Big Data analytics service for enterprises. We can provision clusters on demand in a few minutes, perform the computations, and then shut them down to avoid charges. We pay only for what we use. You can visit this link to know more about Azure HDInsight.

What is Hive?

Apache Hive is a SQL-like Big Data query language that is used as an abstraction over MapReduce jobs. A Hive query is converted seamlessly into an equivalent MapReduce job, with no need to write low-level code. This greatly increases developer productivity. If you want to read more about Hive … More

Get error column name in Data Flow Task in SSIS

How to get error column name and error description in Data Flow Task in SSIS

During execution of an SSIS package, when a bad row arrives in the data flow task, the task fails. However, most of the components (source, transformation, and destination) in the data flow task expose an error output path, which can be configured to redirect these invalid records so that we can log them using a destination component. Once all the bad records are logged, we can investigate the root cause of the errors, fix them, and then reintroduce the rows.

The error output path in the data flow task contains two four-byte integer columns, ErrorColumn and ErrorCode, which represent the lineage ID of the error column and the error code respectively. The values in both these columns are numeric which … More

Full Text Search on files in SQL Server

What is Full Text Search in SQL Server?

Full Text Search in SQL Server enables us to perform complex queries against character-based data. It supports the char, varchar, nchar, nvarchar, text, ntext, image, varbinary and xml data types. We can store document files in varbinary(max) format along with their extensions and enable Full-Text search on them. SQL Server supports many document types for Full-Text indexing, including .asp, .aspx, .ascx, .c, .doc, .html, .ppt, .txt, .xls and many more. Some other formats, such as .pdf, need their own filter pack before a Full-Text index can be enabled on those files in SQL Server.
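To make the varbinary(max) case concrete, here is a minimal sketch, assuming the Full-Text Search feature is installed. The table, column, catalog and search term are all hypothetical names chosen for illustration.

```sql
-- Hypothetical table: each document is stored as varbinary(max),
-- with its file extension kept in a separate column.
CREATE TABLE dbo.Documents (
    DocumentId    INT IDENTITY(1,1) NOT NULL,
    FileExtension NVARCHAR(8)       NOT NULL,   -- e.g. '.doc', '.txt'
    Content       VARBINARY(MAX)    NOT NULL,
    CONSTRAINT PK_Documents PRIMARY KEY CLUSTERED (DocumentId)
);
GO

CREATE FULLTEXT CATALOG DocumentCatalog;
GO

-- TYPE COLUMN tells SQL Server which filter to use for each row's binary
-- content; KEY INDEX must be a unique, single-column, non-nullable index.
CREATE FULLTEXT INDEX ON dbo.Documents (Content TYPE COLUMN FileExtension)
    KEY INDEX PK_Documents
    ON DocumentCatalog;
GO

-- Word search against the indexed documents.
SELECT DocumentId
FROM dbo.Documents
WHERE CONTAINS(Content, N'deadlock');
```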

To enable Full Text Search on one or more columns of a table or indexed view, we have to create a Full Text Index on it. After creating a full text index, we can perform word or phrase searches on a column based on … More

Fill Factor in SQL Server

Do you know what the fill factor of an index is? In this post, we will discuss Fill Factor in SQL Server. Let’s start the discussion about FILLFACTOR in SQL Server.


Fill factor in SQL Server controls how full the leaf pages of an index are packed with data. The remaining space is left to accommodate future growth of data inside the page. The default value for fill factor is 0, which means the pages are packed to 100%. Fill factor is a percentage value that can be anything from 1 to 100 inclusive. A page is the smallest unit of storage in SQL Server, 8 KB in size, and the number of rows it holds depends on the size of the row.

Setting Fill Factor

Fill factor can be set either at the server level or at the index level. Let’s see each in action below:
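As a quick sketch of both levels, the statements below use `sp_configure` for the server-wide default and the `FILLFACTOR` index option for a single index. The percentages and the index and table names are hypothetical.

```sql
-- Server level: change the default fill factor for new or rebuilt indexes.
-- 'fill factor (%)' is an advanced option, so advanced options must be visible.
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'fill factor (%)', 90;   -- example value
RECONFIGURE;
-- The server-level value takes effect after the SQL Server service restarts.

-- Index level: rebuild one index, leaving 20% free space in each leaf page.
ALTER INDEX IX_Orders_CustomerId      -- hypothetical index
    ON dbo.Orders                     -- hypothetical table
    REBUILD WITH (FILLFACTOR = 80);
```

An index-level value overrides the server default for that index only.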

Setting Fill Factor at Server Level

By default it’s 0 at server level, but … More

Creating primary key without clustered index

One of my colleagues asked me, “Can we create a primary key without a clustered index?” I answered, “Yes, of course!”, and I did not forget to share this information with all of you in this post, named “Create nonclustered primary key”. I know many of us are well aware of this and have done it many times in our day-to-day SQL work. Still, for those who don’t know how to back a primary key with a nonclustered index, I am sharing this post, in which I will show how to create a nonclustered primary key on a table.
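In short, the NONCLUSTERED keyword on the constraint does the job. Here is a minimal sketch with a hypothetical table:

```sql
-- Hypothetical table whose primary key is enforced by a nonclustered index.
CREATE TABLE dbo.Employee (
    EmployeeId INT           NOT NULL,
    FullName   NVARCHAR(100) NOT NULL,
    CONSTRAINT PK_Employee PRIMARY KEY NONCLUSTERED (EmployeeId)
);
GO

-- Inspect the index that backs the primary key.
SELECT name, type_desc, is_primary_key
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Employee');
```

Because no clustered index exists on this table, it remains a heap, and sys.indexes lists the heap alongside the nonclustered primary key index.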

Before creating this nonclustered primary key, let me show you the default behavior of SQL Server during primary key creation. If the table does not have a clustered index, SQL Server creates a PRIMARY KEY on a … More

Open SQL Server Management Studio as different windows user

Have you ever needed to open SQL Server Management Studio (an SSMS instance) as a different Windows user from your local machine? Sometimes we need to run an SSMS instance as a different Windows user on another machine that is not physically available to us. In such scenarios, we can open our local SSMS instance as a different Windows user on our local machine. In this article we are going to learn how to do that. Even if you have not needed this so far, you may need it sometime in the future.

To accomplish this task, i.e. to open SSMS as a different Windows user, we have the approaches below:

  1. Using the Run as different user option in Windows Explorer
  2. Using the runas command in Command Prompt
  3. Using the runas command with a “.bat” file

We will cover all the above techniques below.

1. Using


Tracing deadlock with Extended events

In the last two articles, we learned how to capture a deadlock using SQL Server Profiler and how to trace a deadlock with Trace Flag 1222 and Trace Flag 1204. In this post we are going to learn a very useful method: tracing deadlocks with Extended Events. Extended Events are lightweight and less resource-intensive than SQL Server Profiler. We use them to capture the information we are interested in, to be analyzed now or later, from heavily loaded servers (such as production servers).
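As a minimal sketch of such a session, the script below captures the `xml_deadlock_report` event into an event file. The session name and file name are hypothetical, and the `package0.event_file` target assumes SQL Server 2012 or later.

```sql
-- Hypothetical session that records deadlock graphs to an .xel file.
CREATE EVENT SESSION DeadlockCapture ON SERVER
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.event_file (
    SET filename = N'DeadlockCapture.xel'   -- hypothetical file name
)
WITH (STARTUP_STATE = ON);                  -- start again after a service restart
GO

-- Start capturing now.
ALTER EVENT SESSION DeadlockCapture ON SERVER STATE = START;
GO

-- Stop the session when it is no longer needed:
-- ALTER EVENT SESSION DeadlockCapture ON SERVER STATE = STOP;
```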

Suppose we have no clue about a deadlock that occurs at midnight on our production server, and because of the heavy load on the server we cannot use SQL Server Profiler there. Extended Events are very useful here because they use very few resources. We can create a session and start it to capture only those events in which we are interested. Extended events are very useful in such types … More

Tracing deadlock with Trace Flag 1222 and Trace Flag 1204

In the previous article, we learned how to trace a deadlock situation with SQL Server Profiler. In case you haven’t read it, click here. And to understand locking, blocking and deadlocking in SQL Server, click here. In this post we will learn how to use “trace flag 1204 and trace flag 1222” to trace a deadlock.

In this article we are going to learn how to trace a SQL Server deadlock and extract the deadlock information from the captured data. We will use the DBCC TRACEON command with Trace Flag 1204 and Trace Flag 1222 to capture the details.

What is DBCC (Database Console Commands)?

DBCC statements act as Database Console Commands for SQL Server. We can run them using the T-SQL programming language. They provide commands in several categories: Maintenance, Miscellaneous, Informational and Validation.
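For the two trace flags used in this article, the DBCC commands look roughly like this. Treat it as a sketch: enabling the flags globally with -1 is an assumption about how the walkthrough proceeds.

```sql
-- Enable both deadlock trace flags globally (-1 = all sessions).
-- Deadlock details are then written to the SQL Server error log.
DBCC TRACEON (1204, 1222, -1);

-- Check whether the flags are currently enabled.
DBCC TRACESTATUS (1204, 1222);

-- Turn the flags off again when the investigation is over.
DBCC TRACEOFF (1204, 1222, -1);
```

Flags enabled this way do not survive a service restart.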


As we know, a deadlock in SQL Server is a situation … More