Tag : step by step


Python use case – Import zipped file without unzipping it in SSIS and SQL Server – SQL Server 2017

Import zipped CSV file without unzipping it in SSIS using SQL Server 2017

SQL Server Integration Services (SSIS) is one of the most popular ETL tools. It has many built-in components which can be used in order to automate the enterprise ETL(Extract, Transform, and Load). Also, if we need a customized component which is not available in SSIS, we can simply create it by writing our own piece of code in C# using Script Task or Script Component.

In this post, we are going to explore that how we can read and load a zipped CSV file in SQL Server without unzipping it using SSIS along with SQL Server 2017. Reading a zipped file directly (without unzipping it) will save some time required in order to write the text file on the physical disk and then reading it from there. As of now, we don’t have any built-in component in … More


Handling special characters in Hive (using encoding properties) 1

In case we are reading a text file in a Hive table which contains non-English characters and we are not using the appropriate text encoding, these non-English characters might be loaded as junk symbols (like boxes – �). To get these characters in their original form, we need to use the correct character encoding. In this post “Handling special characters in Hive (using encoding properties)“, we are going to learn that how we can read special characters in Hive using encoding properties available with TBLPROPERTIES clause.

To demonstrate it, we will be using a dummy text file which is in ANSI text encoding format and contains Spanish characters. Also, we will be using Microsoft Azure cloud platform to instantiate an on-demand HDInsight cluster that makes it easy to write Hive queries. We will upload the dummy text file to an Azure Data Lake Storage and then we will … More


Skip header and footer rows in Hive 1

In this post “Skip header and footer rows in Hive“, we are going to learn that how we can ignore few header and footer records in Hive without loading or reading these records in another table or in a view temporarily. If you want to read more about Hive, visit my post “Preserve Hive metastore in Azure HDInsight” which explains Hive QL in detail.

Skip header and footer records in Hive

We can ignore N number of rows from top and bottom from a text file without loading that file in Hive using TBLPROPERTIES clause. The TBLPROPERTIES clause provides various features which can be set as per our need. It can be used in this scenario to handle the files which are being generated with additional header and footer records. Let’s have a look at the below sample file:

Now assume that we are dealing with … More


Preserve Hive metastore in Azure HDInsight 2

In this blog “Preserve Hive metastore in Azure HDInsight“, we are going to learn how we can preserve the hive metadata while working with the Azure HDInsight services. Microsoft Azure HDInsight is an on-demand managed Open source Big Data analytics service for the enterprises. We can provision clusters as per the demand in few minutes, perform the computations, and then we can shut it down to avoid charges. We pay as per the usage only. You can visit this link to know more about Azure HDInsight.

What is Hive?

Apache Hive is a SQL like Big Data query language which is used as an abstraction for the map reduce jobs. The Hive query seamlessly converts into an equivalent map reduce job without the need to write low-level code. This increases the productivity of a developer to a great extent. If you want to read more about Hive … More


Get error column name in Data Flow Task in SSIS

How to get error column name and error description in Data Flow Task in SSIS

During execution of an SSIS package, when a bad row comes in the data flow task, the task gets failed. However, most of the components (source, transformation, and destination) in the data flow task exposes an error output path which can be configured to redirect these invalid records to the error output path and then we can log these redirected bad records using a destination component. Once all these bad records get logged, we can investigate the root cause of the errors, fix them, and then these bad rows can be reintroduced again.

The error output path in the data flow task contains two four byte int columns, ErrorColumn and ErrorCode, which represent the lineage id of the error column and the error code respectively. The values in both these columns are numeric which … More


Full Text Search on files in SQL Server

What is Full Text Search in SQL Server?

Full Text Search in SQL Server enables us to perform complex queries against character based data. Full Text Search supports char, varchar, nchar, nvarchar, text, ntext, image, varbinary and xml data types. We Can store document files in varbinary(max) format with their extensions and enable Full-Text search on it. SQL Server supports many types of document files for Full-Text indexing some of them are .asp, .aspx, .ascx, .c, .doc, .html, .ppt, .txt, .xls and many more. Some other extension like .pdf provides it’s own “Filter Pack” to enable Full-Text index on pdf files in SQL Server.

To enable Full Text Search on one or more columns of the given table or indexed view, we have to create a Full Text Index on it. After creating a full text index, we can perform word searching or phrase searching on a column based on … More


Fill Factor in SQL Server

Do you know what is FillFactor for an index? In this post, we will discuss about Fill Factor in SQL Server. Lets start discussion about FILLFACTOR in SQL Server.

What is FILLFACTOR?

Fill factor in SQL server is used to control the filled space of leaf pages with data. Remaining space is left to accommodate future growth of data inside the page. The default value for fill factor is 0 i.e. packed to 100%. Fill factor has a percentage value which could be anything between 1 and 100 including both. A page in SQL Server is the smallest unit of 8K, which holds the number of rows depending on the size of the row.

Setting Fill Factor

Fill factor can be set either at Server Level or at index level. Just see each in action below;

Setting Fill Factor at Server Level

By default it’s 0 at server level, but … More


Creating primary key without clustered index 3

One of my colleague asked me a question that “Can we create a primary key without a clustered index?”. I answered him that “Yes, of-course!” and also did not forget to share this information to all of you in this post named “Create nonclustered primary key”. I know many of us are well aware of this and have did this many times in our day to day SQL activity. Still for those who don’t know how they can accommodate a primary key with a nonclustered index, i am sharing this post, in which i will show you that how we can create a non-clustered primary key on a table.

Before creating this nonclustered primary key, here i am showing you the default behavior of SQL Server during Primary key creation. In case table does not have any Clustered index created on it, SQL Server creates a PRIMARY KEY on a … More


Open SQL Server Management Studio as different windows user 1

Have you ever required to open SQL Server Management Studio Application (SSMS instance) as different windows user from your local machine? Sometimes we require to “Run SSMS instance as different Windows user” another machine which is not available physically to us. In such type of scenarios we can open our local SSMS instance as different windows user at our local machine. In this article we are going to learn how we can open SSMS instance as different windows user. If you are still not required this till now, may be sometime in future you will be required to do the same.

Now to accomplish this task i.e. to open SSMS as a different windows user, we have below approaches;

  1. Using Run as different user option in windows explorer
  2. Using runas command in command prompt
  3. Using runas command with “.bat” file

We will cover all the above techniques below.

1. Using

More

Tracing deadlock with Extended events

In last two articles, we have learned how to capture deadlock using SQL Server Profiler and Tracing deadlock with Trace Flag 1222 and Trace Flag 1204. Now in this post we are going to learn a very useful method of tracing deadlock with Extended Events. These events are less resource intensive and light weight in comparison of SQL Server Profiler. We use these events to capture interested information to be analyzed now or later from heavy load servers (Like production server).

If we don’t have any clue about deadlock being occurred in midnight at our production server and because of heavy load on server, we can not use SQL Server Profiler their. Extended events are very useful because of very few resource utilization. We can create a session and start it to capture only those events in which we are interested. Extended events are very useful in such types … More