step by step

Read and write data to SQL Server from Spark using pyspark

Apache Spark is a very powerful general-purpose distributed computing framework. It provides a different kind of data abstractions like RDDs, DataFrames, and DataSets on top of the distributed collection of the data. Spark is highly scalable Big data processing engine which can run on a single cluster to thousands of clusters. To follow this exercise, […]

Read and write data to SQL Server from Spark using pyspark Read More »

Install Spark on Windows (Local machine) with PySpark – Step by Step

Apache Spark is a general-purpose big data processing engine. It is a very powerful cluster computing framework which can run from a single cluster to thousands of clusters. It can run on clusters managed by Hadoop YARN, Apache Mesos, or by Spark’s standalone cluster manager itself. To read more on Spark Big data processing framework,

Install Spark on Windows (Local machine) with PySpark – Step by Step Read More »

Python use case – Import zipped file without unzipping it in SSIS and SQL Server – SQL Server 2017

Import zipped CSV file without unzipping it in SSIS using SQL Server 2017 SQL Server Integration Services (SSIS) is one of the most popular ETL tools. It has many built-in components which can be used in order to automate the enterprise ETL(Extract, Transform, and Load). Also, if we need a customized component which is not

Python use case – Import zipped file without unzipping it in SSIS and SQL Server – SQL Server 2017 Read More »

Handling special characters in Hive (using encoding properties)

In case we are reading a text file in a Hive table which contains non-English characters and we are not using the appropriate text encoding, these non-English characters might be loaded as junk symbols (like boxes – �). To get these characters in their original form, we need to use the correct character encoding. In this

Handling special characters in Hive (using encoding properties) Read More »

Skip header and footer rows in Hive

In this post “Skip header and footer rows in Hive“, we are going to learn that how we can ignore few header and footer records in Hive without loading or reading these records in another table or in a view temporarily. If you want to read more about Hive, visit my post “Preserve Hive metastore in

Skip header and footer rows in Hive Read More »

Preserve Hive metastore in Azure HDInsight

In this blog “Preserve Hive metastore in Azure HDInsight“, we are going to learn how we can preserve the hive metadata while working with the Azure HDInsight services. Microsoft Azure HDInsight is an on-demand managed Open source Big Data analytics service for the enterprises. We can provision clusters as per the demand in few minutes,

Preserve Hive metastore in Azure HDInsight Read More »

Get error column name in Data Flow Task in SSIS

How to get error column name and error description in Data Flow Task in SSIS During execution of an SSIS package, when a bad row comes in the data flow task, the task gets failed. However, most of the components (source, transformation, and destination) in the data flow task exposes an error output path which can

Get error column name in Data Flow Task in SSIS Read More »

Full Text Search on files in SQL Server

What is Full Text Search in SQL Server? Full Text Search in SQL Server enables us to perform complex queries against character based data. Full Text Search supports char, varchar, nchar, nvarchar, text, ntext, image, varbinary and xml data types. We Can store document files in varbinary(max) format with their extensions and enable Full-Text search

Full Text Search on files in SQL Server Read More »

Creating primary key without clustered index

One of my colleague asked me a question that “Can we create a primary key without a clustered index?”. I answered him that “Yes, of-course!” and also did not forget to share this information to all of you in this post named “Create nonclustered primary key”. I know many of us are well aware of

Creating primary key without clustered index Read More »