step by step Archives - Page 2 of 3

Read and write data to SQL Server from Spark using pyspark

5 Comments / Python, Spark, SQL Server / Gopal Krishna Ranjan / Sep 30, 2019 / big data processing, pyspark, sql tips, step by step

Apache Spark is a very powerful general-purpose distributed computing framework. It provides several data abstractions, such as RDDs, DataFrames, and Datasets, on top of distributed collections of data. Spark is a highly scalable big data processing engine that can run on anything from a single machine to a cluster of thousands of nodes. To follow this exercise, […]

Install Spark on Windows (Local machine) with PySpark – Step by Step

2 Comments / Python, Spark / Gopal Krishna Ranjan / Aug 26, 2019 / pyspark, python, python use case, step by step

Apache Spark is a general-purpose big data processing engine. It is a very powerful cluster computing framework that can run on anything from a single machine to a cluster of thousands of nodes. It can run on clusters managed by Hadoop YARN, Apache Mesos, or Spark’s own standalone cluster manager. To read more on the Spark big data processing framework,

Python use case – Import zipped file without unzipping it in SSIS and SQL Server – SQL Server 2017

Leave a Comment / Python, SQL Server / Gopal Krishna Ranjan / May 31, 2018 / pandas, python, python use case sql, sql server 2017, step by step

Import zipped CSV file without unzipping it in SSIS using SQL Server 2017: SQL Server Integration Services (SSIS) is one of the most popular ETL tools. It has many built-in components that can be used to automate enterprise ETL (Extract, Transform, and Load). Also, if we need a customized component which is not
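
As a rough illustration of the idea in the title, the sketch below reads a CSV member straight out of a zip archive without extracting it to disk. It uses only the Python standard library and is not taken from the post itself; the function name `read_zipped_csv`, the member name `sales.csv`, and the sample data are all hypothetical.

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_source, member_name):
    """Return the rows of one CSV member of a zip archive without extracting it."""
    with zipfile.ZipFile(zip_source) as archive:
        # archive.open() yields a binary stream; wrap it so csv sees text.
        with archive.open(member_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Build a small in-memory archive so the sketch runs without any files on disk.
# The member name and its contents are made up for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("sales.csv", "id,amount\n1,10.5\n2,20.0\n")
buffer.seek(0)

rows = read_zipped_csv(buffer, "sales.csv")
print(rows)
```

In practice `zip_source` could just as well be a path to a `.zip` file, since `zipfile.ZipFile` accepts either a path or a file-like object.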

Handling special characters in Hive (using encoding properties)

4 Comments / Azure, Hive / Gopal Krishna Ranjan / Jan 8, 2018 / cloud, Hadoop, HDInsight, HiveQL, step by step

If we read a text file containing non-English characters into a Hive table without using the appropriate text encoding, those characters might be loaded as junk symbols (like boxes – �). To get these characters in their original form, we need to use the correct character encoding. In this
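
The effect described above is not specific to Hive. As a minimal sketch in plain Python (not Hive), the snippet below decodes the same bytes once with the wrong encoding and once with the right one. The sample text and the choice of ISO-8859-1 as the file's encoding are assumptions made for illustration.

```python
# Hypothetical sample: a line of text containing non-English characters.
original = "café, naïve, São Paulo"

# Assume the file was written with the ISO-8859-1 (Latin-1) encoding...
raw_bytes = original.encode("iso-8859-1")

# ...but the reader assumes UTF-8. Byte sequences that are not valid UTF-8
# are replaced by U+FFFD, which is what shows up as the junk symbol.
wrong = raw_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the file was actually written in restores the text.
right = raw_bytes.decode("iso-8859-1")

print(wrong)
print(right)
```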

Skip header and footer rows in Hive

1 Comment / Azure, Big Data/Cloud, Hive / Gopal Krishna Ranjan / Dec 11, 2017 / cloud, Hadoop, HDInsight, HiveQL, step by step

In this post, “Skip header and footer rows in Hive”, we are going to learn how to ignore a few header and footer records in Hive without temporarily loading or reading these records into another table or view. If you want to read more about Hive, visit my post “Preserve Hive metastore in

Preserve Hive metastore in Azure HDInsight

2 Comments / Azure, Hive / Gopal Krishna Ranjan / Nov 27, 2017 / cloud, Hadoop, HDInsight, HiveQL, step by step

In this blog post, “Preserve Hive metastore in Azure HDInsight”, we are going to learn how to preserve the Hive metadata while working with Azure HDInsight services. Microsoft Azure HDInsight is an on-demand, managed, open-source big data analytics service for enterprises. We can provision clusters on demand in a few minutes,

Get error column name in Data Flow Task in SSIS

2 Comments / SSIS / Gopal Krishna Ranjan / Dec 31, 2016 / data flow task, sql server 2016, ssis 2016, step by step

How to get error column name and error description in Data Flow Task in SSIS: During execution of an SSIS package, when a bad row enters the data flow task, the task fails. However, most of the components (source, transformation, and destination) in the data flow task expose an error output path which can

Full Text Search on files in SQL Server

1 Comment / SQL Server / Gopal Krishna Ranjan / Dec 14, 2014 / full text search, step by step

What is Full Text Search in SQL Server? Full Text Search in SQL Server enables us to perform complex queries against character-based data. Full Text Search supports the char, varchar, nchar, nvarchar, text, ntext, image, varbinary, and xml data types. We can store document files in varbinary(max) format with their extensions and enable Full-Text search

Fill Factor in SQL Server

Leave a Comment / SQL Server / Gopal Krishna Ranjan / Dec 9, 2014 / Fill Factor, step by step

Do you know what the fill factor of an index is? In this post, we will discuss Fill Factor in SQL Server. Let's start the discussion of FILLFACTOR in SQL Server. What is FILLFACTOR? Fill factor in SQL Server controls how much of each leaf-level page is filled with data. The remaining space is left to accommodate future
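
To make the effect of the setting concrete, here is a back-of-the-envelope sketch of how many rows fit on a leaf page at a given fill factor. It is a simplification and not from the post: it assumes 8,096 usable bytes per 8 KB page, fixed-size rows, and no per-row slot overhead. The function name `rows_per_leaf_page` and the 100-byte row size are hypothetical.

```python
# Assumption: an 8 KB SQL Server page leaves 8,096 bytes after the page header.
USABLE_BYTES_PER_PAGE = 8096


def rows_per_leaf_page(row_size_bytes, fill_factor):
    """Estimate rows stored per leaf page when an index is built with fill_factor."""
    # A fill factor of 0 is treated the same as 100 (pages filled completely).
    effective = 100 if fill_factor == 0 else fill_factor
    filled_bytes = USABLE_BYTES_PER_PAGE * effective // 100
    return filled_bytes // row_size_bytes


# Hypothetical 100-byte rows at two fill factor settings.
full = rows_per_leaf_page(100, 100)
partial = rows_per_leaf_page(100, 80)
print(full, partial)
```

Under these assumptions, a lower fill factor stores fewer rows per page and leaves the rest of the page free for later inserts and updates.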

Creating primary key without clustered index

3 Comments / SQL Server / Gopal Krishna Ranjan / Dec 4, 2014 / clustered index, index, nonclustered index, step by step

One of my colleagues asked me, “Can we create a primary key without a clustered index?” I answered, “Yes, of course!”, and did not forget to share this information with all of you in this post, named “Create nonclustered primary key”. I know many of us are well aware of
