Yearly Archives: 2019


Python use case – Export SQL table data to Excel and CSV files – SQL Server 2017

In this post, we are going to discuss how we can export SQL Server table data to an Excel file or to a CSV file using Python’s pandas library. Prior to SQL Server 2017, we could use one of the following methods to export data from SQL Server to an Excel or CSV file:

  1. Create an SSIS package to export the data from SQL Server – This option is a good choice if we want to reuse the export process repeatedly, or if we need to apply moderate-to-complex transformations during the export.
  2. Use the SQL Server Import/Export wizard – SQL Server provides a built-in data export/import wizard, which works well when we want to export data with no or minimal transformations.
  3. OS-based copy-paste functionality – We can simply copy the query output from SQL Server using Ctrl + C and then open
More
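
As a preview of the pandas approach the post describes, here is a minimal sketch; the server, database, table, and file paths are placeholders, and it assumes pyodbc and an ODBC driver for SQL Server are installed:

```python
# Minimal sketch: export a SQL Server table to CSV and Excel with pandas.
# Connection details and paths below are placeholders, not from the post.
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=TestDB;Trusted_Connection=yes;"
)

# Read the table into a DataFrame.
df = pd.read_sql("SELECT * FROM dbo.Employee", conn)

# Write it out to CSV and Excel (to_excel needs openpyxl or xlsxwriter).
df.to_csv(r"C:\temp\employee.csv", index=False)
df.to_excel(r"C:\temp\employee.xlsx", index=False)
conn.close()
```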

SQL Server – Error 1061: The service cannot accept control messages at this time

Sometimes when we try to restart the SQL Server service, we might get the error “Windows could not stop the SQL Server (MSSQLSERVER) service on Local Computer” with the error code and description “Error 1061: The service cannot accept control messages at this time”. In this post, “SQL Server – Error 1061: The service cannot accept control messages at this time”, we will discuss a workaround that can help us fix this issue.

This error occurs when we try to restart the SQL Server service using the SSMS Object Explorer and/or the Services console. Let’s try to restart the SQL Server service using Object Explorer and the Services console.

Restarting the SQL Server service using Object Explorer:

Restarting SQL Server using the Services console:

Below is the error screenshot:

Fix – Error 1061: The service cannot accept control messages at this time

To fix this issue, we can … More
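
The excerpt ends before the fix; one workaround commonly used for this error (an assumption here, not necessarily the full post’s steps) is to end the SQL Server process and then start the service again, which can be scripted from Python:

```python
# Hedged sketch: when the service refuses control messages, ending the
# sqlservr.exe process (as one would in Task Manager) and then starting
# the service again often works around Error 1061. Run from an elevated
# session; MSSQLSERVER is the default instance name and may differ.
import subprocess

# Forcefully end the SQL Server process.
subprocess.run(["taskkill", "/F", "/IM", "sqlservr.exe"], check=True)

# Start the service again.
subprocess.run(["net", "start", "MSSQLSERVER"], check=True)
```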


Read and write data to SQL Server from Spark using pyspark

Apache Spark is a very powerful general-purpose distributed computing framework. It provides several data abstractions, such as RDDs, DataFrames, and DataSets, on top of the distributed collection of data. Spark is a highly scalable big data processing engine which can scale from a single machine to thousands of machines. To follow this exercise, we can install Spark on our local machine and use Jupyter notebooks to write code in an interactive mode. In this post, “Read and write data to SQL Server from Spark using pyspark”, we are going to demonstrate how we can use Apache Spark to read and write data to a SQL Server table.

Read SQL Server table to DataFrame using Spark SQL JDBC connector – pyspark

Spark SQL APIs can read data from any relational data source that provides a JDBC driver. We can read the data of a SQL Server table … More
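
As a preview, a minimal read might look like the sketch below; the connection URL, table name, credentials, and driver jar path are placeholders, and it assumes the Microsoft JDBC driver for SQL Server is available:

```python
# Minimal sketch: read a SQL Server table into a Spark DataFrame over JDBC.
# All connection details below are placeholders, not from the post.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("mssql-read")
         # Make the Microsoft JDBC driver jar visible to Spark.
         .config("spark.jars", "/path/to/mssql-jdbc.jar")
         .getOrCreate())

df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://localhost:1433;databaseName=TestDB")
      .option("dbtable", "dbo.Employee")
      .option("user", "sa")
      .option("password", "your_password")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load())

df.show(5)
```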


Install Spark on Windows (Local machine) with PySpark – Step by Step

Apache Spark is a general-purpose big data processing engine. It is a very powerful cluster computing framework which can scale from a single machine to thousands of machines. It can run on clusters managed by Hadoop YARN, Apache Mesos, or by Spark’s standalone cluster manager itself. To read more about the Spark big data processing framework, visit the post “Big Data processing using Apache Spark – Introduction”. Here, in this post, we will learn how we can install Apache Spark on a local Windows machine in pseudo-distributed mode (managed by Spark’s standalone cluster manager) and run it using PySpark (Spark’s Python API).

Install Spark on Local Windows Machine

To install Apache Spark on a local Windows machine, we need to follow the steps below:

Step 1 – Download and install Java JDK 8

Java JDK 8 is required as a prerequisite for the Apache Spark installation. We … More
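
Once the installation steps are complete, a quick smoke test from Python can confirm everything is wired up. This sketch assumes JAVA_HOME and SPARK_HOME are set, and uses the optional third-party findspark package (not mentioned in the post) to locate the installation:

```python
# Quick smoke test for a local Spark install on Windows.
import findspark
findspark.init()  # locates Spark via the SPARK_HOME environment variable

from pyspark.sql import SparkSession

# Start a local session using all available cores.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("install-smoke-test")
         .getOrCreate())

print(spark.range(5).count())  # prints 5 if the installation works
spark.stop()
```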


Change Jupyter Notebook startup folder on Windows and Mac OS

Once we have installed the Jupyter notebook, we can start it by executing the “jupyter notebook” command in the command prompt on a Windows machine or in the terminal on a Mac machine. The Jupyter notebook is a very useful web-based application which can be used to write programs in many programming languages, like Python, R, Scala, and Julia. The notebooks created in Jupyter can be shared easily with other users over email, Git, and Dropbox. We can use Jupyter notebooks to write code in an interactive mode, which is very handy for re-running individual snippets. It also integrates nicely with the big data ecosystem and with cloud platforms.

When we start the Jupyter notebook server, it shows the notebooks from the working directory in which the notebook server was started. That is why the default working directory of a Jupyter notebook server is … More
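
For reference, one standard way to change the startup folder is through Jupyter’s configuration file: generate it with “jupyter notebook --generate-config” and then set notebook_dir in the resulting jupyter_notebook_config.py (the paths below are placeholders):

```python
# In ~/.jupyter/jupyter_notebook_config.py (created by
# "jupyter notebook --generate-config"), point the server at a fixed folder.
c.NotebookApp.notebook_dir = r'D:\my-notebooks'        # Windows example
# c.NotebookApp.notebook_dir = '/Users/me/notebooks'   # Mac OS example
```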


The RPC server is unavailable – SQL Server 2017 installation error

During the installation of SQL Server 2017 (or other versions), we can get the error “The RPC server is unavailable” at the very last step of the installation process, while executing the action “DReplayControllerConfigAction_install_postmsi_Cpu64”. The “RPC server is unavailable” error might also occur at the “Server Configuration” step during the installation process. Typically, this error occurs when we try to install SQL Server on a remote/virtual machine.

Below is the screenshot of the RPC error you may get during the installation of SQL Server 2017 on a remote machine:

The above error message stops the installation process at the final step, and when we click the “Retry” button, it keeps prompting the same error message again and again. If you have not found any appropriate solution, this one might help you resolve the RPC error when it occurs due to a domain name issue on the … More


RDD, DataFrame, and DataSet – Introduction to Spark Data Abstraction

Apache Spark is a general-purpose distributed computing engine used for big data processing – both batch and stream processing. It provides high-level APIs like Spark SQL, Spark Streaming, MLlib, and GraphX to allow interaction with the core functionalities of Apache Spark. Spark also provides several core data abstractions on top of the distributed collection of data: RDDs, DataFrames, and DataSets. In this post, we are going to discuss these core data abstractions available in Apache Spark.

Spark Data Abstraction

A data abstraction in Spark represents a logical data structure over the underlying data distributed on different nodes of the cluster. The data abstraction APIs provide a wide range of transformation methods (like map(), filter(), etc.) which are used to perform computations in a distributed way. However, these transformations are lazy: in order to actually execute them, we need to call an action method like show() or collect().

Let’s have a … More
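
To make the transformation/action split concrete, here is a minimal sketch (the names and data are illustrative, not from the post):

```python
# Transformations are lazy; only an action triggers computation.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")
         .appName("abstractions-demo")
         .getOrCreate())

# RDD: map() and filter() build a lineage but run nothing yet.
rdd = spark.sparkContext.parallelize(range(10))
squares = rdd.map(lambda x: x * x).filter(lambda x: x > 10)
print(squares.collect())  # action -> [16, 25, 36, 49, 64, 81]

# DataFrame: the same lazy model with a higher-level, schema-aware API.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.filter(df.id > 1).show()  # action triggers execution
spark.stop()
```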


Big Data processing using Apache Spark – Introduction

What is Spark

Apache Spark is an open-source, general-purpose distributed cluster computing framework – a unified computing engine for big data processing. Spark is designed for lightning-fast cluster computing. Using Spark’s in-memory cluster computing, an application can run up to 100 times faster than Hadoop MapReduce, and up to 10 times faster when running on disk.

Why Spark

We can use Spark for any kind of big data processing, ranging from SQL to streaming and machine learning, running from a single machine to thousands of servers. It supports widely used programming languages like Python, Java, Scala, and R by exposing a set of high-level API libraries. Spark can run on clusters managed by Hadoop YARN, Apache Mesos, or it can run standalone. It provides many features like fast computational speed, multiple language support, … More


Understanding Map join in Hive

Apache Hive offers a big data query language (HiveQL) which is used to read, transform, and write large datasets in a distributed environment. It has a SQL-like syntax which gets translated into MapReduce jobs in order to execute on Hadoop clusters. In the Hadoop ecosystem, we use Hive for batch processing to extract, transform, and load data into a data warehouse system or into a file system, which can be HDFS, Amazon S3, Azure Blob Storage, or Azure Data Lake. However, Hive is not meant for OLTP tasks as it has high latency. In this post, we are going to learn about Map join, which can be used to improve the performance of a Hive query. We will also discuss the parameters required to enable Map join, along with its limitations.

What is Map join in Hive

The join clause in Hive is used to combine records from two tables … More
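
The excerpt ends before the settings are covered; for reference, the standard Hive parameters that control automatic map join conversion are shown below (the values and table names are illustrative):

```sql
-- Let Hive automatically convert a common join into a map join.
SET hive.auto.convert.join=true;
-- Size threshold (in bytes) below which a table is considered "small"
-- enough to be loaded into memory on the mappers.
SET hive.mapjoin.smalltable.filesize=25000000;

-- Older Hive versions also accept an explicit MAPJOIN hint:
SELECT /*+ MAPJOIN(s) */ b.id, s.name
FROM big_table b
JOIN small_table s ON b.id = s.id;
```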


Python use case – Save each worksheet as a separate Excel workbook

In this post, “Python use case – Save each worksheet as a separate Excel workbook”, we are going to learn how we can create a separate workbook for each worksheet of a given Excel file. We will be copying the data, values, formatting, and all other settings of each sheet into the newly created workbook. Assume that we have an Excel workbook with five sheets named “Sheet1”, “Sheet2”, “Sheet3”, “Sheet4”, and “Sheet5”, as shown here.

Now, we want to create five different Excel workbooks (one for each worksheet).

We are going to use the Excel application to copy the data, values, formatting, and other settings of each sheet into a new workbook from a Python script. Below is the Python code.
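
The post’s own script is not included in this excerpt; a minimal sketch of the approach, using COM automation via pywin32 (file paths are placeholders, and Excel must be installed), might look like this:

```python
# Hedged sketch: save each worksheet of a workbook as its own .xlsx file
# by driving the Excel application through COM (requires pywin32 on Windows).
import win32com.client

excel = win32com.client.Dispatch("Excel.Application")
excel.Visible = False
excel.DisplayAlerts = False  # suppress overwrite/format prompts

wb = excel.Workbooks.Open(r"C:\temp\AllSheets.xlsx")
for sheet in wb.Worksheets:
    sheet.Copy()  # Copy() with no target creates a new workbook for the sheet
    # Second argument 51 = xlOpenXMLWorkbook (.xlsx format).
    excel.ActiveWorkbook.SaveAs(r"C:\temp\%s.xlsx" % sheet.Name, 51)
    excel.ActiveWorkbook.Close()

wb.Close(SaveChanges=False)
excel.Quit()
```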

To execute the above Python script, we can call the .py file from a Command Prompt window, as shown here.… More