Gopal Krishna Ranjan

Gopal is a passionate Data Engineer and Data Analyst. He has implemented many end-to-end solutions using Big Data, Machine Learning, OLAP, OLTP, and cloud technologies. He loves to share his experience at https://sqlrelease.com//. Connect with Gopal on LinkedIn at https://www.linkedin.com/in/ergkranjan/.

Spark read file with special characters using PySpark

Suppose we have a CSV file that contains some non-English characters (Spanish, Japanese, etc.) and we want to read this file into a Spark data frame. If we read this file without the right character encoding, we will end up with junk characters (like �) in the data frame. So, the files […]
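As a minimal sketch of the problem this excerpt describes, the plain-Python snippet below decodes the same bytes with a wrong and a right encoding. The sample string and variable names are hypothetical, and the snippet does not use Spark. In PySpark, the CSV reader accepts an `encoding` option that plays the same role.

```python
# Hypothetical sample text; any non-ASCII string would behave similarly.
original = "España"

# Simulate file contents that were written with the Latin-1 (ISO-8859-1) encoding.
raw_bytes = original.encode("latin-1")

# Decoding with the wrong encoding (UTF-8) produces the replacement character.
wrong = raw_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were written in recovers the text.
right = raw_bytes.decode("latin-1")

print(wrong)
print(right)
```

The design point is that the reader has to be told which encoding the file was written in; the bytes themselves are not damaged.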

Read CSV file with Newline character in PySpark

Apache Spark is a Big Data cluster computing framework that can run on Standalone, Hadoop, Kubernetes, or Mesos clusters, or in the cloud. We can read and write data from various data sources using Spark. For example, we can use CSV (comma-separated values) and TSV (tab-separated values) files as input sources for a Spark application.

Sort By, Order By, Distribute By, and Cluster By in Hive

This post briefly discusses the differences and similarities between Sort By, Order By, Distribute By, and Cluster By in Hive queries. This is one of the most important questions asked in Big Data/Hadoop interviews. The Sort By, Order By, Distribute By, and Cluster By clauses are available in the Hive query language and

Grant UPDATE and SELECT on specific columns in a table – SQL Server

This post explains how we can grant UPDATE and SELECT permissions on specific columns of a table in SQL Server without using a view. This partial vertical access control strategy can help us manage permissions directly at the table level. It is always good to set the access permissions at the

Get consecutive available seats in a row using SQL query

This post explains how to get consecutive available seats in a row using a SQL query for a multiplex cinema theatre that stores its data in a SQL Server database. In other words, we need to write a query to get n consecutive available seats for the multiplex seat booking application. However, for this
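To make the problem statement concrete, here is a minimal Python sketch of the same idea. It is not the SQL query from the post; the function name, the seat numbering, and the sample row are hypothetical, and seats in a row are assumed to be numbered with consecutive integers.

```python
def find_consecutive_seats(available, n):
    """Return the starting seat number of every run of n consecutive available seats."""
    seats = set(available)
    starts = []
    for seat in sorted(seats):
        # A run starts at `seat` if seat, seat + 1, ..., seat + n - 1 are all available.
        if all(seat + offset in seats for offset in range(n)):
            starts.append(seat)
    return starts


# Hypothetical row: seats 4, 7, and 8 are already booked.
row_a = [1, 2, 3, 5, 6, 9]
pairs = find_consecutive_seats(row_a, 2)
triples = find_consecutive_seats(row_a, 3)
print(pairs)
print(triples)
```

Each returned number is a seat from which a group of n people could be placed side by side.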

Create pair plots using scatter_matrix method in pandas

Exploratory data analysis is a very important step in a Data Science project. It helps us visualize the data and identify hidden trends that might not be visible with summary statistics alone. We can use the matplotlib and seaborn libraries to create stunning visuals in Python. However, the pandas.plotting module of the

Plot ECDF in Python

We know that EDA (Exploratory Data Analysis) is the process of organizing, plotting, and summarizing the data to find trends, patterns, and outliers using statistical and visual methods. Here, we have already discussed various methods of performing EDA on an underlying dataset, with their pros and cons. An ECDF plot is another visual method of performing
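For readers new to the term, an ECDF (empirical cumulative distribution function) maps each value x to the fraction of observations less than or equal to x. The sketch below computes the points of such a plot in plain Python. The function name and sample data are hypothetical, and the plotting step is left out.

```python
def ecdf(samples):
    """Return sorted values and the cumulative fraction of samples at each position."""
    xs = sorted(samples)
    n = len(xs)
    # The i-th smallest value (1-based) has i of the n samples at or below it.
    # For tied values, the last of the tied points carries the ECDF value.
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


# Hypothetical measurements.
xs, ys = ecdf([3, 1, 2, 4])
print(xs)
print(ys)
```

Plotting `ys` against `xs` as points or as a step line gives the ECDF plot.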

Interactive Data Analysis with HANA using Jupyter Notebook/Jupyter Lab

We have discussed how we can use Jupyter Lab/Jupyter Notebook to do Interactive Data Analysis with SQL Server. Jupyter Notebook is a very powerful and useful tool for any Data Analyst/Data Scientist. Jupyter Lab is the next-generation tool for Jupyter Notebooks. It provides an interface where we can

ASP.NET Core MVC Entity Framework Web App for CRUD operations

In this post, we will demonstrate how easily we can create a web application with CRUD functionality using ASP.NET Core, MVC, and Entity Framework. ASP.NET Core is a part of the .NET Core framework, which is an open-source framework for Windows, macOS, and Linux operating systems. It provides a cross-platform development environment for developers.

Access git repository using SSH key in PyCharm on Windows and Mac machine

In this post, we are going to discuss how we can set up Git Bash, SSH keys, and the PyCharm IDE to access a git repository using the command line on a Windows or Mac machine. First, we will set it up on a Windows machine, followed by a Mac machine. The setup process is very
