Gopal Krishna Ranjan

Create jar in IntelliJ IDEA for sbt-based Scala + Spark project

Scala, Spark / Gopal Krishna Ranjan / Mar 31, 2022 / big data processing, data analysis, scala, step by step

Like Maven, sbt is a build tool that can be used to manage the project development lifecycle. It helps us build, test, and package Scala and Java projects into a .jar file. This jar file can be used as a package in another application or project, or it can simply be used […]


Create jar in IntelliJ IDEA for Maven-based Scala + Spark project

Scala, Spark / Gopal Krishna Ranjan / Feb 28, 2022 / big data processing, data analysis, scala, step by step

In this post, we will learn how to create a jar in IntelliJ IDEA for a Maven-based Scala + Spark project. We will use the Maven build tool to create the jar file from a sample Scala project. Maven is a project management tool that can be used to manage […]


Create Scala sbt project using IntelliJ IDEA – Step by step

Scala, Spark / Gopal Krishna Ranjan / Jan 31, 2022 / big data processing, data analysis, scala, step by step

In the previous post, we discussed how to set up a Maven-based Scala project. In this post, we will learn how to create an sbt-based Scala project using IntelliJ IDEA. Like Maven and Ant, sbt is an open-source build tool for Scala and Java projects. If you need to install IntelliJ […]


Create Scala Maven project using IntelliJ IDEA – Step by step

Scala, Spark / Gopal Krishna Ranjan / Dec 29, 2021 / big data processing, data analysis, scala, step by step

In this post, we will learn how to create a Maven-based Scala project from scratch using IntelliJ IDEA. Apache Spark is an open-source, unified, general-purpose big data processing framework written in the Scala programming language. It is a multi-language data processing engine that supports SQL, Java, Python, R, and Scala. However, most of […]


Get HDFS file location of Hive table records as column

Hive, Spark / Gopal Krishna Ranjan / Nov 30, 2021 / big data processing, Hadoop, HiveQL, pyspark, python, scala

In this post, we will learn how to extract the physical HDFS file location of a Hive table's records as a column, alongside the other columns of the table. We will demonstrate this using HiveQL, PySpark, and Scala. Hive tables can be created as internal or external tables. So, if we create an […]


Read and write data into Hive table from Spark using PySpark

Hive, Spark / Gopal Krishna Ranjan / Oct 31, 2021 / big data processing, Hadoop, HiveQL, pyspark

In this post, we will learn how to read data from a Hive table into a Spark dataframe and write it back. Once the Hive table data has been read into a dataframe, we can apply Spark transformations to it. Finally, we can write the data back to the Hive table. We […]


Hyperparameter tuning using GridSearchCV and RandomizedSearchCV in Python

Data Science, Machine Learning, Python / Gopal Krishna Ranjan / Sep 30, 2021 / data science - step by step, machine learning - step by step, python

In the previous post, we briefly discussed GridSearchCV and RandomizedSearchCV. In this post, we will demonstrate how to use the GridSearchCV and RandomizedSearchCV classes from the scikit-learn library for hyperparameter tuning in Python. We will use scikit-learn's built-in diabetes dataset in this demo. However, if […]


An introduction to GridSearchCV and RandomizedSearchCV

Data Analysis, Data Science, Machine Learning / Gopal Krishna Ranjan / Aug 31, 2021 / data science - step by step, machine learning - step by step

In the previous post, we discussed how to assess the performance of a machine learning model using k-fold cross-validation. In this post, we will discuss how to use GridSearchCV and RandomizedSearchCV to find the optimal hyperparameter values. A hyperparameter value is a value that is required before […]

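To make the idea concrete, here is a minimal plain-Python sketch of the two search strategies. It is not scikit-learn's implementation: `toy_score`, `PARAM_GRID`, and the parameter names `alpha` and `depth` are hypothetical stand-ins for a cross-validated model score and a real parameter grid. Grid search scores every combination, while random search scores only a fixed number of sampled combinations.

```python
import itertools
import random


def toy_score(params):
    """Hypothetical stand-in for a cross-validated score (higher is better)."""
    return -((params["alpha"] - 0.1) ** 2) - ((params["depth"] - 5) ** 2) / 100.0


def grid_search(param_grid, score_fn):
    """Score every combination in the grid and return the best one."""
    names = sorted(param_grid)
    combos = itertools.product(*(param_grid[name] for name in names))
    candidates = [dict(zip(names, values)) for values in combos]
    return max(candidates, key=score_fn)


def random_search(param_grid, score_fn, n_iter, seed=0):
    """Score n_iter randomly sampled combinations and return the best one.

    Simplification: values are sampled independently with replacement,
    so the same combination may be drawn more than once.
    """
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in param_grid.items()}
        for _ in range(n_iter)
    ]
    return max(candidates, key=score_fn)


# Hypothetical grid: 3 x 3 = 9 combinations in total.
PARAM_GRID = {"alpha": [0.01, 0.1, 1.0], "depth": [3, 5, 7]}

if __name__ == "__main__":
    print("grid search best:  ", grid_search(PARAM_GRID, toy_score))
    print("random search best:", random_search(PARAM_GRID, toy_score, n_iter=4))
```

Grid search always finds the best combination in the grid but its cost grows with every added parameter. Random search caps the cost at `n_iter` evaluations, at the risk of missing the best combination.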

Introduction to k-fold Cross-Validation in Python

Data Analysis, Data Science, Machine Learning, Python / Gopal Krishna Ranjan / Jul 12, 2021 / data analysis, data preprocessing, data science - step by step, machine learning - step by step

This post briefly explains how to use k-fold cross-validation to evaluate the performance of a machine learning model with the scikit-learn library in Python. The performance of a machine learning model depends on the training dataset. If the training dataset has a peculiarity, the model created with that dataset will not work […]

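As a rough illustration of the splitting step behind k-fold cross-validation, the sketch below is written in plain Python rather than with the scikit-learn API. The helper name `k_fold_indices` is hypothetical. It splits the sample indices into k contiguous folds without shuffling, and each fold serves once as the test set while the remaining folds form the training set.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds get one extra sample,
    # so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    for train, test in k_fold_indices(6, 3):
        print("train:", train, "test:", test)
```

With 6 samples and 3 folds, the test sets are `[0, 1]`, `[2, 3]`, and `[4, 5]`. A model would be trained and scored once per fold, and the k scores averaged to estimate its performance.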

Get minimum value from multiple columns in SQL Server

SQL Server / Gopal Krishna Ranjan / Jun 27, 2021 / sql tips

This post discusses how to extract the minimum value from multiple columns in SQL Server. For example, suppose we have a table that stores the temperatures of multiple cities, with each city's temperature in a separate column. We have to select the minimum temperature value across all […]

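The row-wise idea can be sketched in plain Python. This is only an illustration of the logic and not the T-SQL approach from the post: the sample rows, the column names `city_a`, `city_b`, `city_c`, and the helper `row_minimum` are all hypothetical. Missing readings (`None`) are skipped, much as the SQL `MIN` aggregate ignores `NULL` values.

```python
def row_minimum(row, columns):
    """Return the smallest non-None value among the given columns of one row."""
    values = [row[col] for col in columns if row[col] is not None]
    return min(values) if values else None


# Hypothetical sample data: one temperature column per city.
CITY_COLUMNS = ["city_a", "city_b", "city_c"]
ROWS = [
    {"reading_date": "2021-06-01", "city_a": 31.5, "city_b": 28.0, "city_c": 35.2},
    {"reading_date": "2021-06-02", "city_a": 29.1, "city_b": None, "city_c": 27.4},
    {"reading_date": "2021-06-03", "city_a": None, "city_b": None, "city_c": None},
]

if __name__ == "__main__":
    for row in ROWS:
        print(row["reading_date"], row_minimum(row, CITY_COLUMNS))
```

Each row yields one minimum across its city columns. A row with no readings at all yields `None` instead of raising an error.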