data science – step by step

Hyperparameter tuning using GridSearchCV and RandomizedSearchCV in Python

In the previous post, we had a brief discussion about the GridSearchCV and RandomizedSearchCV. Now, in this post, we will demonstrate that how we can use the GridSearchCV and RandomizedSearchCV methods available with the Sci-kit learn library for hyperparameter tuning in Python. We will use the sklearn built-in diabetes dataset in this demo. However, if […]

Hyperparameter tuning using GridSearchCV and RandomizedSearchCV in Python Read More »

An introduction to GridSearchCV and RandomizedSearchCV

In the previous post, we discussed that how we can assess the performance of a Machine learning model using a k-fold cross-validation method. In this post, we will discuss that how we can leverage the GridSearchCV and RandomizedSearchCV methods to find the optimal hyperparameter values. The hyperparameter value is the value that is required before

An introduction to GridSearchCV and RandomizedSearchCV Read More »

Introduction to k-fold Cross-Validation in Python

This post briefs how we can use the k-fold cross-validation to evaluate a Machine Learning model performance using the Scikit-learn library in Python. We know that the performance of a Machine Learning model depends on the training dataset. Also, if the training dataset has a peculiarity, the model created with that dataset will not work

Introduction to k-fold Cross-Validation in Python Read More »

Create pair plots using scatter_matrix method in pandas

The exploratory data analysis is a very important step in a Data Science project. It helps us to visualize the data and identify any hidden trends that might not be visible with summary statistics alone. So, we can use matplotlib and seaborn libraries to create stunning visuals in Python. However, the pandas.plotting module of the

Create pair plots using scatter_matrix method in pandas Read More »

Plot ECDF in Python

We know that EDA (Exploratory Data Analysis), is the process of organizing, plotting, and summarizing the data to find trends, patterns, and outliers using statistical and visual methods. Here, we have already discussed various methods of performing EDA with their pros and cons on an underlying dataset. ECDF plot is another visual method of performing

Plot ECDF in Python Read More »

Building Decision Tree model in python from scratch – Step by step

In previous post, we created our first Machine Learning model using Logistic Regression to solve a classification problem. We used “Wisconsin Breast Cancer dataset” for demonstration purpose. Now, in this post “Building Decision Tree model in python from scratch – Step by step”, we will be using IRIS dataset which is a standard dataset that

Building Decision Tree model in python from scratch – Step by step Read More »

Building first Machine Learning model using Logistic Regression in Python – Step by Step

This post briefs how to create our first machine learning predictive model using Logistic regression in Python. When we start working on a Machine Learning project, first, we perform some data wrangling and transformation to get the tidy dataset. Then, we perform some EDA to find trends, patterns, and outliers in the given dataset. Once, we have machine-interpretable data

Building first Machine Learning model using Logistic Regression in Python – Step by Step Read More »

Exploratory Data Analysis (EDA) using Python – Second step in Data Science and Machine Learning

In the previous post, “Tidy Data in Python – First Step in Data Science and Machine Learning”, we discussed the importance of the tidy data and its principles. In a Machine Learning project, once we have a tidy dataset in place, it is always recommended to perform EDA (Exploratory Data Analysis) on the underlying data

Exploratory Data Analysis (EDA) using Python – Second step in Data Science and Machine Learning Read More »

Python use case – Resampling time series data (Upsampling and downsampling) – SQL Server 2017

Resampling time series data in SQL Server using Python’s pandas library In this post, we are going to learn how we can use the power of Python in SQL Server 2017 to resample time series data using Python’s pandas library. Sometimes, we get the sample data (observations) at a different frequency (higher or lower) than

Python use case – Resampling time series data (Upsampling and downsampling) – SQL Server 2017 Read More »

What is Machine learning and why is it gaining so much popularity?

Well now a days everyone seems to be talking about machine learning and its applications/uses, but have we ever thought how all of a sudden ML has become so popular? If I tell you that work on AI started way back in 1950 and Machine learning started to grow rapidly in 1990, what has suddenly

What is Machine learning and why is it gaining so much popularity? Read More »