2018

Building first Machine Learning model using Logistic Regression in Python – Step by Step

This post briefly describes how to build our first machine learning predictive model using logistic regression in Python. When we start working on a machine learning project, we first perform some data wrangling and transformation to get a tidy dataset. Then we perform some EDA to find trends, patterns, and outliers in the dataset. Once we have machine-interpretable data […]

Exploratory Data Analysis (EDA) using Python – Second step in Data Science and Machine Learning

In the previous post, “Tidy Data in Python – First Step in Data Science and Machine Learning”, we discussed the importance of tidy data and its principles. In a machine learning project, once we have a tidy dataset in place, it is always recommended to perform EDA (Exploratory Data Analysis) on the underlying data

Partitioning and Bucketing in Hive

In this article, we will discuss two important concepts in Hive: partitioning and bucketing. Both are used to improve query performance, and it is important to understand them so that you can apply them efficiently. So let’s start with partitioning. Partitioning in Hive: partitioning is a technique used to enhance query performance in

Quick guide to Bash commands for Big Data Analysis

In this post, “Quick guide to Bash commands for Big Data Analysis”, we explore some basic Bash/Linux commands that are very useful in data analysis. Bash is a command-line interpreter for the GNU OS (a free, UNIX-like OS) which typically runs in a command-line window. It accepts the command submitted

Python use case – Resampling time series data (Upsampling and downsampling) – SQL Server 2017

Resampling time series data in SQL Server using Python’s pandas library: in this post, we learn how to use the power of Python in SQL Server 2017 to resample time series data using Python’s pandas library. Sometimes we get sample data (observations) at a different frequency (higher or lower) than

What is Machine learning and why is it gaining so much popularity?

Nowadays everyone seems to be talking about machine learning and its applications, but have we ever wondered how ML became so popular all of a sudden? If I tell you that work on AI started way back in 1950 and machine learning started to grow rapidly in 1990, what has suddenly

Tidy Data in Python – First Step in Data Science and Machine Learning

Most data science and machine learning projects follow the Pareto principle: we spend almost 80% of the time on data preparation and the remaining 20% on choosing and training the appropriate ML model. The datasets we get for building machine learning models are mostly messy and cannot be fitted into the model

Python use case – Import data from Excel to a SQL Server table – SQL Server 2017

If we need to import data from an Excel file into SQL Server, we can use one of these methods: the SQL Server Import and Export Wizard; an SSIS package that reads the Excel file and loads the data into a SQL Server table; a T-SQL OPENROWSET query; or the read_excel method of Python’s pandas library (only available in SQL Server 2017

Python use case – Import zipped file without unzipping it in SSIS and SQL Server – SQL Server 2017

Import zipped CSV file without unzipping it in SSIS using SQL Server 2017: SQL Server Integration Services (SSIS) is one of the most popular ETL tools. It has many built-in components that can be used to automate enterprise ETL (Extract, Transform, and Load). Also, if we need a customized component which is not
