Relationship between Binomial and Poisson distributions


In this post, we discuss the relationship between the Binomial and Poisson distributions. The Poisson distribution is the limit of the Binomial distribution as n (the number of trials) grows large and p (the probability of success on each trial) becomes small, with the product np held fixed. A binomial experiment with a large n and a very small p describes a rare event. With this in mind, we will simulate both distributions and plot their CDFs (cumulative distribution functions). The plots will help us see the similarity between a Poisson experiment and a rare-event Binomial experiment.

In this post, we will not be going into the mathematical details of Binomial and Poisson distributions. Instead, we will use NumPy’s random module in Python to simulate these distributions by drawing random samples from them (a Monte Carlo simulation).

Let’s start by understanding the Binomial and Poisson distributions first.

What is a Binomial distribution

The binomial distribution is a discrete probability distribution with two parameters, n and p, where n is the number of independent trials, each with a boolean outcome (success/failure or true/false), and p is the probability of success on each trial. It gives the probability of observing a given number of successes when an experiment with success probability p is repeated n times. For example, suppose we want to know the probability of getting a certain number of heads when we toss a coin 10 times. Each coin toss, or trial, has two outcomes: head or tail. In the name binomial, bi indicates two. A binomial experiment has the following properties:

  1. We have a fixed number of trials in a binomial experiment.
  2. Each trial has two possible outcomes, e.g. True/False, Yes/No, Head/Tail.
  3. Each trial is an independent trial. The outcome of one trial does not affect the probability of other trials.
  4. Each trial has the same probability p of success.

Example 1: We flip a coin 10 times (n) with a probability (p) of 0.5 of getting a head on each trial, and we want to know the probability of getting exactly 5 heads in total. This problem can be solved using a binomial experiment. Here, we have n = 10 and p = 0.5.

Example 2: A die is tossed 6 times. What is the probability of getting exactly 3 sixes? This problem can also be solved using a binomial experiment. Here, we have n = 6 and p = 1/6.

We can use the numpy.random.binomial function to draw samples from a binomial distribution for given values of the n and p parameters.
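As a minimal sketch of Example 1, the snippet below estimates the probability of exactly 5 heads in 10 flips by simulation and compares it with the exact binomial formula. The seed, the number of repetitions (100000), and the variable names are assumptions chosen for illustration.

```python
import math
import numpy as np

np.random.seed(42)  # assumed seed, only to make the run repeatable

# Example 1: one experiment = 10 coin flips with p = 0.5.
# Repeat the experiment 100000 times (assumed sample size).
heads = np.random.binomial(n=10, p=0.5, size=100000)

# Simulated estimate: fraction of experiments with exactly 5 heads.
simulated = np.mean(heads == 5)

# Exact value from the binomial formula C(n, k) * p^k * (1 - p)^(n - k).
exact = math.comb(10, 5) * 0.5**5 * 0.5**5

print(simulated, exact)
```

The two numbers should agree to about two decimal places; the simulated value moves slightly with the seed.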

What is a Poisson distribution

A Poisson distribution is a discrete probability distribution with a single parameter, Lambda (λ), where λ is the average number of events that occur in a fixed interval of time or space. It is also called a uni-parametric distribution because it is parameterized by only one variable, λ (the mean), which is called the rate parameter. Many real-world counts are commonly modeled with the Poisson distribution, for example, the number of emails you receive each hour, the number of births in a hospital on a given day, and the number of hits on a website in an hour.

Below are the properties of Poisson distribution:

  1. Each event is independent and does not affect the probability of other events.
  2. An event can occur k times in a fixed interval of time, where k can be 0, 1, 2, 3, 4, … with no upper limit.
  3. Two events cannot occur at exactly the same instant, i.e. in an extremely small sub-interval, virtually equivalent to zero, at most one event can occur.
  4. The average rate at which events occur is constant.

Example 1: On average, we get 5 customers walking into a showroom each hour. What is the probability of getting exactly 2 customers walking into the showroom in the next hour? This problem can be solved using the Poisson distribution.

Example 2: On average, we get 50 hits on our website per day. What is the probability of getting 100 hits on a given day? This problem can be simulated using the Poisson distribution.

We can use the numpy.random.poisson function to draw samples from a Poisson distribution for a given value of λ.
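As a rough sketch of Example 1, the snippet below estimates the probability of exactly 2 customers in an hour when the average is 5 per hour, and compares it with the Poisson formula. The seed, the sample size, and the variable names are assumptions made for illustration.

```python
import math
import numpy as np

np.random.seed(42)  # assumed seed, only to make the run repeatable

lam = 5  # average number of customers per hour (Example 1)

# Simulate 100000 hours (assumed sample size).
customers = np.random.poisson(lam=lam, size=100000)

# Simulated estimate: fraction of hours with exactly 2 customers.
simulated = np.mean(customers == 2)

# Exact value from the Poisson formula e^(-lam) * lam^k / k! with k = 2.
exact = math.exp(-lam) * lam**2 / math.factorial(2)

print(simulated, exact)
```

Both values should come out close to 0.084, so two customers in an hour is a fairly unlikely outcome when the average is five.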

Derive Poisson distribution from a Binomial distribution (considering large n and small p)

The Poisson distribution is the limit of the Binomial distribution as n approaches infinity and p approaches zero while the product np stays fixed. The smaller p is, the better the approximation, and some references quote a stricter bound such as p <= 0.05. As a rough guideline, we can consider the Poisson approximation of a Binomial distribution when:

  1. np < 10
  2. n >= 20 and p <= 0.5

Then we can calculate Lambda as λ = np. Now, to understand the relationship between the Binomial and Poisson distributions, let’s simulate a story.
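Before moving to the simulation, the limit itself can be checked numerically. The sketch below keeps λ = np fixed at 6, increases n, and measures the largest difference between the exact binomial and Poisson probabilities. The helper names (binomial_pmf, poisson_pmf, max_pmf_gap) and the chosen values of n are assumptions for illustration; the comparison stops at k = 60 because both probabilities are negligible beyond that point for λ = 6.

```python
import math

lam = 6.0  # fixed mean, lambda = n * p

def binomial_pmf(k, n, p):
    # Probability of exactly k successes in n trials.
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # Probability of exactly k events when the mean is lam.
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_pmf_gap(n, lam):
    # Largest absolute difference between the two PMFs for k = 0..min(n, 60).
    p = lam / n
    k_max = min(n, 60)
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(k_max + 1))

# Hypothetical choices of n; p shrinks as n grows so that n * p stays at 6.
gaps = {n: max_pmf_gap(n, lam) for n in (20, 60, 600, 6000)}
for n, gap in gaps.items():
    print(f"n = {n:5d}, p = {lam / n:.4f}, max PMF gap = {gap:.5f}")
```

The gap shrinks as n grows and p shrinks, which is the behavior the guideline above tries to capture.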

Probability distribution story to simulate

Suppose the probability of getting a hit on a webpage in any given minute is 0.1, and we observe the page for an hour. Each minute is a Bernoulli trial because it has two outcomes, Hit/No Hit, so the hour is a binomial experiment with n = 60 and p = 0.1. In this binomial experiment, we can expect about np = 60 × 0.1 = 6 hits in an hour.
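The story can also be simulated literally, one minute at a time. In the sketch below, every minute of every simulated hour is a Hit/No Hit trial with probability 0.1, and the hits are summed per hour. The seed, the number of simulated hours, and the variable names are assumptions for illustration.

```python
import numpy as np

np.random.seed(42)  # assumed seed, only to make the run repeatable

p_hit = 0.1      # probability of a hit in any given minute
minutes = 60     # trials per hour
n_hours = 10000  # number of simulated hours (assumed sample size)

# One row per simulated hour, one column per minute; True marks a hit.
hits = np.random.random(size=(n_hours, minutes)) < p_hit

# Summing each row gives the number of hits in that hour.
hits_per_hour = hits.sum(axis=1)

print(hits_per_hour.mean())
```

The average should land close to 6, matching the expected value np.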

Simulate a Poisson distribution:

In a Poisson experiment, we can take the value of lambda as 6 for the above example. With a sample size of 10000, this can be simulated with the numpy.random.poisson function as below:

import numpy as np
np.random.poisson(lam = 6, size = 10000)

Output:

array([6, 8, 5, …, 6, 2, 7])

Simulate a Binomial distribution:

If we have the lambda parameter of a Poisson distribution and need to simulate the matching binomial distribution, we can choose n and p so that n multiplied by p equals lambda, which is 6 in the above case. We can use the numpy.random.binomial function to simulate it in Python. Also, keep in mind the guideline above: n >= 20 and p <= 0.5.

import numpy as np 
np.random.binomial(n = 60, p = 0.1, size = 10000)

Output:

array([2, 4, 7, …, 6, 5, 7])
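A quick way to compare the two samples before plotting is to look at their mean and variance. The sketch below assumes a fixed seed and hypothetical variable names. A Poisson distribution has variance equal to λ (6 here), while this binomial has variance np(1 − p) = 5.4, so a small difference in spread is expected.

```python
import numpy as np

np.random.seed(42)  # assumed seed, only to make the run repeatable

poisson_samples = np.random.poisson(lam=6, size=10000)
binomial_samples = np.random.binomial(n=60, p=0.1, size=10000)

# Both means should be close to 6.
print(poisson_samples.mean(), binomial_samples.mean())

# Variances: about 6 for Poisson, about 5.4 for this binomial.
print(poisson_samples.var(), binomial_samples.var())
```

The gap between the variances narrows as p gets smaller, because np(1 − p) approaches np.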

Simulate and Compare the CDF (Cumulative distribution function) of Binomial and Poisson distributions

An empirical cumulative distribution function (ECDF) lets us plot every data point of a sample. The values on the x-axis are the quantity we are measuring, and the values on the y-axis are the fraction of data points that have a value less than or equal to the corresponding x value. Using CDF plots, we can easily compare these distributions for the above parameter values.
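To make the idea concrete, here is a small sketch that computes these x and y values for a toy sample of four numbers. The toy data is an assumption for illustration; the ecdf helper is the same one used in the plots that follow.

```python
import numpy as np

def ecdf(samples):
    # Sort the samples for the x-axis.
    x = np.sort(samples)
    # y rises from 1/n to 1 in steps of 1/n, one step per sorted sample.
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Toy sample (hypothetical data).
x, y = ecdf(np.array([3, 1, 2, 2]))

print(x)
print(y)
```

Here x is 1, 2, 2, 3 and y is 0.25, 0.5, 0.75, 1.0. For the repeated value 2, the higher of its two points (0.75) is the fraction of samples less than or equal to 2.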

Let’s plot a Poisson distribution for a sample size of 10000 considering λ = 6. Below is the Python code to generate a Poisson distribution and to plot it using a CDF plot:

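import numpy as np
import matplotlib.pyplot as plt

# Fix the seed so the samples (and the plot) are reproducible
np.random.seed(42)

def ecdf(samples):
    # Sort the samples for the x-axis; y rises from 1/n to 1 in steps of 1/n
    x = np.sort(samples)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Draw 10000 samples from a Poisson distribution with lambda = 6
poissondis = np.random.poisson(lam = 6, size = 10000)
xp, yp = ecdf(poissondis)

# Plot the empirical CDF as individual points
_ = plt.plot(xp, yp, marker = '.', linestyle = 'none', color = 'blue')
_ = plt.margins(0.02)
_ = plt.xlabel('Number of hits as per Poisson')
_ = plt.ylabel('CDF')
_ = plt.title('Poisson Distribution')
plt.show()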

CDF of Poisson distribution

Now, plot a Binomial distribution for a sample size of 10000 considering n = 60 and p = 0.1. Below is the Python code to generate this distribution and to plot it using a CDF plot:

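import numpy as np
import matplotlib.pyplot as plt

# Fix the seed so the samples (and the plot) are reproducible
np.random.seed(42)

def ecdf(samples):
    # Sort the samples for the x-axis; y rises from 1/n to 1 in steps of 1/n
    x = np.sort(samples)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Draw 10000 samples from a Binomial distribution with n = 60 and p = 0.1
binomialdis = np.random.binomial(n = 60, p = 0.1, size = 10000)
xb, yb = ecdf(binomialdis)

# Plot the empirical CDF as individual points
_ = plt.plot(xb, yb, marker = '.', linestyle = 'none', color = 'red')
_ = plt.margins(0.02)
_ = plt.xlabel('Number of hits as per Binomial')
_ = plt.ylabel('CDF')
_ = plt.title('Binomial Distribution for rare events')
plt.show()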

CDF of Binomial distribution

In the above plots, we can see that the CDFs of the two distributions are very similar. Let’s draw both CDFs in a single plot and compare them. The Python code below generates both distributions and plots their CDFs together:

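import numpy as np
import matplotlib.pyplot as plt

# Fix the seed so the samples (and the plot) are reproducible
np.random.seed(42)

def ecdf(samples):
    # Sort the samples for the x-axis; y rises from 1/n to 1 in steps of 1/n
    x = np.sort(samples)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Draw 10000 samples from each distribution; both have mean 6
poissondis = np.random.poisson(lam = 6, size = 10000)
binomialdis = np.random.binomial(n = 60, p = 0.1, size = 10000)

xp, yp = ecdf(poissondis)
xb, yb = ecdf(binomialdis)

# Overlay both empirical CDFs; the legend tells the two colors apart
_ = plt.plot(xp, yp, marker = '.', linestyle = 'none', color = 'blue', label = 'Poisson')
_ = plt.plot(xb, yb, marker = '.', linestyle = 'none', color = 'red', alpha=0.5, label = 'Binomial')
_ = plt.xlabel('Number of hits as per Poisson/Binomial')
_ = plt.ylabel('CDF')
_ = plt.title('Poisson/Binomial Distribution - Combined')
_ = plt.legend()
plt.show()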

Comparative CDFs of Poisson and Binomial distributions

Now, we can clearly see that the CDF of the Binomial distribution closely overlaps the CDF of the Poisson distribution. Using this EDA technique, we have illustrated (rather than formally proved) the relationship between the Binomial and Poisson distributions: the Poisson distribution is a limiting case of the binomial distribution.
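The visual overlap can also be expressed as a single number. The sketch below evaluates both empirical CDFs on the same grid of integer hit counts and reports the largest vertical distance between them. The grid construction and the variable names are assumptions for illustration, and the exact value depends on the seed.

```python
import numpy as np

np.random.seed(42)  # assumed seed, only to make the run repeatable

poissondis = np.random.poisson(lam=6, size=10000)
binomialdis = np.random.binomial(n=60, p=0.1, size=10000)

# Common grid of hit counts, from 0 up to the largest value seen in either sample.
grid = np.arange(0, max(poissondis.max(), binomialdis.max()) + 1)

# Empirical CDF at each grid point: fraction of samples less than or equal to k.
cdf_p = np.array([np.mean(poissondis <= k) for k in grid])
cdf_b = np.array([np.mean(binomialdis <= k) for k in grid])

# Largest vertical distance between the two curves.
max_gap = np.max(np.abs(cdf_p - cdf_b))
print(max_gap)
```

A gap of only a few hundredths means the two curves are never far apart anywhere on the plot. Part of that gap is sampling noise, and part is the real difference that remains because p = 0.1 is small but not zero.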

Thanks for reading. Please share your inputs in the comment section.

About Gopal Krishna Ranjan

Gopal is a passionate Data Engineer and Data Analyst. He has implemented many end to end solutions using Big Data, Machine Learning, OLAP, OLTP, and cloud technologies. He loves to share his experience at https://www.sqlrelease.com/. Connect with Gopal on LinkedIn at https://www.linkedin.com/in/ergkranjan/.
