Image credit - Umang Bhalla

Distributions In Statistics

Tanav Bajaj

--

This article covers the key points of the main probability distributions in statistics, in relation to random variables.

Prerequisites - the first 5 articles I published in this series.

1. Uniform Distribution

For Discrete Uniform Distribution

X~Uniform(T)

Here T is a finite set, the range of the distribution.

Probability Mass Function (PMF):

fₓ(t) = 1/|T| for every t ∈ T

Example — tossing a fair coin or rolling a fair die.

Tossing a coin -

X~Uniform({0,1}), where 0 = Heads and 1 = Tails
fₓ(0) = fₓ(1) = 1/2

Similarly, for a fair die -

fₓ(t) = 1/6, where t ∈ {1,2,3,4,5,6}
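As a quick sketch (assuming nothing beyond the definition above), the PMF of a fair die can be written and checked in Python:

```python
# Sketch: discrete Uniform PMF for a fair die, T = {1, ..., 6}
T = {1, 2, 3, 4, 5, 6}

def pmf(t):
    # every outcome in T gets the same mass 1/|T|; outcomes outside T get 0
    return 1 / len(T) if t in T else 0.0

print(pmf(3))                  # 1/6
print(sum(pmf(t) for t in T))  # the PMF sums to 1 over the range
```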

For Continuous Uniform Distribution

X~ Uniform[a,b]

The probability density function is given by

fₓ(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise

The cumulative distribution function is given by

Fₓ(x) = 0 for x < a, (x − a)/(b − a) for a ≤ x ≤ b, and 1 for x > b

Note- ∫ fₓ(x) dx over an interval gives the probability that X falls in that interval; over the full range [a,b] the integral is 1.

Python code to show the uniform distribution (continuous):

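The original snippet isn't reproduced in this export; a minimal sketch using scipy.stats.uniform (the endpoints a = 2, b = 6 are illustrative choices):

```python
# Sketch: continuous Uniform[a, b] via scipy (a and b are illustrative)
from scipy.stats import uniform

a, b = 2.0, 6.0
U = uniform(loc=a, scale=b - a)  # scipy parameterizes Uniform[a, b] as loc=a, scale=b-a

print(U.pdf(4.0))  # constant density 1/(b-a) = 0.25 inside [a, b]
print(U.pdf(1.0))  # 0 outside [a, b]
print(U.cdf(4.0))  # CDF rises linearly: (x-a)/(b-a) = 0.5
```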

2. Bernoulli Random Variable

This is the simplest kind of random variable: it can take only 2 values, 0 or 1. Here the probability of success (p) is the probability that 1 occurs.

X~Bernoulli(p) where 0 ≤ p ≤ 1

Range = {0,1}

PMF: fₓ(0) = 1 − p and fₓ(1) = p

Each event of a Bernoulli distribution is called a Bernoulli trial, and p is the probability of success.
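A minimal simulation sketch of Bernoulli trials (p = 0.3 and the sample size are illustrative choices):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
p = 0.3

def bernoulli_trial(p):
    # returns 1 (success) with probability p, else 0 (failure)
    return 1 if random.random() < p else 0

samples = [bernoulli_trial(p) for _ in range(100_000)]
print(sum(samples) / len(samples))  # empirical success rate, close to p
```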

3. Binomial Distribution

When there is more than one Bernoulli trial (under the assumption that each trial is independent), we get a Binomial Distribution. This distribution describes how many of the n Bernoulli trials (each with success probability p) come out as successes.

X~ Binomial(n,p), where n is a positive integer and 0 ≤ p ≤ 1

Range = {0, 1, 2, 3, …, n}

Here, if you remember from the last article, a PMF summed over the whole range of the random variable is 1. The PMF is

fₓ(k) = ⁿCₖ pᵏ (1 − p)ⁿ⁻ᵏ for k ∈ {0, 1, …, n}

and so ∑ₖ ⁿCₖ pᵏ (1 − p)ⁿ⁻ᵏ = 1 is also always true.
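The PMF and the normalization identity above can be checked directly (n = 10, p = 0.4 are illustrative):

```python
from math import comb

n, p = 10, 0.4

def binom_pmf(k):
    # nCk * p^k * (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(binom_pmf(4))                             # P(X = 4)
print(sum(binom_pmf(k) for k in range(n + 1)))  # sums to 1 over the range
```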

4. Geometric Distribution

This is the type of distribution where Bernoulli trials keep happening until the first success.

For example, when we toss a coin repeatedly and stop once we get heads: this kind of situation leads to a geometric distribution.

Heads can come on the first toss, or it may take arbitrarily many tosses. The probability of each of these cases is given by the geometric distribution.

Written as X~ Geometric(p) where 0 ≤ p ≤ 1

Range = {1,2,3…….}

PMF: fₓ(k) = (1 − p)ᵏ⁻¹ · p (this is the probability that the k'th trial is the first success: k − 1 failures followed by one success)

To check that this PMF sums to 1 over the range, we use the formula for a geometric progression: ∑ₖ (1 − p)ᵏ⁻¹ p = p / (1 − (1 − p)) = 1.

Let X ∼ Geometric(p) and Y ∼ Geometric(q), where X and Y are independent, and let
Z = min(X, Y). Then

Z ∼ Geometric(1 − (1 − p)(1 − q))

The maximum of 2 independent geometric random variables, however, is not geometric.

Memoryless property of Geometric(p)

For a geometric distribution, the probability that X is greater than m + n, given that X is greater than m, is always equal to the probability that X is greater than n. This is the memoryless property of the geometric distribution.

If X ∼ Geometric(p), then

P(X > m + n|X > m) = P(X > n)
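The memoryless property can be verified numerically from the tail formula P(X > k) = (1 − p)ᵏ (the values of p, m, n below are illustrative):

```python
p, m, n = 0.25, 3, 5

def tail(k):
    # P(X > k): the first k trials are all failures
    return (1 - p) ** k

lhs = tail(m + n) / tail(m)  # P(X > m + n | X > m)
rhs = tail(n)                # P(X > n)
print(lhs, rhs)              # the two sides agree
```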

5. Negative Binomial Distribution

Here the random variable X is the number of repeated trials needed to produce a certain number of successes (r). The main difference from the normal binomial is that the binomial fixes the number of trials and counts successes, while the negative binomial fixes the number of successes and lets the number of trials (and hence failures) vary.

It is called negative binomial because, in this sense, it reverses the roles played in the binomial distribution.

X~Negative Binomial(r,p)

Here r is a positive integer and p ∈ [0,1]

Range of the distribution is {r, r+1, r+2, …}

Note that when r = 1 this becomes the geometric distribution. So, in a way, the geometric distribution is a special case of the negative binomial distribution.

An example of such a situation is-

The probability of selling a candy at any given house is p, and r candies need to be sold. We can use the negative binomial distribution to find the probability that the r'th candy is sold at the k'th house.

The PMF, fₓ(k) = ᵏ⁻¹Cᵣ₋₁ pʳ (1 − p)ᵏ⁻ʳ, can now be used to find the probability of the above case happening.
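The candy example can be sketched directly (p = 0.4, r = 3, k = 7 are made-up numbers):

```python
from math import comb

p, r, k = 0.4, 3, 7

# the k'th house buys the r'th candy: exactly r-1 sales in the first k-1 houses,
# then a sale at house k
prob = comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)
print(prob)  # 15 * 0.4^3 * 0.6^4 = 0.124416
```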

6. Poisson Distribution

Poisson Distribution is a probability distribution that is used to show how many times an event is likely to occur over a specified period.

For a Poisson distribution,

X~Poisson(λ), where λ > 0

Range = {0,1,2,3,4….}

PMF: fₓ(k) = e^(−λ) λᵏ / k!

To check that this sums to 1 we use the series expansion of eˣ: ∑ₖ λᵏ/k! = e^λ.

So where exactly is this distribution used? Some common situations:

  • Arrival of visitors to a website.
  • Emission of particles in radioactive decay.
  • Meteorites entering the atmosphere.

Example-

Radioactive decay: emissions are observed over 2608 time intervals of 7.5 seconds each, and the total particle count gives

Emission rate λ = total particles / 2608 = 3.8673 particles per interval

Since the emission rate is constant, we can use it as λ in the PMF above to find P(number of particles in an interval = k).

In the case of website visits or meteorites entering the atmosphere, the arrival rate is independent of the past, so the Poisson distribution can be used to find the probability of x arrivals (visitors/meteorites) in a given time frame.
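The radioactive-decay example can be sketched straight from the PMF, using the rate computed above as λ:

```python
from math import exp, factorial

lam = 3.8673  # particles per 7.5-second interval, from the data above

def poisson_pmf(k):
    # P(exactly k emissions in one interval) = e^(-lam) * lam^k / k!
    return exp(-lam) * lam**k / factorial(k)

print(poisson_pmf(0))  # probability of no emissions in an interval
print(poisson_pmf(4))  # probability of exactly four emissions
```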

7. Hypergeometric Distribution

Consider a population of N people where r belong to Type 1 and (N − r) belong to Type 2. We then select m people without replacement. X is the number of people selected who belong to Type 1. In such a case the hypergeometric distribution is used.

X ∼ HyperGeo(N, r, m), where N, r, m are positive integers, with PMF

fₓ(k) = ʳCₖ · ᴺ⁻ʳCₘ₋ₖ / ᴺCₘ
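A sketch of the hypergeometric PMF (N = 20 people, r = 8 of Type 1, m = 5 drawn; all illustrative numbers):

```python
from math import comb

N, r, m = 20, 8, 5

def hypergeo_pmf(k):
    # k of the m people drawn (without replacement) are Type 1
    return comb(r, k) * comb(N - r, m - k) / comb(N, m)

print(hypergeo_pmf(2))                             # P(X = 2)
print(sum(hypergeo_pmf(k) for k in range(m + 1)))  # sums to 1 over the range
```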

8. Normal Distribution

The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about its mean. The graph of this distribution is always a bell curve.

When working with quantities such as height, birth weight, reading ability, or test scores of a population, the normal distribution is seen.

X ∼ Normal(μ, σ²), where μ is the mean of the distribution and σ² is the variance. The PDF is

fₓ(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))

The CDF has no closed-form expression.

Standardization of the Normal Distribution

If X ∼ Normal(μ, σ²), then

Z = (X − μ)/σ ∼ Normal(0, 1)

To compute probabilities of a normal distribution, convert the probability computation to one about the standard normal.

Let Xᵢ ∼ Normal(μᵢ, σᵢ²) be independent and let

Y = a₁X₁ + a₂X₂ + . . . + aₙXₙ

Then Y~Normal(μ, σ²)

where μ = a₁μ₁ + a₂μ₂ + . . . + aₙμₙ and σ² = a₁²σ₁² + a₂²σ₂² + . . . + aₙ²σₙ²

That is, a linear combination of independent normal random variables is again a normal random variable.

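A simulation sketch of standardization and of a linear combination of independent normals (all parameters below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# standardization: Z = (X - mu) / sigma should be Normal(0, 1)
mu, sigma = 5.0, 2.0
x = rng.normal(mu, sigma, size=200_000)
z = (x - mu) / sigma
print(z.mean(), z.std())  # close to 0 and 1

# Y = 2*X1 + 3*X2 with X1 ~ N(1, 4), X2 ~ N(2, 9) independent
x1 = rng.normal(1, 2, size=200_000)
x2 = rng.normal(2, 3, size=200_000)
y = 2 * x1 + 3 * x2
print(y.mean(), y.var())  # close to 2*1 + 3*2 = 8 and 4*4 + 9*9 = 97
```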

9. Exponential Distribution

The exponential distribution is a continuous distribution that is commonly used to measure the expected time for an event to occur.

A classic example for this distribution is the waiting time between events, such as the time until the next earthquake. Other situations where this distribution is seen are the life of a car battery, or the amount of time until a particular event occurs.

X ∼ Exp(λ), where λ > 0, with PDF fₓ(x) = λe^(−λx) for x ≥ 0

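A simulation sketch of exponential waiting times (λ = 0.5 is an illustrative rate):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.5

# numpy's exponential takes the scale parameter, which is 1/lam
t = rng.exponential(scale=1 / lam, size=200_000)
print(t.mean())      # close to the theoretical mean 1/lam = 2.0
print(t.min() >= 0)  # waiting times are non-negative
```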

10. Conditional Distribution

Suppose X is a discrete random variable with range Tₓ and A is an event in the same probability space. Then the conditional PMF of X given A is defined as

fₓ|ₐ(t) = P(X = t | A) = P((X=t) ⋂ A) / P(A)

For example, suppose we have data on men and women liking different sports, and we want to find how likely a man is to like basketball. Here A is the event that the person is a man, and t is 'basketball', so fₓ|ₐ(basketball) = P(likes basketball and is a man) / P(is a man).
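The sports example can be sketched with made-up counts (all numbers below are illustrative):

```python
# joint counts of (gender, favourite sport) — illustrative data
counts = {
    ("man", "basketball"): 30, ("man", "cricket"): 50,
    ("woman", "basketball"): 40, ("woman", "cricket"): 20,
}
total = sum(counts.values())

p_A = sum(v for (g, _), v in counts.items() if g == "man") / total  # P(A): person is a man

def cond_pmf(sport):
    # f_{X|A}(t) = P((X = t) ∩ A) / P(A)
    return (counts[("man", sport)] / total) / p_A

print(cond_pmf("basketball"))                        # 30/80 = 0.375
print(cond_pmf("basketball") + cond_pmf("cricket"))  # conditional PMF sums to 1
```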

Note that the range of X|A can be different from Tₓ, and it will depend on A.

This can also work with more than one random variable; a wide variety of conditioning is possible when there are many random variables. Suppose X₁, X₂, X₃, X₄ ∼ fₓ₁ₓ₂ₓ₃ₓ₄ and xᵢ ∈ Tₓᵢ. Then, for example,

fₓ₁ₓ₂|ₓ₃ₓ₄(x₁, x₂ | x₃, x₄) = fₓ₁ₓ₂ₓ₃ₓ₄(x₁, x₂, x₃, x₄) / fₓ₃ₓ₄(x₃, x₄)

This conditioning can be done in any sequence.

The upcoming distributions are part of descriptive statistics, as empirical distributions. These are based on observed data, unlike the theoretical distributions above, which are based on logic and mathematics. In other words, empirical distributions describe the sample collected, while theoretical distributions describe the population.

Empirical Distributions-

Let X₁, X₂, . . . , Xₙ ∼ X be i.i.d. samples. Let #(Xᵢ = t) denote the number of times t occurs in the samples. The empirical distribution is the discrete distribution with PMF

fₓ(t) = #(Xᵢ = t) / n
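A sketch of the empirical PMF computed from an observed sample (the sample itself is made up):

```python
from collections import Counter

samples = [2, 3, 3, 5, 2, 2, 6, 3, 5, 2]  # illustrative observed data
n = len(samples)

counts = Counter(samples)                  # #(X_i = t) for each observed value t
emp_pmf = {t: c / n for t, c in counts.items()}

print(emp_pmf[2])             # 4 occurrences out of 10 -> 0.4
print(sum(emp_pmf.values()))  # empirical PMF sums to 1
```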

11. Gamma Distribution

Gamma Distribution is a continuous probability distribution that is widely used in different fields of science to model continuous variables that are always positive and have skewed distributions. It occurs naturally in processes where the waiting times between events are relevant.

The exponential distribution discussed above is a special case of the gamma distribution: Exp(λ) = Gamma(1, λ), where the λ used there is the rate parameter of the gamma distribution. The PDF is

fₓ(x) = β^α x^(α−1) e^(−βx) / Γ(α) for x > 0, where

  • α > 0 is a shape parameter.
  • β > 0 is a rate parameter.
  • θ = 1/β is a scale parameter.

α and β can be estimated from samples using the method of moments, which I might explain in a future article.

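A sketch checking that Gamma(1, β) coincides with Exp(β), using scipy (β = 2 is an illustrative rate):

```python
from scipy.stats import gamma, expon

alpha, beta = 1.0, 2.0             # shape and rate (illustrative)

g = gamma(a=alpha, scale=1 / beta)  # scipy uses the scale parameter theta = 1/beta
e = expon(scale=1 / beta)

print(g.pdf(0.5), e.pdf(0.5))       # identical densities: Gamma(1, beta) is Exp(beta)
```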

12. Beta Distribution

The beta distribution is a continuous probability distribution defined on the interval [0, 1], parameterized by two positive shape parameters, denoted α and β, that appear as exponents of the random variable and control the shape of the distribution.

Some examples of this distribution are the click-through rate of your advertisement, or the conversion rate of customers actually purchasing on your website.

Because the Beta distribution models a probability, its domain is bounded between 0 and 1.

This is like a binomial distribution that deals with the probability of success instead of the number of successes.

α > 0, β > 0 are the shape parameters, and the PDF is

fₓ(x) = x^(α−1) (1 − x)^(β−1) / B(α, β) for x ∈ [0, 1]

This is a very flexible distribution that shows a variety of graphs

[Figure: Beta PDFs for various α and β. Source- TowardsDataScience]
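The click-through-rate example can be sketched with scipy (α = 3, β = 7 are illustrative shape parameters):

```python
from scipy.stats import beta as beta_dist

a, b = 3, 7           # shape parameters (illustrative)
ctr = beta_dist(a, b)

print(ctr.mean())     # mean of Beta(a, b) is a / (a + b) = 0.3
print(ctr.cdf(0.5))   # probability the true click-through rate is below 50%
```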

Compiled Formula List of All Distributions

Discrete Distributions

Continuous Distributions

Sum of Distributions-

  • Sum of n independent Bernoulli(p) trials is Binomial(n, p).
  • Sum of 2 independent Uniform random variables is not Uniform.
  • Sum of independent Binomial(n, p) and Binomial(m, p) is Binomial(n + m, p).
  • Sum of r i.i.d. Geometric(p) is Negative-Binomial(r, p).
  • Sum of independent Negative-Binomial(r, p) and Negative-Binomial(s, p) is Negative-Binomial(r + s, p)
  • Sum of n i.i.d. Exp(β) is Gamma(n, β).
  • N ∼ Poisson(λ) and X|N = n ∼ Binomial(n, p), then X ∼ Poisson(λp)
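One of the results above, Binomial(n, p) + Binomial(m, p) = Binomial(n + m, p), can be checked by direct convolution (n = 4, m = 6, p = 0.3 are illustrative):

```python
from math import comb

n, m, p = 4, 6, 0.3

def bpmf(trials, k):
    # Binomial(trials, p) PMF; math.comb returns 0 when k > trials
    return comb(trials, k) * p**k * (1 - p) ** (trials - k)

k = 5
conv = sum(bpmf(n, j) * bpmf(m, k - j) for j in range(k + 1))
print(conv, bpmf(n + m, k))  # the convolution matches Binomial(n + m, p)
```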

Important Results to remember

  • Square of Normal(0, σ²) is Gamma(1/2, 1/(2σ²)).
  • Suppose X, Y ∼ i.i.d. Normal(0, σ²). Then, (X/Y)∼ Cauchy(0, 1).

--


Tanav Bajaj

Caffeine-fueled Prompt Engineer who can say "Hello World!" and train ML models like it's nobody's business!