
Expected Values of Random Variables

Tanav Bajaj

--

Topics Covered: Expected values of random variables, mean, variance, correlation coefficient.

Prerequisites: Random variables, joint random variables, mean, variance.

Here I am going to assume that readers already know the basic definitions of mean, standard deviation, and variance.

The expected value can be thought of as the “average” value attained by the random variable. Assume X is a discrete random variable with range Tₓ and PMF fₓ. The expected value of X, denoted by E[X], is

E[X] = ∑ t·fₓ(t), where the sum runs over all t in the range Tₓ

E[X] may or may not belong to the range of X.

E[X] has the same units as X.
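To make the definition concrete, here is a minimal Python sketch (the PMF is made up purely for illustration) that computes E[X] directly as ∑ t·fₓ(t):

```python
# Minimal sketch: E[X] = sum of t * f_X(t) over the range of X.
# The PMF below (a loaded three-sided die) is made up for illustration.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}  # t -> f_X(t); probabilities sum to 1

expected_value = sum(t * p for t, p in pmf.items())
print(expected_value)  # 2.1; note that E[X] does not belong to the range {1, 2, 3}
```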

I will be explaining expected values of various distributions in the next article.

Properties of Expected Values

1. Constant and Non-negative Random Variables

If X is a constant random variable, i.e. it takes a single value c with P(X = c) = 1,

then E[X] = c.

Suppose X takes only non-negative values, i.e. P(X ≥ 0) = 1;

then E[X] ≥ 0 always holds.

2. Expected Value of a Function of Random Variables

Suppose X₁, …, Xₙ have joint PMF fₓ₁…ₓₙ and let Y = g(X₁, …, Xₙ). Then

E[Y] = ∑ t·fᵧ(t) = ∑ g(t₁, …, tₙ)·fₓ₁…ₓₙ(t₁, …, tₙ)

From this it is observed that to find E[Y] we do not need fᵧ: the joint PMF of X₁, …, Xₙ can be used directly.
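Here is a small sketch of that observation: E[g(X, Y)] is computed directly from a made-up joint PMF, without ever constructing the PMF of Y = g(X, Y).

```python
# Minimal sketch: E[g(X, Y)] computed directly from the joint PMF,
# without first deriving the PMF of Y = g(X, Y). The joint PMF is made up.
joint_pmf = {
    (0, 0): 0.25, (0, 1): 0.25,
    (1, 0): 0.25, (1, 1): 0.25,
}

def g(t1, t2):
    return t1 * t2  # any function of the random variables works here

expected_g = sum(g(t1, t2) * p for (t1, t2), p in joint_pmf.items())
print(expected_g)  # 0.25
```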

3. Linearity of Expected Value

E[cX] = c·E[X], where X is a random variable and c is a constant.

Proof:

E[cX] = ∑ c·t·fₓ(t) = c ∑ t·fₓ(t) = c·E[X]

Next we can see

E[X + Y] = E[X] + E[Y] for any two random variables X and Y.

Proof:

E[X + Y] = ∑ (t₁ + t₂)·fₓᵧ(t₁, t₂) = ∑ t₁·fₓᵧ(t₁, t₂) + ∑ t₂·fₓᵧ(t₁, t₂) = E[X] + E[Y]

(The last step uses the fact that summing the joint PMF over one variable gives the marginal PMF of the other.)

Combining these two results gives E[aX + bY] = a·E[X] + b·E[Y].
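As a quick numerical sanity check (a sketch only, with a made-up joint PMF), both sides of E[aX + bY] = a·E[X] + b·E[Y] can be computed independently; note that no independence between X and Y is required.

```python
# Minimal sketch: numerically check E[aX + bY] = a*E[X] + b*E[Y]
# on a made-up joint PMF. X and Y need not be independent.
joint_pmf = {(0, 1): 0.1, (1, 1): 0.4, (1, 2): 0.3, (2, 2): 0.2}
a, b = 3.0, -2.0

lhs = sum((a * t1 + b * t2) * p for (t1, t2), p in joint_pmf.items())
e_x = sum(t1 * p for (t1, _), p in joint_pmf.items())
e_y = sum(t2 * p for (_, t2), p in joint_pmf.items())
rhs = a * e_x + b * e_y
print(lhs, rhs)  # both approximately 0.3 (up to floating-point rounding)
```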

Zero Mean Random Variable

A random variable X with E[X] = 0 is said to be a zero-mean random variable.

For example, consider a single toss of a fair coin where heads scores +1 and tails scores −1:

Heads = 1, Tails = −1

E[X] = 1·(1/2) + (−1)·(1/2) = 0

so X is a zero-mean random variable.

Variance and Standard Deviation

The variance of a random variable X, denoted by Var(X), is defined as

Var(X) = E[(X-E[X])²]

The standard deviation, denoted by SD(X), is the square root of the variance:

SD(X) = √Var(X)

Since the variance is always non-negative, SD(X) is always a real number.

The units of SD(X) are the same as those of X.

Also note that the more spread out the values of X are over its range, the larger Var(X) will be.

Properties of Var and SD

  1. Var(aX)= a² Var(X)
  2. SD(aX)= |a| SD(X)
  3. Var(X+a)= Var(X)
  4. SD(X+a)= SD(X)
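A quick sketch checking the scaling and shifting properties above on a made-up PMF:

```python
# Minimal sketch: check Var(aX) = a²·Var(X) and Var(X + a) = Var(X)
# on a made-up PMF, using the definition Var(X) = E[(X - E[X])²].
pmf = {1: 0.2, 2: 0.5, 3: 0.3}
a = 4.0

def variance(pmf):
    mean = sum(t * p for t, p in pmf.items())
    return sum((t - mean) ** 2 * p for t, p in pmf.items())

scaled  = {a * t: p for t, p in pmf.items()}   # PMF of aX
shifted = {t + a: p for t, p in pmf.items()}   # PMF of X + a

print(variance(scaled), a ** 2 * variance(pmf))  # both 7.84: scaling multiplies Var by a²
print(variance(shifted), variance(pmf))          # both 0.49: shifting leaves Var unchanged
```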

Alternate Way of Writing Variance

Var(X) = E[X²] − E[X]²
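The two forms of the variance can be compared side by side; here is a minimal sketch with a made-up PMF:

```python
# Minimal sketch: Var(X) = E[(X - E[X])²] agrees with E[X²] - E[X]².
pmf = {1: 0.2, 2: 0.5, 3: 0.3}  # made-up PMF

e_x  = sum(t * p for t, p in pmf.items())
e_x2 = sum(t ** 2 * p for t, p in pmf.items())

var_definition = sum((t - e_x) ** 2 * p for t, p in pmf.items())
var_shortcut   = e_x2 - e_x ** 2

print(var_definition, var_shortcut)  # both 0.49 (up to floating-point rounding)
```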

Standardised Random Variable

A random variable X is said to be standardised if E[X] = 0, Var(X) =1.

Let X be a random variable. Then, Y =(X − E[X])/SD(X) is a standardised random variable.
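A short sketch confirming that the standardised variable really has mean 0 and variance 1 (again with a made-up PMF):

```python
# Minimal sketch: Y = (X - E[X]) / SD(X) has E[Y] = 0 and Var(Y) = 1.
import math

pmf = {1: 0.2, 2: 0.5, 3: 0.3}  # made-up PMF
mean = sum(t * p for t, p in pmf.items())
sd = math.sqrt(sum((t - mean) ** 2 * p for t, p in pmf.items()))

standardised = {(t - mean) / sd: p for t, p in pmf.items()}  # PMF of Y
e_y   = sum(t * p for t, p in standardised.items())
var_y = sum((t - e_y) ** 2 * p for t, p in standardised.items())
print(e_y, var_y)  # approximately 0.0 and 1.0
```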

Covariance

Suppose X and Y are random variables on the same probability space. The covariance of X and Y, denoted by Cov(X, Y), is defined as

Cov(X, Y) = E[(X − E[X])(Y − E[Y])]

Properties of Covariance

  1. Cov(X, X) = Var(X)
  2. Cov(X, Y) = E[XY] − E[X]E[Y]
  3. Covariance is symmetric: Cov(X, Y) = Cov(Y, X)
  4. Covariance is “linear” in each argument:
  • Cov(X, aY + bZ) = a·Cov(X, Y) + b·Cov(X, Z)
  • Cov(aX + bY, Z) = a·Cov(X, Z) + b·Cov(Y, Z)

5. Independence: If X and Y are independent, then X and Y are uncorrelated, i.e. Cov(X, Y) = 0 (the converse is not true in general).
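Here is a sketch computing Cov(X, Y) both from the definition and from the shortcut E[XY] − E[X]E[Y], using a made-up joint PMF:

```python
# Minimal sketch: Cov(X, Y) from the definition E[(X - E[X])(Y - E[Y])]
# and from the shortcut E[XY] - E[X]E[Y], on a made-up joint PMF.
joint_pmf = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

e_x  = sum(x * p for (x, _), p in joint_pmf.items())
e_y  = sum(y * p for (_, y), p in joint_pmf.items())
e_xy = sum(x * y * p for (x, y), p in joint_pmf.items())

cov_definition = sum((x - e_x) * (y - e_y) * p for (x, y), p in joint_pmf.items())
cov_shortcut   = e_xy - e_x * e_y
print(cov_definition, cov_shortcut)  # both 0.1 (up to floating-point rounding)
```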

Correlation Coefficient

The correlation coefficient (or correlation) of two random variables X and Y, denoted by ρ(X, Y), is defined as

ρ(X, Y) = Cov(X, Y) / (SD(X)·SD(Y))

ρ(X, Y) summarizes the linear trend between the two random variables.

Properties Of Correlation Coefficient

  1. −1 ≤ ρ(X, Y) ≤ 1
  2. ρ(X, Y) is a dimensionless quantity.
  3. If ρ(X, Y) is close to zero, there is no clear linear trend between X and Y (although a nonlinear relationship may still exist).
  4. If ρ(X, Y) = 1 or ρ(X, Y) = −1, then Y is a linear function of X. More generally, values of ρ close to +1 or −1 indicate that X and Y are strongly (linearly) correlated.

Here Y = aX + b with a ≠ 0 (a > 0 when ρ = 1, a < 0 when ρ = −1).
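Continuing with the same made-up joint PMF from the covariance sketch, ρ(X, Y) is just the covariance rescaled by the two standard deviations:

```python
# Minimal sketch: rho(X, Y) = Cov(X, Y) / (SD(X) * SD(Y)),
# using the same made-up joint PMF as in the covariance example.
import math

joint_pmf = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

e_x = sum(x * p for (x, _), p in joint_pmf.items())
e_y = sum(y * p for (_, y), p in joint_pmf.items())
cov = sum((x - e_x) * (y - e_y) * p for (x, y), p in joint_pmf.items())
sd_x = math.sqrt(sum((x - e_x) ** 2 * p for (x, _), p in joint_pmf.items()))
sd_y = math.sqrt(sum((y - e_y) ** 2 * p for (_, y), p in joint_pmf.items()))

rho = cov / (sd_x * sd_y)
print(rho)  # about 0.41: a moderate positive linear trend, and -1 <= rho <= 1 as expected
```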

If you recall Markov's and Chebyshev's inequalities, you will notice that the expected value (mean) and the variance appear in both.

Markov's inequality uses the mean to bound the probability that a non-negative random variable takes a value much larger than its mean: P(X ≥ a) ≤ E[X]/a for any a > 0.

Chebyshev's inequality bounds the probability that X deviates from its mean by at least k·σ: P(|X − E[X]| ≥ k·σ) ≤ 1/k².

These are among the most useful tools in practice for quantifying the expected “center” (mean) and “spread” (variance) of a random variable.
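As a closing sketch, both bounds can be compared with the exact tail probabilities for a made-up non-negative PMF:

```python
# Minimal sketch: compare exact tail probabilities with the bounds from
# Markov's inequality  P(X >= a) <= E[X] / a   (X non-negative, a > 0)
# and Chebyshev's inequality  P(|X - mean| >= k*sigma) <= 1 / k².
import math

pmf = {0: 0.5, 1: 0.3, 2: 0.1, 10: 0.1}  # made-up non-negative PMF
mean = sum(t * p for t, p in pmf.items())
sigma = math.sqrt(sum((t - mean) ** 2 * p for t, p in pmf.items()))

a, k = 5.0, 2.0
p_markov = sum(p for t, p in pmf.items() if t >= a)
p_cheby  = sum(p for t, p in pmf.items() if abs(t - mean) >= k * sigma)

print(p_markov, "<=", mean / a)   # exact 0.1 <= Markov bound 0.3
print(p_cheby, "<=", 1 / k ** 2)  # exact 0.1 <= Chebyshev bound 0.25
```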

--


Tanav Bajaj

Caffeine-fueled Prompt Engineer who can say "Hello World!" and train ML models like it's nobody's business!