
Central Limit Theorem

Central limit theorem by simulation. Let $ X_1,\ldots,X_n$ be a sequence of independent uniform random variables on $ (-\sqrt{3}, \sqrt{3})$. Since $ E[X_i] = 0$ and $ \textrm{Var}(X_i) = (2\sqrt{3})^2/12 = 1$, the central limit theorem suggests that

$\displaystyle Z_n = \frac{\displaystyle\sum_{i=1}^n X_i}{\sqrt{n}}
$

is distributed approximately as the standard normal distribution when $ n$ is "large enough." By simulation we can generate size = 10000 random values of $ Z_n$ in order to see how good the approximation is. Here we try the choices of $ n$ given by NN = c(1,2,5,100).

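# Values of n to try, and the number of simulated values of Z_n for each n.
NN = c(1,2,5,100);
size = 10000;
intval = c(-sqrt(3), sqrt(3));
# Plotting window and histogram bins.
range = c(-3.4, 3.4);
breaks = seq(range[1], range[2], by=0.2);
# Standard normal density for comparison.
x = seq(range[1], range[2], length=100);
y = dnorm(x);
par(mfrow=c(2,2));
for(n in NN){
  # Each row of data is one sample X_1, ..., X_n.
  data = matrix(runif(n*size, intval[1], intval[2]), ncol=n);
  sample = apply(data, 1, sum) / sqrt(n);
  # Drop values outside the plotting window so that hist() accepts the breaks.
  sample = sample[sample > range[1] & sample < range[2]];
  hist(sample, breaks, col='yellow', freq=FALSE, main=paste("Simulated Distribution when n =", n));
  lines(x, y, type='l', lwd=2, col='blue');
}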
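
The histograms give a visual impression only. As a rough numerical check, the sketch below repeats the simulation for each $ n$ and reports the sample mean and variance of $ Z_n$, the proportion of values with $ \vert Z_n\vert \le 1$, and the largest gap between the empirical distribution function and the standard normal one on a grid. The helper name clt_check, the seed, and the grid are assumptions made for this illustration and are not part of the original notes.

```r
# Sketch: numerical check of the normal approximation for Z_n.
# clt_check() is a hypothetical helper name used only in this example.
set.seed(1)   # assumed seed, only for reproducibility

clt_check <- function(n, size = 10000) {
  # Each row is one sample X_1, ..., X_n from Uniform(-sqrt(3), sqrt(3)).
  x <- matrix(runif(n * size, -sqrt(3), sqrt(3)), ncol = n)
  z <- rowSums(x) / sqrt(n)
  grid <- seq(-3, 3, by = 0.1)
  c(n        = n,
    mean     = mean(z),
    var      = var(z),
    # Proportion within one unit of 0; compare with pnorm(1) - pnorm(-1).
    p_within = mean(abs(z) <= 1),
    # Largest discrepancy between empirical and normal CDF on the grid.
    max_gap  = max(abs(ecdf(z)(grid) - pnorm(grid))))
}

result <- t(sapply(c(1, 2, 5, 100), clt_check))
print(round(result, 3))
print(pnorm(1) - pnorm(-1))   # normal benchmark for p_within
```

If the approximation improves with $ n$, the max_gap column should shrink and p_within should move toward the normal benchmark as $ n$ grows.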

Problem 1. The actual voltage of a new $ 1.5$-volt battery has the probability density function

$\displaystyle f(x) = 5, \quad 1.4 \le x \le 1.6.
$

Estimate the probability that the sum of the voltages from $ 120$ new batteries lies between $ 170$ and $ 190$ volts. We can generate the distribution of the sum of the voltages by simulation, and compare it with the normal approximation.

n = 120;
size = 10000;
range = c(170, 190);
# Each row of data holds the voltages of n batteries; sample holds the row sums.
data = matrix(runif(n*size, 1.4, 1.6), ncol=n);
sample = apply(data, 1, sum);
par(mfrow=c(2,1));
breaks = seq(range[1], range[2], by=0.25);
hist(sample, breaks, col=2, main="Distribution of Simulated Sum");
# Normal approximation: a uniform voltage on (1.4, 1.6) has mean 1.5 and variance 0.2^2/12.
mean = 1.5 * n;
sd = sqrt((0.2^2/12) * n);
x = seq(range[1], range[2], length=100);
y = dnorm(x, mean, sd);
plot(x, y, type='l', lwd=1, frame.plot=FALSE, main="Normal Approximation");
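
The plots show the shape of the distribution, but the requested probability can also be computed directly. The sum has mean $ 1.5 \times 120 = 180$ and standard deviation $ \sqrt{120 \times 0.2^2/12} = \sqrt{0.4} \approx 0.63$, so the endpoints $ 170$ and $ 190$ lie about $ 15.8$ standard deviations from the mean and the probability should be essentially $ 1$. The following is a minimal sketch of that calculation; the seed and the variable names mu, sigma, p_normal and p_sim are assumptions introduced for this example.

```r
# Sketch: estimate P(170 <= S <= 190) for the sum S of 120 battery voltages.
set.seed(1)                          # assumed seed, only for reproducibility
n <- 120
size <- 10000
mu <- 1.5 * n                        # E[S] = 180
sigma <- sqrt((0.2^2 / 12) * n)      # sd of S, equal to sqrt(0.4)

# Normal approximation to the probability.
p_normal <- pnorm(190, mu, sigma) - pnorm(170, mu, sigma)

# Simulated proportion of sums falling in [170, 190].
s <- rowSums(matrix(runif(n * size, 1.4, 1.6), ncol = n))
p_sim <- mean(s >= 170 & s <= 190)

print(c(normal = p_normal, simulated = p_sim))
```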

Problem 2. The germination time in days of a newly planted seed has the probability density function

$\displaystyle f(x) = 0.3 e^{-0.3 x}, \quad x \ge 0.
$

If the germination times of different seeds are independent of one another, estimate the probability that the average germination time of $ 2000$ seeds is between $ 3.1$ and $ 3.4$ days. We can generate the distribution of the average germination time by simulation, and compare it with the normal approximation.

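rate = 0.3;
n = 2000;
size = 10000;
# range is the plotting window; intval is the interval of interest.
range = c(3, 3.7);
intval = c(3.1, 3.4);
# Each row of data holds n germination times; sample holds the row averages.
data = matrix(rexp(n*size, rate), ncol=n);
sample = apply(data, 1, mean);
par(mfrow=c(2,1));
breaks = seq(range[1], range[2], by=0.025);
# Color the bins whose left edge lies in the interval of interest.
col = rep(0, length(breaks));
col[breaks >= intval[1] & breaks < intval[2]] = 2;
# Plot only the averages inside the plotting window; hist() stops with an error
# if a value lies outside the breaks, which can occasionally happen here.
hist(sample[sample > range[1] & sample < range[2]], breaks, col=col, main="Distribution of Simulated Averages");
# Simulated probability, computed from all the averages.
prop = length(sample[sample >= intval[1] & sample <= intval[2]]) / size;
text(intval[2], 10, prop);
# Normal approximation: the average has mean 1/rate and sd 1/(rate*sqrt(n)).
mean = 1/rate;
sd = 1/(rate * sqrt(n));
x = seq(range[1], range[2], length=100);
y = dnorm(x, mean, sd);
plot(x, y, type='l', lwd=1, frame.plot=FALSE, main="Normal Approximation");
# Shade the area under the normal density over the interval of interest.
x = seq(max(c(range[1],intval[1])),
        min(c(range[2],intval[2])), length = 50);
y = dnorm(x, mean, sd);
polygon(c(x,max(x),min(x)), c(y,0,0), col=2);
prob = pnorm(intval[2], mean, sd) - pnorm(intval[1], mean, sd);
text(intval[2], 0.1, round(prob,digits=4));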
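
For this problem the normal approximation can also be compared with an exact value, because the sum of $ n$ independent exponential random variables with rate $ \lambda$ has a gamma distribution with shape $ n$ and rate $ \lambda$, so their average has a gamma distribution with shape $ n$ and rate $ n\lambda$. The sketch below computes both probabilities; the names p_normal and p_exact are assumptions introduced for this example, and the comparison is an addition to the simulation above rather than part of the original exercise.

```r
# Sketch: normal approximation versus the exact gamma probability.
rate <- 0.3
n <- 2000
intval <- c(3.1, 3.4)

# Normal approximation with mean 1/rate and sd 1/(rate*sqrt(n)).
mu <- 1 / rate
sigma <- 1 / (rate * sqrt(n))
p_normal <- pnorm(intval[2], mu, sigma) - pnorm(intval[1], mu, sigma)

# Exact value: the average of n iid Exp(rate) variables is
# Gamma(shape = n, rate = n * rate).
p_exact <- pgamma(intval[2], shape = n, rate = n * rate) -
           pgamma(intval[1], shape = n, rate = n * rate)

print(round(c(normal = p_normal, exact = p_exact), 4))
```

With $ n = 2000$ the gamma distribution is only slightly skewed, so the two values should agree closely.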


© TTU Mathematics