## Central Limit Theorem

Central limit theorem by simulation. Let $X_1, \ldots, X_n$ be a sequence of independent uniform random variables on $[-\sqrt{3}, \sqrt{3}]$. Since $E[X_i] = 0$ and $\mathrm{Var}(X_i) = 1$, we should be able to see that

$$Z_n = \frac{X_1 + \cdots + X_n}{\sqrt{n}}$$

is distributed approximately as the standard normal distribution when $n$ is "large enough." By simulation we can generate $Z_n$ randomly `size` = 10000 times in order to see how good the approximation is. Here we try the choices of $n$ listed in `NN`: 1, 2, 5, and 100.
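
The interval `c(-sqrt(3), sqrt(3))` used in the simulation code is what lets the sum be normalized by $\sqrt{n}$ alone. As a brief worked check, using the standard mean and variance formulas for a uniform distribution on $[a, b]$ and writing $Z_n = (X_1 + \cdots + X_n)/\sqrt{n}$ for the normalized sum (the symbol $Z_n$ is introduced here for illustration):

$$
\begin{aligned}
E[X_i] &= \frac{a + b}{2} = \frac{-\sqrt{3} + \sqrt{3}}{2} = 0, \\
\operatorname{Var}(X_i) &= \frac{(b - a)^2}{12} = \frac{(2\sqrt{3})^2}{12} = 1, \\
E[Z_n] &= 0, \qquad \operatorname{Var}(Z_n) = \frac{n \cdot 1}{n} = 1.
\end{aligned}
$$

So $Z_n$ already has mean 0 and variance 1 for every $n$; only the shape of its distribution changes as $n$ grows.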

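```r
NN <- c(1, 2, 5, 100)                      # sample sizes n to compare
size <- 10000                              # simulated values of the normalized sum per n
intval <- c(-sqrt(3), sqrt(3))             # uniform support: mean 0, variance 1
xlim <- c(-3.4, 3.4)                       # plotting window
breaks <- seq(xlim[1], xlim[2], by = 0.2)
x <- seq(xlim[1], xlim[2], length.out = 100)
y <- dnorm(x)                              # standard normal density for comparison
par(mfrow = c(2, 2))
for (n in NN) {
  # one row per replication, n uniform draws per row
  data <- matrix(runif(n * size, intval[1], intval[2]), ncol = n)
  z <- rowSums(data) / sqrt(n)
  # drop the rare values outside the window so hist() accepts the fixed breaks
  z <- z[z > xlim[1] & z < xlim[2]]
  hist(z, breaks, col = "yellow", freq = FALSE,
       main = paste("Simulated Distribution when n =", n))
  lines(x, y, lwd = 2, col = "blue")
}
```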
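
The histograms show the quality of the approximation only visually. One possible way to put a number on it, sketched below, is the Kolmogorov-Smirnov distance between the simulated values and the standard normal distribution. The helper name `ks_distance` and the fixed seed are assumptions made for this illustration; they are not part of the original notes.

```r
# Sketch: measure how close the normalized sum is to N(0, 1) for each n.
# `ks_distance` is a hypothetical helper introduced for this example.
set.seed(1)  # fixed seed so the run is reproducible

ks_distance <- function(n, size = 10000) {
  # each row holds n independent Uniform(-sqrt(3), sqrt(3)) draws
  u <- matrix(runif(n * size, -sqrt(3), sqrt(3)), ncol = n)
  z <- rowSums(u) / sqrt(n)
  # Kolmogorov-Smirnov statistic: largest gap between the empirical CDF and pnorm
  unname(ks.test(z, "pnorm")$statistic)
}

d <- sapply(c(1, 2, 5, 100), ks_distance)
print(round(d, 3))
```

The distance should shrink as $n$ grows, until it is dominated by the sampling noise of the 10000 simulated values.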


Problem 1. The actual voltage of a new 1.5-volt battery has the probability density function

$$f(x) = 5, \quad 1.4 \le x \le 1.6.$$

Estimate the probability that the sum of the voltages from $n = 120$ new batteries lies between given bounds $a$ and $b$ volts. We can generate the distribution of the sum of the voltages by simulation, and compare it with the normal approximation.
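
The normal approximation used here follows from the mean and variance of the sum. As a short worked sketch, assuming the voltages are independent and uniform on $[1.4, 1.6]$ with 120 batteries (the values used in the code), and writing $S$ for the sum and $a < b$ for generic bounds:

$$
\begin{aligned}
E[S] &= 120 \times 1.5 = 180, \\
\operatorname{Var}(S) &= 120 \times \frac{(1.6 - 1.4)^2}{12} = 0.4, \qquad \sqrt{0.4} \approx 0.632, \\
P(a \le S \le b) &\approx \Phi\!\left(\frac{b - 180}{0.632}\right) - \Phi\!\left(\frac{a - 180}{0.632}\right).
\end{aligned}
$$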

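```r
n <- 120                                   # number of batteries
size <- 10000                              # number of simulated sums
xlim <- c(170, 190)                        # plotting window around the mean 1.5 * n = 180
# one row per replication, n Uniform(1.4, 1.6) voltages per row
data <- matrix(runif(n * size, 1.4, 1.6), ncol = n)
total <- rowSums(data)
par(mfrow = c(2, 1))
breaks <- seq(xlim[1], xlim[2], by = 0.25)
hist(total, breaks, col = 2, main = "Distribution of Simulated Sum")
mu <- 1.5 * n                              # mean of the sum
sigma <- sqrt((0.2^2 / 12) * n)            # sd of the sum; each voltage has variance 0.2^2 / 12
# fine grid, since the sd (about 0.63) is small relative to the window
x <- seq(xlim[1], xlim[2], length.out = 400)
y <- dnorm(x, mu, sigma)
plot(x, y, type = "l", lwd = 1, frame.plot = FALSE, main = "Normal Approximation")
```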
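
The plots compare shapes but do not produce a probability. A minimal sketch of the numerical comparison follows. The bounds 179 and 181 are hypothetical values chosen only for illustration, and the variable names and seed are likewise assumptions.

```r
# Sketch: normal approximation vs. simulated proportion for the battery sum.
# The bounds 179 and 181 are hypothetical illustration values.
set.seed(1)
n <- 120
size <- 10000
lower <- 179
upper <- 181

volts <- matrix(runif(n * size, 1.4, 1.6), ncol = n)
total <- rowSums(volts)                    # one simulated sum per row

mu <- 1.5 * n                              # mean of the sum
sigma <- sqrt(n * 0.2^2 / 12)              # sd of the sum
approx_prob <- pnorm(upper, mu, sigma) - pnorm(lower, mu, sigma)
sim_prop <- mean(total >= lower & total <= upper)

print(c(normal = approx_prob, simulated = sim_prop))
```

With 120 terms the two numbers should agree closely; any remaining gap is mostly simulation noise.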


Problem 2. The germination time in days of a newly planted seed has the probability density function

$$f(x) = 0.3\, e^{-0.3x}, \quad x \ge 0.$$

If the germination times of different seeds are independent of one another, estimate the probability that the average germination time of $n = 2000$ seeds is between $3.1$ and $3.4$ days. We can generate the distribution of the average germination time by simulation, and compare it with the result of the normal approximation.
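
The normal approximation can also be worked by hand. The sketch below assumes the values used in the code: exponential rate 0.3, 2000 seeds, and the interval from 3.1 to 3.4 days, with $\bar{X}$ denoting the average germination time.

$$
\begin{aligned}
\mu &= \frac{1}{0.3} \approx 3.3333, \qquad \sigma = \frac{1}{0.3\sqrt{2000}} \approx 0.0745, \\
P(3.1 \le \bar{X} \le 3.4) &\approx \Phi\!\left(\frac{3.4 - 3.3333}{0.0745}\right) - \Phi\!\left(\frac{3.1 - 3.3333}{0.0745}\right) \\
&\approx \Phi(0.894) - \Phi(-3.130) \approx 0.814 - 0.001 \approx 0.81.
\end{aligned}
$$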

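```r
rate <- 0.3                                # exponential rate, so the mean time is 1 / rate days
n <- 2000                                  # seeds per average
size <- 10000                              # number of simulated averages
xlim <- c(3, 3.7)                          # plotting window
intval <- c(3.1, 3.4)                      # interval whose probability we want
data <- matrix(rexp(n * size, rate), ncol = n)
avg <- rowMeans(data)
par(mfrow = c(2, 1))
breaks <- seq(xlim[1], xlim[2], by = 0.025)
# shade the bins whose midpoint lies in the interval (robust to rounding in seq())
mids <- head(breaks, -1) + 0.0125
bar_col <- ifelse(mids > intval[1] & mids < intval[2], 2, 0)
# keep only values inside the window so hist() accepts the fixed breaks
hist(avg[avg > xlim[1] & avg < xlim[2]], breaks, col = bar_col,
     xlab = "average germination time (days)",
     main = "Distribution of Simulated Averages")
prop <- mean(avg >= intval[1] & avg <= intval[2])   # simulated proportion in the interval
text(intval[2], 10, prop)
mu <- 1 / rate                             # mean of the average
sigma <- 1 / (rate * sqrt(n))              # sd of the average
x <- seq(xlim[1], xlim[2], length.out = 100)
y <- dnorm(x, mu, sigma)
plot(x, y, type = "l", lwd = 1, frame.plot = FALSE, main = "Normal Approximation")
# shade the area under the normal curve over the interval
x <- seq(max(xlim[1], intval[1]), min(xlim[2], intval[2]), length.out = 50)
y <- dnorm(x, mu, sigma)
polygon(c(x, max(x), min(x)), c(y, 0, 0), col = 2)
prob <- pnorm(intval[2], mu, sigma) - pnorm(intval[1], mu, sigma)
text(intval[2], 0.1, round(prob, digits = 4))
```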
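
For exponential data the normal approximation can be checked against an exact answer, because the sum of $n$ independent exponential variables with a common rate has a gamma distribution with shape $n$ and that same rate, so the average has shape $n$ and rate $n$ times as large. The sketch below uses this fact; it is an added cross-check rather than part of the original exercise, and the variable names are assumptions.

```r
# Sketch: exact probability for the average of n exponentials via the gamma distribution.
rate <- 0.3
n <- 2000
lower <- 3.1
upper <- 3.4

# average of n iid Exp(rate) variables ~ Gamma(shape = n, rate = n * rate)
exact <- pgamma(upper, shape = n, rate = n * rate) -
  pgamma(lower, shape = n, rate = n * rate)

# normal approximation with the same mean and standard deviation
mu <- 1 / rate
sigma <- 1 / (rate * sqrt(n))
normal_approx <- pnorm(upper, mu, sigma) - pnorm(lower, mu, sigma)

print(round(c(exact = exact, normal = normal_approx), 4))
```

Because the gamma distribution with shape 2000 is only slightly skewed, the two values should differ by a small amount, which is what the central limit theorem leads us to expect.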