The central limit theorem states that, given a sufficiently large sample size, the distribution of the sample mean will be approximately normal, regardless of the shape of the underlying distribution from which the sample is drawn.
In other words, if you take a large enough sample of random observations from any distribution with finite variance, the mean of those observations will tend to be normally distributed. This is true even if the original distribution is not normal.
The central limit theorem has important implications for statistical inference, as it allows us to use the normal distribution as a model for the sampling distribution of the mean, even when the underlying distribution is not normal. This makes it possible to use statistical methods based on the normal distribution, such as hypothesis testing and confidence intervals, to make inferences about the population mean based on a sample mean.
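As a concrete sketch of that idea (an illustration, not part of the original text): because the CLT makes the sample mean approximately normal, a 95% confidence interval for the population mean can be built as the sample mean plus or minus 1.96 standard errors, even when the data themselves are exponential.

```python
import numpy as np

# Illustrative example: a CLT-based 95% confidence interval for the mean
# of an exponential population (true mean is 1.0).
rng = np.random.default_rng(0)
data = rng.exponential(size=500)

mean = data.mean()
# Standard error of the mean; by the CLT the sample mean is roughly
# N(mu, sigma^2 / n), so mean +/- 1.96 * SE is a ~95% interval.
se = data.std(ddof=1) / np.sqrt(len(data))
ci = (mean - 1.96 * se, mean + 1.96 * se)
print(f"sample mean = {mean:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The 1.96 multiplier is the 97.5th percentile of the standard normal distribution; it is only justified here because the CLT lets us treat the sampling distribution of the mean as normal.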
Here’s an example in Python that demonstrates the central limit theorem using a non-normal distribution:
import numpy as np
import matplotlib.pyplot as plt
# Set the seed to ensure reproducibility
np.random.seed(0)
# Generate 1000 random observations from a non-normal distribution
sample = np.random.exponential(size=1000)
# Take 1000 random samples of size 30 from the original sample and compute their means
sample_means = [np.mean(np.random.choice(sample, size=30)) for _ in range(1000)]
# Plot the distribution of the sample means
plt.hist(sample_means, bins=20)
plt.xlabel("Sample mean")
plt.ylabel("Frequency")
plt.show()
This code generates 1000 random observations from an exponential distribution, which is heavily right-skewed and clearly non-normal. It then takes 1000 random samples of size 30 from the original sample, computes the mean of each, and plots the distribution of those sample means as a histogram.
As you can see, the distribution of the sample means is approximately bell-shaped and symmetric, even though the original distribution is skewed. This illustrates the central limit theorem in action.
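Beyond the visual check, the CLT also makes quantitative predictions that can be verified numerically. For a standard exponential distribution both the mean and standard deviation are 1, so sample means of size n should center near 1 with spread near 1/sqrt(n). A small sketch under those assumptions (drawing fresh exponential samples rather than resampling, for simplicity):

```python
import numpy as np

np.random.seed(0)
n, reps = 30, 10_000
# Draw reps independent samples of size n from Exp(1) and take row means.
means = np.random.exponential(size=(reps, n)).mean(axis=1)

# CLT prediction: means are approximately N(1, 1/n).
print(np.mean(means))   # should be close to 1.0
print(np.std(means))    # should be close to 1/sqrt(30), about 0.183
```

Repeating this with larger n shows the spread shrinking like 1/sqrt(n), which is the other half of what the theorem asserts.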