Statistical Analysis: Population, Samples, and Distribution of Means

Added on 2022/08/27

AI Summary

This assignment delves into statistical concepts, beginning with the calculation of population mean and variance. It then explores sampling with replacement, generating all possible combinations of samples of size 3 from a population of size 5. For each sample, the sample mean is calculated, and a frequency table is constructed. The assignment uses R software and Excel for data manipulation and analysis. The analysis includes plotting the proportion against frequency to determine the distribution and calculating the mean and variance of the sample means. Furthermore, it empirically demonstrates the central limit theorem and compares the variance of the sample means to the population variance. The assignment concludes with the production of density plots for different distributions using R code and comparing them to understand the central limit theorem. The analysis covers topics such as platykurtic distribution, skewed distribution and normal distribution.

Question 1
a. Population mean: N = 5, X = 21, 22, 23, 24, 25
$\mu = \frac{21 + 22 + 23 + 24 + 25}{5} = 23$
b. Variance of the population
X     (X_j - μ)    (X_j - μ)^2
21    -2           4
22    -1           1
23     0           0
24     1           1
25     2           4
Sum                10
$\sigma^2 = \frac{1}{N} \sum_{j=1}^{N} (X_j - \mu)^2 = \frac{10}{5} = 2$
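As a quick cross-check of parts (a) and (b), the same two quantities can be computed in R. This is a minimal sketch; the variable names `x`, `mu` and `sigma2` are illustrative and not taken from the assignment.

```r
x <- 21:25                        # the population from part (a)
N <- length(x)

mu <- sum(x) / N                  # population mean
sigma2 <- sum((x - mu)^2) / N     # population variance (divisor N)

mu       # 23
sigma2   # 2
var(x)   # 2.5, because R's var() divides by N - 1, not N
```

Note that R's built-in `var()` returns the sample variance, so it should not be used directly for the population variance here.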
c. Taking samples of size n = 3 from the population with replacement.
Here we construct the data in R and export it to Excel.1 All 5^3 = 125 possible ordered samples are attached in the Excel file.
d. The sample mean of each sample of size 3 is given in the X_bar column, calculated in Excel with the formula “=SUM(A2:C2)/$E$2”. A screenshot of part of the Excel sheet, showing 4 samples, is given below.
1 The Excel data table was extracted using this R code: “write.table(data, file = "Table.csv", row.names = F, sep = ",")”
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519-530.
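# Enumerate all 5^3 = 125 ordered samples of size 3 drawn with replacement
data <- expand.grid(21:25, 21:25, 21:25)
data
# Export the samples to a CSV file that Excel can open
write.table(data, file = "Table.csv", row.names = FALSE, sep = ",")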
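The X_bar column that part (d) builds in Excel could also be computed directly in R. The following is a minimal sketch under that assumption; the column names `X1`, `X2`, `X3` and `X_bar` are chosen here for illustration.

```r
# All ordered samples of size 3, with named columns
samples <- expand.grid(X1 = 21:25, X2 = 21:25, X3 = 21:25)

# Sample mean of each row, the R counterpart of =SUM(A2:C2)/3
samples$X_bar <- rowMeans(samples[, c("X1", "X2", "X3")])

nrow(samples)      # 125
head(samples, 4)   # first four samples with their means
```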

e. The frequency of each X_bar class is determined after finding the class width and upper limits of the data.2
Plot proportion against values. What is the distribution?
To find the proportions in Excel we use the formula “=(I3/$F$9)”, which is the frequency divided by the number of data points, 125. The Excel screenshot is shown below.
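For comparison with the Excel frequency table, the frequencies and proportions can be sketched in R. Unlike the class-based FREQUENCY formula, this sketch tabulates every distinct sample mean, which is an assumption made here for simplicity; the names `sums`, `freq`, `prop` and `freq_table` are illustrative.

```r
samples <- expand.grid(21:25, 21:25, 21:25)

# Tabulate the integer row sums, so equal means are grouped exactly
sums <- rowSums(samples)
freq <- table(sums)
prop <- freq / length(sums)       # frequency divided by 125

freq_table <- data.frame(
  x_bar      = as.numeric(names(freq)) / 3,
  frequency  = as.vector(freq),
  proportion = as.vector(prop)
)
freq_table
```

Plotting `proportion` against `x_bar` (for example with `plot(..., type = "h")`) shows a symmetric, bell-shaped pattern centred at 23.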
f. Plotting proportion against frequency gives a straight line through the origin: each proportion is its frequency divided by 125, so proportion increases uniformly with frequency.
[Figure: "Proportions against Frequency". Proportion (y-axis, 0 to 1/2) against frequency (x-axis, 0 to 60), with fitted power trendline f(x) = 0.008x, R² = 1.]
g. We calculate the mean of the 125 values under X_bar using the formula “=AVERAGE(D2:D126)”, which gives 23.00; the population mean from question 1(a) is also 23.00. The two means are therefore equal.
2 The frequency is determined using the formula “=FREQUENCY(D2:D126,H3:H7)”.

The output for the sample size 3 mean is shown below.
h. The variance of X_bar is 0.6666667 and the population variance from question 1(b) is 2. The variance of 2 indicates that the population values are more spread out around the mean than the sample means, whose variance of 0.667 (equal to σ²/n = 2/3) shows that they lie closer to the mean and to each other.3
i. The theorem demonstrated empirically in parts f, g and h is the central limit theorem, applied to simple random samples (with replacement) of size n = 3 drawn from a population of size N = 5. The mean of the sample means equals the population mean of 23, while the variation among the sample means is smaller: a variance of 0.666 compared to 2.00 for the population, which is the population variance divided by the sample size n.
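The results of parts (g) to (i) can be reproduced with a short R sketch. The helper `pop_var` is a hypothetical name introduced here; it uses divisor N, which is what Excel's VAR.P does.

```r
x <- 21:25
n <- 3

# Population-style variance (divisor N), analogous to Excel's VAR.P
pop_var <- function(v) mean((v - mean(v))^2)

# Means of all 125 ordered samples of size 3
x_bar <- rowMeans(expand.grid(x, x, x))

mean(x_bar)       # 23, equal to the population mean
pop_var(x_bar)    # 0.6666667
pop_var(x) / n    # 0.6666667, i.e. sigma^2 / n
```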
Question 2 (10 points)
Variance of each sample of size 3
a. The mean of the sample variances of the 125 samples is equal to the population variance; hence the statistic s^2 is an unbiased estimator of σ^2, because $E[s^2] = \sigma^2$.
This is shown in the table below.
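The unbiasedness claim can be checked by averaging the sample variance over all 125 equally likely samples. This is a minimal sketch; it assumes s^2 is the usual sample variance with divisor n - 1, which is what R's `var()` computes.

```r
x <- 21:25
samples <- expand.grid(x, x, x)

# Sample variance (divisor n - 1) of each of the 125 samples
s2 <- apply(samples, 1, var)

mean(s2)                   # 2, the expected value of s^2
mean((x - mean(x))^2)      # 2, the population variance
```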
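b. $\mathrm{MSE}[\hat{\sigma}^2] = \mathrm{Var}[\hat{\sigma}^2] + (\mathrm{Bias}[\hat{\sigma}^2])^2$
For any random variable x, $\mathrm{Var}(x) = E(x^2) - (E(x))^2$.
Proof
Let $x = \hat{\sigma}^2 - \sigma^2$. Substituting into the identity above:
$\mathrm{Var}(\hat{\sigma}^2 - \sigma^2) = E[(\hat{\sigma}^2 - \sigma^2)^2] - (E[\hat{\sigma}^2 - \sigma^2])^2$
Since $\sigma^2$ is a constant, $\mathrm{Var}(\hat{\sigma}^2 - \sigma^2) = \mathrm{Var}(\hat{\sigma}^2)$.
By definition, $E[(\hat{\sigma}^2 - \sigma^2)^2] = \mathrm{MSE}(\hat{\sigma}^2)$
and $(E[\hat{\sigma}^2 - \sigma^2])^2 = (E[\hat{\sigma}^2] - \sigma^2)^2 = \mathrm{Bias}^2(\hat{\sigma}^2)$.
Hence $E[(\hat{\sigma}^2 - \sigma^2)^2] = \mathrm{Var}(\hat{\sigma}^2 - \sigma^2) + (E[\hat{\sigma}^2 - \sigma^2])^2$
Hence $\mathrm{MSE}[\hat{\sigma}^2] = \mathrm{Var}[\hat{\sigma}^2] + (\mathrm{Bias}[\hat{\sigma}^2])^2$
The $\mathrm{MSE}[\hat{\sigma}^2]$ is 0.000605.
3 The variance of X_bar is found using the formula “=VAR.P($D$2:$D$126)”; the output is shown in the output for 1(g).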
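The decomposition can also be verified numerically over the 125 samples. The sketch below assumes, purely for illustration, that the estimator is the divisor-n sample variance; the source does not say which estimator produced the reported MSE, so this is not expected to reproduce that figure. The names `est`, `mse`, `variance` and `bias` are illustrative.

```r
x <- 21:25
sigma2 <- mean((x - mean(x))^2)     # population variance, 2
samples <- expand.grid(x, x, x)

# Assumed estimator: sample variance with divisor n (biased)
est <- apply(samples, 1, function(r) mean((r - mean(r))^2))

mse      <- mean((est - sigma2)^2)      # E[(estimate - sigma^2)^2]
variance <- mean((est - mean(est))^2)   # Var[estimate]
bias     <- mean(est) - sigma2          # E[estimate] - sigma^2

all.equal(mse, variance + bias^2)       # TRUE
```

The identity holds for any estimator, so swapping in a different definition of `est` leaves the final check unchanged.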
Question 3. (10 points)
Produce the density of $\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$. The question requires different graphs, as follows:
a. When n = 2, X ~ Unif[0, 1]
The R code for the question is:
# Return the mean of one sample of size n = 2 from Unif[0, 1]
Sample_4m_uniform <- function() mean(runif(2, min = 0, max = 1))
x_bar <- replicate(100000, Sample_4m_uniform())
plot(density(x_bar))
The plot is shown in the figure below.
b. When n = 5, X ~ Unif[0, 1]
The R code for the question is:
# Return the mean of one sample of size n = 5 from Unif[0, 1]
Sample_4m_uniform <- function() mean(runif(5, min = 0, max = 1))
x_bar <- replicate(100000, Sample_4m_uniform())
plot(density(x_bar))
The plot is shown in the figure below.
c. When n = 5, $X \sim \chi^2_{df=2}$
The R code for the question is:
# Return the mean of one sample of size n = 5 from chi-square with df = 2
Sample_4m_Chisq <- function() mean(rchisq(5, df = 2))
x_bar <- replicate(100000, Sample_4m_Chisq())
plot(density(x_bar))
The plot is shown in the figure below.

d. When n = 30, $X \sim \chi^2_{df=2}$
The R code for the question is:
# Return the mean of one sample of size n = 30 from chi-square with df = 2
Sample_4m_Chisq <- function() mean(rchisq(30, df = 2))
x_bar <- replicate(100000, Sample_4m_Chisq())
plot(density(x_bar))
The plot is shown in the figure below.

e. When n = 5, $X \sim \chi^2_{df=50}$
The R code for the question is:
# Return the mean of one sample of size n = 5 from chi-square with df = 50
Sample_4m_Chisq <- function() mean(rchisq(5, df = 50))
x_bar <- replicate(100000, Sample_4m_Chisq())
plot(density(x_bar))
The plot is shown in the figure below.
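f. Comparing the graphs
By the CLT, for large n the distribution of $\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$ converges to a normal distribution centred at $\mu = E(X)$, and $\bar{X} \to \mu$ as $n \to \infty$. Accordingly, the densities become more symmetric and bell-shaped as n increases; for the chi-square with df = 2, the right skew seen at n = 5 is much weaker at n = 30.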
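The comparison can be made concrete by simulating sample means for several n and checking that they centre on μ with variance close to σ²/n. This is a minimal sketch; the helper `sim_means` and its parameters are hypothetical names introduced here, and a smaller number of replications is used to keep it quick.

```r
set.seed(1)   # fixed seed so the sketch is reproducible

# Simulate `reps` sample means of size n; rdist(n) draws one sample
sim_means <- function(n, rdist, reps = 10000) {
  replicate(reps, mean(rdist(n)))
}

m2  <- sim_means(2, runif)                              # Unif[0, 1], n = 2
m5  <- sim_means(5, runif)                              # Unif[0, 1], n = 5
c30 <- sim_means(30, function(k) rchisq(k, df = 2))     # chi-square df = 2, n = 30

# Unif[0, 1] has mean 1/2 and variance 1/12; chi-square(2) has mean 2 and variance 4
c(mean(m2),  var(m2))    # close to 0.5 and 1/24
c(mean(m5),  var(m5))    # close to 0.5 and 1/60
c(mean(c30), var(c30))   # close to 2 and 4/30
```

Passing any of these vectors to `plot(density(...))` gives the corresponding density plot.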
g. i. The first formula is plot(density(runif(100000, min = 0, max = 1))), which is for Unif[0, 1].

The first graph is not skewed; instead it is platykurtic, since it has a flat top.4
The second and third formulas are:
plot(density(rchisq(100000, 2))), which is for $\chi^2_{df=2}$
plot(density(rchisq(100000, 50))), which is for $\chi^2_{df=50}$
The plots are shown in the figures below.
4 Rivest, L. P., & Vandal, N. (2003). Mean squared error estimation for small areas when the small area variances
are estimated. In Proceedings of the International Conference on Recent Advances in Survey Sampling. Laboratory
for Research in Statistics and Probability, Carleton University Ottawa, ON, Canada.
[Figure: density plot for $\chi^2_{df=2}$]
[Figure: density plot for $\chi^2_{df=50}$]

The second graph is positively skewed, since it is not centred on a central value and has a long tail to the right.
The third graph is approximately normal, since it is roughly symmetric about its central value.
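These visual judgements can be backed by moment-based shape measures. The sketch below defines two hypothetical helpers, `skewness` and `excess_kurtosis`, from the standard moment formulas (base R does not provide them); it is an illustration, not part of the original analysis.

```r
set.seed(1)   # fixed seed so the sketch is reproducible

# Moment-based skewness and excess kurtosis (normal distribution gives 0 for both)
skewness <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5
excess_kurtosis <- function(v) mean((v - mean(v))^4) / mean((v - mean(v))^2)^2 - 3

u   <- runif(100000, min = 0, max = 1)
c2  <- rchisq(100000, df = 2)
c50 <- rchisq(100000, df = 50)

skewness(u)          # near 0: not skewed
excess_kurtosis(u)   # near -1.2: platykurtic (flatter than normal)
skewness(c2)         # near 2: strongly right-skewed
skewness(c50)        # near 0.4: only mildly skewed, close to normal
```

The theoretical skewness of a chi-square distribution with k degrees of freedom is sqrt(8/k), which is why the df = 50 density looks far more symmetric than the df = 2 density.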