Multivariate Normal Distribution and Generalized Distance Analysis
Running head: Assignment 1
Assignment 1
Name of the student
Name of the University
Author Note
Question 1:
a) The pdf of the multivariate normal distribution is given by
$$f_X(x_{p \times 1}) = c \, \exp\!\left(-\frac{(x-\mu)^T \Sigma^{-1} (x-\mu)}{2}\right)$$
Here, $\mu = E(X_{p \times 1})$ and $\Sigma = \operatorname{var}(X_{p \times 1}) > 0$, and $c$ is a constant given by
$$c = \frac{1}{(2\pi)^{p/2} \, |\Sigma|^{1/2}}$$
where $|\Sigma|$ is the value of the determinant of $\Sigma$.
Now, by the method of singular value (spectral) decomposition, $\Sigma = U D U^T = U D^{1/2} D^{1/2} U^T$, where $U$ is the matrix of eigenvectors of $\Sigma$ and $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$ is the diagonal matrix of its eigenvalues.
Now, in order to generate random vectors from the multivariate normal distribution $N_p(\mu, \Sigma)$, the following steps are followed.
Step 1: First the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ and the corresponding eigenvectors $u_j$ of $\Sigma$ are computed, together with $D^{1/2} = \operatorname{diag}(\lambda_1^{1/2}, \lambda_2^{1/2}, \ldots, \lambda_p^{1/2})$. Then the matrix $A = U D^{1/2}$ (or $A = U D^{1/2} U^T$) is formed.
Step 2: Values $z_1, z_2, \ldots, z_p$ are drawn independently from $N(0,1)$, and $z = (z_1, \ldots, z_p)^T$.
Step 3: Then one random vector $y_{p \times 1} = Az + \mu$ from the multivariate normal distribution $N_p(\mu, \Sigma)$ is obtained.
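As a minimal sketch (not part of the original answer), the three steps can be collected into a small helper function; the function name rmvnorm_svd and the option to draw n vectors at once are illustrative assumptions.
# Sketch: generate n draws from Np(mu, Sigma) by the eigen decomposition method above.
rmvnorm_svd <- function(n, mu, Sigma) {
  p <- length(mu)
  e <- eigen(Sigma)                        # Step 1: eigenvalues and eigenvectors of Sigma
  A <- e$vectors %*% diag(sqrt(e$values))  # A = U D^(1/2)
  Z <- matrix(rnorm(n * p), nrow = p)      # Step 2: p x n matrix of N(0,1) draws
  t(A %*% Z + mu)                          # Step 3: each row is one draw A z + mu
}
For example, rmvnorm_svd(100, mu, Sigma) returns a 100-row matrix with one generated vector per row.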
b) Now, a 4-variate normal distribution is chosen. The R code to generate one random vector from this 4-variate normal distribution by the SVD method is given below.
R-code:
m = c(2,3,5,1)   # defining the means of the 4-variate normal distribution
s = cbind(c(2,-1.2,-1,1),c(-1.2,4,1.5,1),c(-1,1.5,1,-0.5),c(1,1,-0.5,4))   # defining the variance matrix S
lam = eigen(s)$values   # calculating the eigenvalues of S
U = eigen(s)$vectors    # calculating the eigenvectors of S
A = U %*% diag(sqrt(lam))   # creating the matrix A
z = rnorm(4)   # generating 4 independent random numbers from the standard normal distribution
A %*% z + m    # generating one random vector from the 4-variate normal distribution
Output:
[,1]
[1,] 2.9869710
[2,] 2.2991517
[3,] 5.2024936
[4,] -0.7154101
So, the generated random vector is (2.987, 2.299, 5.202, -0.715)
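As a quick sanity check (not in the original), one can draw many vectors with the same A and m from the code above and compare the sample mean and covariance with m and s; the choice of 10,000 draws is arbitrary.
# Continuing from the block above (A, m and s already defined).
Y <- replicate(10000, as.vector(A %*% rnorm(4) + m))  # 4 x 10000 matrix, one draw per column
rowMeans(Y)   # should be close to m = (2, 3, 5, 1)
var(t(Y))     # should be close to the variance matrix s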
c) Now, according to Question 2, the 3-variate normal distribution is used to generate 100 random vectors of size 3×1 with mean $\mu = (1, -2, 2)^T$ and variance
$$\Sigma = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 3 & 2 \\ 1 & 2 & 2 \end{pmatrix}.$$
Here the target distribution is the normal distribution, and since the multivariate normal distribution is used for random vector generation, each generated vector is a sample from a normal distribution.
R code:
m = c(1,-2,2)   # defining the means of the 3-variate normal distribution
s = cbind(c(1,1,1),c(1,3,2),c(1,2,2))   # defining the variance matrix S
lam = eigen(s)$values   # calculating the eigenvalues of S
U = eigen(s)$vectors    # calculating the eigenvectors of S
A = U %*% diag(sqrt(lam))   # creating the matrix A
for (x in 1:100) {
  z = rnorm(3)          # 3 independent draws from the standard normal distribution
  print(A %*% z + m)    # one random vector from the 3-variate normal distribution
}
Question 2:
a) A linear transformation of the multivariate normal vector $x = (x_1, x_2, x_3)^T$ with mean $\mu = (1, -2, 2)^T$ and
$$\Sigma = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 3 & 2 \\ 1 & 2 & 2 \end{pmatrix}$$
will also follow a multivariate normal distribution.
Now, for $(x_1, x_2)$,
$$\mu_{(x_1,x_2)} = \begin{pmatrix} 1 \\ -2 \end{pmatrix}, \qquad \Sigma_{(x_1,x_2)} = \begin{pmatrix} 1 & 1 \\ 1 & 3 \end{pmatrix}.$$
Now, here the linear combination is $z = 2x_1 - 3x_2$, or $z = Bx$, where $B = (2 \;\; -3)$.
So, the mean of $z$ is
$$E(z) = B\mu = (2 \;\; -3)\begin{pmatrix} 1 \\ -2 \end{pmatrix} = 2 + 6 = 8,$$
and the variance of $z$ is
$$\operatorname{var}(z) = B\,\Sigma_{(x_1,x_2)}\,B^T = (2 \;\; -3)\begin{pmatrix} 1 & 1 \\ 1 & 3 \end{pmatrix}\begin{pmatrix} 2 \\ -3 \end{pmatrix} = (2 \;\; -3)\begin{pmatrix} -1 \\ -7 \end{pmatrix} = -2 + 21 = 19.$$
So, the distribution of 2x1 -3x2 is a normal distribution with mean 8 and variance 19.
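The result can be checked empirically with a short simulation (a sketch, not part of the original; the sample size of 10,000 is arbitrary), reusing the SVD-based generator from Question 1.
# Empirical check that 2*x1 - 3*x2 has mean about 8 and variance about 19.
m <- c(1, -2, 2)
s <- cbind(c(1, 1, 1), c(1, 3, 2), c(1, 2, 2))
e <- eigen(s)
A <- e$vectors %*% diag(sqrt(e$values))
Z <- matrix(rnorm(3 * 10000), nrow = 3)   # 10000 standard normal 3-vectors
X <- A %*% Z + m                          # each column is one draw from N3(m, s)
w <- 2 * X[1, ] - 3 * X[2, ]              # the linear combination 2*x1 - 3*x2
mean(w); var(w)                           # should be close to 8 and 19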
b) The joint distribution of $y_1 = x_1 - x_3$ and $y_2 = x_1 - 2x_2 + 3x_3$ must satisfy the conditions
i) $f(y_1, y_2) \geq 0$
ii) $\iint f(y_1, y_2)\, dy_1\, dy_2 = 1$
The pdf of the joint (bivariate normal) distribution of $y_1$ and $y_2$ is given by
$$f(y_1, y_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}}\, \exp\!\left(\frac{-z}{2(1-\rho^2)}\right),$$
where
$$z = \frac{(y_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho (y_1-\mu_1)(y_2-\mu_2)}{\sigma_1 \sigma_2} + \frac{(y_2-\mu_2)^2}{\sigma_2^2},$$
$$\rho = \operatorname{corr}(y_1, y_2) = \frac{\operatorname{cov}(y_1, y_2)}{\sigma_1 \sigma_2},$$
and $\sigma_1$, $\sigma_2$ are the standard deviations of $y_1$ and $y_2$ respectively.
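The parameters of this bivariate normal follow from the transformation rule $E(Y) = B\mu$ and $\operatorname{var}(Y) = B\Sigma B^T$; a short sketch (not in the original) computes them for $y_1 = x_1 - x_3$ and $y_2 = x_1 - 2x_2 + 3x_3$.
# Mean vector and covariance matrix of (y1, y2) under the stated 3-variate normal.
mu    <- c(1, -2, 2)
Sigma <- cbind(c(1, 1, 1), c(1, 3, 2), c(1, 2, 2))
B <- rbind(c(1, 0, -1),    # y1 = x1 - x3
           c(1, -2, 3))    # y2 = x1 - 2*x2 + 3*x3
B %*% mu                   # mean vector of (y1, y2): (-1, 11)
B %*% Sigma %*% t(B)       # covariance matrix: rows (1, -1) and (-1, 9)
This gives σ1 = 1, σ2 = 3 and ρ = −1/3 for the pair (y1, y2).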
c) Now, if 60 random numbers are selected from the mentioned 3-variate normal distribution, they arise as 20 independent 3×1 random vectors, since each simulation of the 3-variate distribution yields 3 numbers and 20 simulations are needed for 60 numbers. Arranged (stacked) into a single vector, x therefore follows a 60-variate normal distribution with a (60×1) mean vector and a (60×60) block-diagonal variance-covariance matrix built from 20 copies of the 3×3 matrix Σ, as sketched below.
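A short sketch (the stacking of the 20 vectors into one 60×1 vector is an assumption about the arrangement, not part of the original) shows the resulting mean vector and block-diagonal covariance matrix.
mu    <- c(1, -2, 2)
Sigma <- cbind(c(1, 1, 1), c(1, 3, 2), c(1, 2, 2))
mu60    <- rep(mu, times = 20)         # 60x1 mean vector
Sigma60 <- kronecker(diag(20), Sigma)  # 60x60 block-diagonal covariance matrix
dim(Sigma60)                           # 60 60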
Question 3:
a) The generalized distance in multivariate analysis is given by the formula
$$d^2 = (x - \bar{x})^T S^{-1} (x - \bar{x}),$$
where $\bar{x}$ is the sample mean vector and $S$ is the sample covariance matrix. In the simplified univariate form used here, the distance between the paired observations reduces to $d = \sqrt{(x_1 - x_2)^2} = |x_1 - x_2|$.
The generalized distance for each data point is given in the following table.
x1     x2      d
3      2.3     0.7
5      1.9     3.1
5      1       4
7      0.7     6.3
7      0.3     6.7
7      1       6
8      1.05    6.95
9      0.45    8.55
10     0.7     9.3
11     0.4     10.6
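For comparison, a sketch (not in the original) computes the full generalized (Mahalanobis) distance of each observation from the sample mean with base R's mahalanobis(), using the sample covariance of (x1, x2); note this differs from the simplified d = |x1 − x2| reported in the table.
x1 <- c(3, 5, 5, 7, 7, 7, 8, 9, 10, 11)
x2 <- c(2.3, 1.9, 1, 0.7, 0.3, 1, 1.05, 0.45, 0.7, 0.4)
X  <- cbind(x1, x2)
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))  # squared generalized distances
sqrt(d2)                                                  # generalized distances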
b) Now, individual histograms are constructed for each dataset x1 and x2 as given
below.
Frequency distribution of x1:
class range of x1    Frequency
0 to 3               1
3 to 4               0
4 to 5               2
5 to 6               0
6 to 7               3
7 to 8               1
8 to 9               1
9 to 10              1
10 to 11             1
[Histogram of x1: Frequency against class range of x1]
Frequency distribution of x2:
class range of x2    Frequency
0 to 0.3             1
0.3 to 0.6           2
0.6 to 0.9           2
0.9 to 1.2           3
1.2 to 1.5           0
1.5 to 1.8           0
1.8 to 2.1           1
2.1 to 2.4           1
[Histogram of x2: Frequency against class range of x2]
So, from the above two histograms it appears that the two variables x1 and x2 are approximately normally distributed. The normal shape would be much more evident if the number of class intervals were increased, i.e. if the class width were made smaller.
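As a sketch (not in the original), the same histograms can be drawn directly in base R with hist(), using the class breaks from the tables above.
x1 <- c(3, 5, 5, 7, 7, 7, 8, 9, 10, 11)
x2 <- c(2.3, 1.9, 1, 0.7, 0.3, 1, 1.05, 0.45, 0.7, 0.4)
hist(x1, breaks = c(0, 3:11), main = "Histogram of x1", xlab = "class range of x1")
hist(x2, breaks = seq(0, 2.4, by = 0.3), main = "Histogram of x2", xlab = "class range of x2")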
c) The normal Q-Q plot of the distance data from part (a) is created with the R code given below.
R code for the Q-Q plot:
d = c(0.7,3.1,4,6.3,6.7,6,6.95,8.55,9.3,10.6)
qqnorm(d)
qqline(d)
Now, since most of the points in the Q-Q plot do not fall on the normal line, the distances indicate that the datasets x1 and x2 are not jointly normally distributed.
d) A power transformation re-expresses the data through a power function; here the Box-Cox transformation is applied to bring each variable closer to a normally distributed one.
Box-Cox transformation of x1, R code:
x1 = c(3,5,5,7,7,7,8,9,10,11)
library(MASS)
Box = boxcox(x1 ~ 1,                    # Box-Cox transformation of x1
             lambda = seq(-6, 6, 0.1))  # trial lambda values from -6 to 6 in steps of 0.1
Cox = data.frame(Box$x, Box$y)              # create a data frame with the results
Cox2 = Cox[with(Cox, order(-Cox$Box.y)), ]  # order the new data frame by decreasing log-likelihood
Cox2[1, ]                                   # display the lambda with maximum log-likelihood
lambda = Cox2[1, "Box.x"]                   # lambda extraction
T_box = (x1 ^ lambda - 1) / lambda          # transformation of the data
T_box
[1] 2 4 4 6 6 6 7 8 9 10
qqnorm(T_box)
qqline(T_box)
Q-Q plot of transformed x1: [figure]
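A slightly more direct way to pull out the maximizing lambda from the same Box object (a sketch; it gives the same value as the data-frame ordering above):
lambda <- Box$x[which.max(Box$y)]  # lambda with the largest profile log-likelihood
lambda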
Box-Cox transformation of x2, R code:
x2 = c(2.3,1.9,1,0.7,0.3,1,1.05,0.45,0.7,0.4)
library(MASS)
Box = boxcox(x2 ~ 1,                    # Box-Cox transformation of x2
             lambda = seq(-6, 6, 0.1))  # trial lambda values from -6 to 6 in steps of 0.1
Cox = data.frame(Box$x, Box$y)              # create a data frame with the results
Cox2 = Cox[with(Cox, order(-Cox$Box.y)), ]  # order the new data frame by decreasing log-likelihood
Cox2[1, ]                                   # display the lambda with maximum log-likelihood
lambda = Cox2[1, "Box.x"]                   # lambda extraction
T_box = (x2 ^ lambda - 1) / lambda          # transformation of the data
T_box
[1] 0.79916555 0.62168880 0.00000000 -0.36311210 -1.27944873 0.00000000
[7] 0.04867133 -0.83125420 -0.36311210 -0.95958226
qqnorm(T_box)
qqline(T_box)
Q-Q plot of transformed x2: [figure]
So, from the above Q-Q plots it is visible that after the power transformation (specifically the Box-Cox transformation) the transformed x1 is approximately marginally normally distributed, as most of its points lie on the normal line. However, the transformed x2 is not marginally normally distributed, as most of its points do not lie on the normal line of its Q-Q plot.
Question 4:
a) The chromium (x1) and strontium (x2) measurements are extracted from the "Peruhair.dat" file and entered in R.
Now, the 95% confidence interval based on the normal distribution, treating the sample variance as the known population variance and centring the interval at the sample mean, is calculated for each component of the population mean vector μ = (μ1, μ2)^T by the following R code.
R-code:
> x1 = c(0.48,3.53,2.19,0.55,0.74,0.66,0.93,0.37,0.22)
> x2 = c(12.57,18.12,11.13,20.03,20.09,0.78,4.64,0.43,1.08)
> norm.interval = function(data, variance = var(data), conf.level = 0.95)
+ {
+ z = qnorm((1 - conf.level)/2, lower.tail = FALSE)
+ xbar = mean(data)
+ sdx = sqrt(variance/length(data))
+ c(xbar - z * sdx, xbar + z * sdx)
+ }
> norm.interval(x1,var(x1))
[1] 0.3650031 1.7838858
> norm.interval(x2,var(x2))
[1] 4.403621 15.345268
Hence, the 95% confidence interval for chromium(x1) data is [0.3650031,
1.7838858].
The 95% confidence interval for strontium(x2) data is [4.403621, 15.345268]
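For comparison (not part of the original answer), exact t-based intervals, which account for the variance being estimated from only nine observations, can be obtained with base R's t.test():
t.test(x1)$conf.int   # 95% t interval for the mean chromium level
t.test(x2)$conf.int   # 95% t interval for the mean strontium level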
b) Now, the 90% Bonferroni simultaneous confidence intervals for μ1 and μ2 can be
calculated by the following R-code.
R code:
x1 = c(0.48,3.53,2.19,0.55,0.74,0.66,0.93,0.37,0.22)
x2 = c(12.57,18.12,11.13,20.03,20.09,0.78,4.64,0.43,1.08)
fit = lm(x1 ~ x2)
ci.sim <- function(model, newdata, type = c("B", "S"), alpha = 0.05)
{
  g <- nrow(newdata)
  CI <- predict(model, newdata, se.fit = TRUE)
  M <- ifelse(match.arg(type) == "B",
              qt(1 - alpha / (2 * g), model$df),
              sqrt(g * qf(1 - alpha, g, model$df)))
  spred <- sqrt(CI$residual.scale^2 + (CI$se.fit)^2)
  x <- data.frame(
    "x" = newdata,
    "spred" = spred,
    "fit" = CI$fit,
    "lower" = CI$fit - M * spred,
    "upper" = CI$fit + M * spred)
  return(x)
}
new <- data.frame(x2 = c(0, 21))            # newdata must use the predictor's name, x2
ci.sim(fit, new, type = "B", alpha = 0.10)  # "B" with alpha = 0.10 gives the 90% Bonferroni intervals
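For comparison, a minimal sketch (not the regression-based function above) of the textbook Bonferroni simultaneous intervals for the two means: each mean gets an ordinary t interval at level α/m with α = 0.10 and m = 2.
x1 <- c(0.48,3.53,2.19,0.55,0.74,0.66,0.93,0.37,0.22)
x2 <- c(12.57,18.12,11.13,20.03,20.09,0.78,4.64,0.43,1.08)
alpha <- 0.10; m <- 2; n <- length(x1)
tcrit <- qt(1 - alpha / (2 * m), df = n - 1)            # Bonferroni critical value
bonf  <- function(x) mean(x) + c(-1, 1) * tcrit * sd(x) / sqrt(length(x))
bonf(x1)   # 90% Bonferroni simultaneous interval for mu1 (chromium)
bonf(x2)   # 90% Bonferroni simultaneous interval for mu2 (strontium)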
c) Now, the 90% T^2 simultaneous confidence intervals for μ1 and μ2 can be calculated with the following R code.
R code:
x1 <- c(0.48,3.53,2.19,0.55,0.74,0.66,0.93,0.37,0.22)
x2 <- c(12.57,18.12,11.13,20.03,20.09,0.78,4.64,0.43,1.08)
X <- cbind(x1, x2)
n <- nrow(X)          # sample size
p <- ncol(X)          # number of variables
xbar <- colMeans(X)   # sample mean vector
xbar
S <- var(X)           # sample covariance matrix
S
# Computation of the Hotelling T^2 critical constant
alpha <- 0.10
c2 <- (p * (n - 1) / (n - p)) * qf(1 - alpha, p, n - p)
# simultaneous confidence intervals
wd <- sqrt(c2) * sqrt(diag(S) / n)   # half-widths for mu1 and mu2
Cis <- cbind(xbar - wd, xbar + wd)
cat("90% T^2 simultaneous confidence intervals", "\n")
Cis
The object Cis holds the 90% T^2 simultaneous confidence intervals for both chromium (x1) and strontium (x2) at once.
d) The 95% confidence interval for strontium (x2), [4.403621, 15.345268], covers the plausible values of the population mean at the 95% confidence level. Since 20.0 lies well outside this interval, the data are inconsistent with a mean strontium level of 20.0 at the 5% significance level; it is therefore highly unlikely that the population mean strontium level is 20.0.
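This conclusion can also be checked with a one-sample t test of H0: μ2 = 20 (a sketch, not part of the original):
x2 <- c(12.57,18.12,11.13,20.03,20.09,0.78,4.64,0.43,1.08)
t.test(x2, mu = 20)   # tests whether the mean strontium level equals 20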