Statistics Assignment: Analyzing Probability and Distributions

Added on  2021/06/16

Homework Assignment
AI Summary
This statistics assignment applies Bayesian analysis and probability distributions to real-world datasets. The solution first works with geometric and constant prior distributions, calculating posterior means and standard deviations. It then models bicycle traffic data with binomial and beta distributions, giving the likelihood functions, posterior distributions, and simulation results, with R code for analysis and visualization. Finally, it compares the original values with the theta values and predicts outcomes from the posterior distributions to show that the distribution is a good fit.
Running head: STATISTICS
STATISTICS
Name of the student
Name of the university:
Author's note:
Table of Contents
Answer 1
Answer 2
Answer 3
References
Answer 1:
(a) The problem specifies a geometric prior distribution with mean 100, so the prior distribution is:
$$P(N) = \frac{1}{100}\left(\frac{99}{100}\right)^{N-1}, \quad N = 1, 2, \ldots$$
The car observed carries the number 203, so there must be at least 203 cars. Each of the N cars is equally likely to be the one observed, so the likelihood function is:
$$P(y \mid N) = \begin{cases} 1/N, & N \ge 203, \\ 0, & \text{otherwise.} \end{cases}$$
The posterior distribution, for $N \ge 203$, is therefore:
$$P(N \mid y) \propto P(N)\,P(y \mid N) = \frac{1}{N}(0.01)(0.99)^{N-1} \propto \frac{1}{N}(0.99)^{N}$$
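As a minimal sketch of this posterior, the unnormalized weights (1/N)(0.99)^N can be evaluated on a finite grid and rescaled to sum to 1. The upper limit of 10000 is an assumption made for computation (the tail beyond it is negligible), and the names `N_grid`, `unnorm` and `post` are illustrative only.

```r
# Grid of admissible values: N must be at least 203 (the observed car number).
N_grid <- 203:10000            # upper limit is a computational assumption

# Unnormalized posterior: geometric prior times the 1/N likelihood.
unnorm <- (1 / N_grid) * 0.99^N_grid

# Rescale so the probabilities sum to 1.
post <- unnorm / sum(unnorm)

# The weights decrease in N, so the most probable value is the smallest one.
N_grid[which.max(post)]
```

Because both factors shrink as N grows, the posterior mode sits at the observed number itself, while the long right tail pulls the mean well above it.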
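(b) A probability distribution must sum to 1, so the unnormalized posterior $(1/N)(99/100)^{N-1}$ is divided by the normalizing constant:
$$p(X) = \sum_{N=203}^{\infty} \frac{1}{N}\left(\frac{99}{100}\right)^{N-1}$$
The sums are truncated at N = 10000, where the remaining terms are negligible. The posterior mean is therefore:
$$E[N \mid X] = \frac{1}{p(X)}\sum_{N=203}^{10000} N \cdot \frac{1}{N}\left(\frac{99}{100}\right)^{N-1} = \frac{1}{p(X)}\sum_{N=203}^{10000}\left(\frac{99}{100}\right)^{N-1} = 279.0885$$
And the standard deviation is:
$$\mathrm{sd}[N \mid X] = \sqrt{\frac{1}{p(X)}\sum_{N=203}^{10000}\left(N - E[N \mid X]\right)^{2}\,\frac{1}{N}\left(\frac{99}{100}\right)^{N-1}} = 79.96458$$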
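N = 203:10000
# normalizing constant: sum of (1/N) * 0.99^(N-1)
p.x = sum((1/N)*(99/100)^(N-1))
# posterior mean: the factor N cancels the 1/N in the likelihood
e.n = sum((99/100)^(N-1))/p.x
e.n
# posterior standard deviation
s.d.N = sqrt(sum((N-e.n)^2*(1/N)*(99/100)^(N-1))/p.x)
s.d.N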
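The posterior mean can also be cross-checked analytically: the factor N cancels the 1/N from the likelihood and leaves a geometric series, whose sum from N = 203 onward is 100 × 0.99^202. The sketch below is a check under that identity rather than part of the original solution; the truncation at 10000 and the variable names are assumptions.

```r
N <- 203:10000                                  # truncated grid (assumption)
p_x <- sum((1 / N) * 0.99^(N - 1))              # normalizing constant
mean_numeric <- sum(0.99^(N - 1)) / p_x         # posterior mean by summation
mean_closed  <- 100 * 0.99^202 / p_x            # geometric-series closed form
c(numeric = mean_numeric, closed_form = mean_closed)
```

Agreement between the two values indicates that truncating the sum at 10000 has no practical effect on the result.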
(c) Now suppose the problem has a constant prior distribution:
$$P(N) = k$$
As before, the car observed carries the number 203, so there must be at least 203 cars, and each car is equally likely to be the one observed. The likelihood function is:
$$P(y \mid N) = \begin{cases} 1/N, & N \ge 203, \\ 0, & \text{otherwise.} \end{cases}$$
The posterior distribution, for $N \ge 203$, is:
$$P(N \mid y) \propto P(N)\,P(y \mid N) = \frac{k}{N}$$
A probability distribution must sum to 1, so the posterior has to be divided by a normalizing constant:
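$$p(X) = \sum_{N=203}^{\infty} \frac{k}{N}$$
This sum diverges, so the posterior is proper only if N is bounded above. Truncating every sum, including $p(X)$, at N = 10000, the posterior mean is:
$$E[N \mid X] = \frac{1}{p(X)}\sum_{N=203}^{10000} N \cdot \frac{k}{N}$$
And the standard deviation is:
$$\mathrm{sd}[N \mid X] = \sqrt{\frac{1}{p(X)}\sum_{N=203}^{10000}\left(N - E[N \mid X]\right)^{2}\,\frac{k}{N}}$$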
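N = 203:10000
k = 1   # any positive constant; it cancels in the normalized posterior
p.x = k*sum(1/N)
# posterior mean: sum of N * (k/N) divided by the normalizing constant
e.n = sum(N*(k/N))/p.x
e.n
# posterior standard deviation
s.d.N = sqrt(sum((N-e.n)^2*(k/N))/p.x)
s.d.N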
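Because the sum of 1/N does not converge, the answer under a constant prior depends on where the range of N is cut off. The following sketch, with illustrative truncation points that are not taken from the assignment, shows how the posterior mean moves as the assumed upper limit changes.

```r
# Posterior mean of N under a flat prior, truncated at an assumed upper limit.
truncated_mean <- function(upper) {
  N <- 203:upper
  w <- (1 / N) / sum(1 / N)   # normalized posterior weights; k cancels
  sum(N * w)
}

limits <- c(1000, 10000, 100000)          # illustrative truncation points
means  <- sapply(limits, truncated_mean)
data.frame(upper = limits, posterior_mean = means)
```

The mean keeps growing with the cut-off, which is why the value obtained with 10000 should be read as conditional on that bound.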
Answer 2:
The data come from a survey of the numbers of bicycles and other vehicles recorded in a neighbourhood near the University of California, Berkeley. The streets are divided into two groups, those with a bike route and those without, all in a residential area. The observed dataset is:
A probability model is applied to this dataset for i = 1, 2. The observed number of bicycles at each location can be modelled as binomial, that is,
$$y_i \sim \text{ind. } \mathrm{Bin}(n_i, \theta_i),$$
where $\theta_i$ is the model parameter representing the proportion of bike traffic at the ith location. The prior distribution for the parameters is taken to be a beta distribution, that is,
$$\theta_i \sim \text{iid } \mathrm{Beta}(\alpha, \beta),$$
(b)
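Therefore, the prior distribution is:
$$P(\theta_i) = \mathrm{Beta}(\alpha, \beta) \propto \theta_i^{\alpha-1}(1-\theta_i)^{\beta-1}$$
(c)
The likelihood function is:
$$P(y_i \mid \theta_i) \propto \theta_i^{y_i}(1-\theta_i)^{n_i-y_i}$$
and the posterior distribution is:
$$P(\theta_i \mid y_i) \propto P(\theta_i)\,P(y_i \mid \theta_i) \propto \theta_i^{y_i+\alpha-1}(1-\theta_i)^{n_i-y_i+\beta-1}$$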
The required simulation result is:
α=0.152, β=0.220.
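The conjugate update behind this posterior can be sketched in a few lines of R. The function name `beta_binomial_update` is hypothetical, and the example counts are the first bike-route pair listed in Answer 3 (16 bicycles, 58 other vehicles), treating the total n as bicycles plus other vehicles, which is an assumption about how the counts are recorded.

```r
# Conjugate beta-binomial update: Beta(a, b) prior, y successes in n trials.
beta_binomial_update <- function(a, b, y, n) {
  stopifnot(y >= 0, n >= y)
  c(shape1 = a + y, shape2 = b + n - y)
}

alpha0 <- 0.152; beta0 <- 0.220    # values reported in the text
y <- 16; n <- 16 + 58              # assumption: n = bicycles + other vehicles

post <- beta_binomial_update(alpha0, beta0, y, n)
post
unname(post["shape1"] / sum(post))  # posterior mean of theta
```

With a prior this weak, the posterior mean stays close to the observed proportion y/n.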
For population 1:
$$P(\theta_1) \propto \theta_1^{\alpha-1}(1-\theta_1)^{\beta-1}$$
$$P(\theta_1 \mid y_1) \propto P(\theta_1)\,P(y_1 \mid \theta_1) \propto \theta_1^{y_1+\alpha-1}(1-\theta_1)^{n_1-y_1+\beta-1},$$
which is the kernel of $\mathrm{Beta}(y_1+\alpha,\; n_1-y_1+\beta)$.
Fig 1: Plot for observations in the non-bike route (x-axis: Index; y-axis: l)
R codes are:
x=seq(0,1,length=100)
l=dbeta(x,10.152,9.9827)
l
plot(l)
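For population 2:
$$P(\theta_2) \propto \theta_2^{\alpha-1}(1-\theta_2)^{\beta-1}, \quad \text{that is, } \theta_2 \sim \mathrm{Beta}(\alpha, \beta)$$
$$\theta_2 \mid y_2 \sim \mathrm{Beta}(y_2+\alpha,\; n_2-y_2+\beta)$$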
Simulated graph:
Fig: Plot for observations in the bike route (x-axis: Index; y-axis: k)
R codes are:
x2=seq(0,1,length=100)
k=dbeta(x2,8.152,8.1226)
k
plot(k)
(d)
Judging from the plots above, simulating the difference between the mean of the first dataset and the mean of the second would not change the picture much. The two posterior densities look alike, so the difference between the two routes is small.
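One way to examine this claim is to draw from the two beta densities plotted above and look at the difference of the draws directly. This is a sketch, not the original author's simulation: the shape parameters are the ones passed to `dbeta` in the R code above, while the seed, the number of draws and the variable names are assumptions.

```r
set.seed(1)                                   # for reproducibility
n_draws <- 10000                              # number of draws (assumption)

theta_l <- rbeta(n_draws, 10.152, 9.9827)     # shapes used for the first plot
theta_k <- rbeta(n_draws, 8.152, 8.1226)      # shapes used for the second plot

diff_draws <- theta_l - theta_k
mean(diff_draws)                              # average difference
quantile(diff_draws, c(0.025, 0.975))         # central 95% interval
```

A mean difference near zero with an interval that straddles zero would support the statement that the two densities do not differ much.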
Answer 3:
The given dataset is:
Residential (with bike route): 16/58, 9/90, 10/48, 13/57, 19/103, 20/57, 18/86, 17/112, 35/273, 55/64.
Residential (without bike route): 12/113, 1/18, 2/14, 4/44, 9/208, 7/67, 9/29, 8/154.
The data are counts of bicycles and other vehicles, observed over a period of one hour in 10 blocks of a city. All observations were made in residential areas.
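For the calculations that follow it helps to have the counts in R. The sketch below enters the pairs exactly as listed above and computes the raw bicycle proportions; reading each pair as bicycles/other vehicles, so that the total is their sum, is an assumption, and the vector names are illustrative.

```r
# Counts as listed in the text: bicycles / other vehicles per block.
bike_y   <- c(16, 9, 10, 13, 19, 20, 18, 17, 35, 55)      # bicycles, bike route
bike_v   <- c(58, 90, 48, 57, 103, 57, 86, 112, 273, 64)  # other vehicles
nobike_y <- c(12, 1, 2, 4, 9, 7, 9, 8)                    # bicycles, no bike route
nobike_v <- c(113, 18, 14, 44, 208, 67, 29, 154)          # other vehicles

# Assumption: total traffic n is bicycles plus other vehicles.
bike_n   <- bike_y + bike_v
nobike_n <- nobike_y + nobike_v

round(bike_y / bike_n, 3)       # observed bicycle proportions, bike route
round(nobike_y / nobike_n, 3)   # observed bicycle proportions, no bike route
```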
(a) The data come from a survey of the numbers of bicycles and other vehicles recorded in a neighbourhood near the University of California, Berkeley. The streets are divided into two groups, those with a bike route and those without, all in a residential area. The observed dataset is:
A probability model is applied to this dataset for i = 1, 2. The observed number of bicycles at each location can be modelled as binomial, that is,
$$y_i \sim \text{ind. } \mathrm{Bin}(n_i, \theta_i),$$
where $\theta_i$ is the model parameter representing the proportion of bike traffic at the ith location. The prior distribution for the parameters is taken to be a beta distribution, that is,
$$\theta_i \sim \text{iid } \mathrm{Beta}(\alpha, \beta), \quad \text{with hyperprior density } p(\alpha, \beta) \propto (\alpha+\beta)^{-5/2}$$
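Therefore, the prior distribution is:
$$P(\theta \mid \alpha, \beta) = \prod_i \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta_i^{\alpha-1}(1-\theta_i)^{\beta-1}$$
And the hyperprior distribution is:
$$P(\alpha, \beta) \propto (\alpha+\beta)^{-5/2}$$
The likelihood function is:
$$P(y \mid \theta) = \prod_i \binom{n_i}{y_i}\,\theta_i^{y_i}(1-\theta_i)^{n_i-y_i}$$
The posterior density is:
$$P(\theta, \alpha, \beta \mid y) \propto P(\alpha, \beta)\,P(\theta \mid \alpha, \beta)\,P(y \mid \theta) = (\alpha+\beta)^{-5/2} \prod_i \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta_i^{\alpha-1}(1-\theta_i)^{\beta-1} \prod_i \binom{n_i}{y_i}\,\theta_i^{y_i}(1-\theta_i)^{n_i-y_i}$$
Therefore,
$$P(\theta, \alpha, \beta \mid y) \propto (\alpha+\beta)^{-5/2} \prod_i \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta_i^{\alpha+y_i-1}(1-\theta_i)^{\beta+n_i-y_i-1},$$
so that, given α and β, each $\theta_i$ has the conditional posterior:
$$\theta_i \mid \alpha, \beta, y_i \sim \mathrm{Beta}(\alpha+y_i,\; \beta+n_i-y_i)$$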
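Given values of α and β, each θi can be drawn from its beta conditional posterior with shape parameters α + yi and β + ni − yi. The sketch below illustrates one such draw for the bike-route blocks. The hyperparameter values are hypothetical and chosen only for illustration, and n is taken as bicycles plus other vehicles, which is an assumption.

```r
set.seed(42)
# Bike-route counts from the text; n = bicycles + other vehicles (assumption).
y <- c(16, 9, 10, 13, 19, 20, 18, 17, 35, 55)
n <- y + c(58, 90, 48, 57, 103, 57, 86, 112, 273, 64)

alpha_h <- 2; beta_h <- 8   # hypothetical hyperparameter values, illustration only

# One draw of each theta_i from its conditional posterior.
theta_draw <- rbeta(length(y), alpha_h + y, beta_h + n - y)
round(theta_draw, 3)
```

In a full analysis α and β would themselves be drawn from their marginal posterior first, and the step above repeated for each such draw.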
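(b) The marginal posterior distribution of α and β is:
$$P(\alpha, \beta \mid y) = \frac{P(\theta, \alpha, \beta \mid y)}{P(\theta \mid \alpha, \beta, y)}$$
The conditional posterior density of θ is:
$$P(\theta \mid \alpha, \beta, y) = \prod_i \frac{\Gamma(\alpha+\beta+n_i)}{\Gamma(\alpha+y_i)\,\Gamma(\beta+n_i-y_i)}\,\theta_i^{\alpha+y_i-1}(1-\theta_i)^{\beta+n_i-y_i-1}$$
Hence:
$$P(\alpha, \beta \mid y) \propto (\alpha+\beta)^{-5/2} \prod_i \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \cdot \frac{\Gamma(\alpha+y_i)\,\Gamma(\beta+n_i-y_i)}{\Gamma(\alpha+\beta+n_i)}$$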
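The marginal posterior of α and β is easiest to evaluate on the log scale, where the gamma functions become `lgamma` terms and the product over blocks becomes a sum. The following is a minimal sketch of that evaluation on a coarse grid. The function name, the grid ranges and the reading of n as bicycles plus other vehicles are assumptions, not part of the original solution.

```r
# Log of the unnormalized marginal posterior p(alpha, beta | y).
log_marginal_post <- function(alpha, beta, y, n) {
  -2.5 * log(alpha + beta) +
    sum(lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta) +
        lgamma(alpha + y) + lgamma(beta + n - y) - lgamma(alpha + beta + n))
}

# Bike-route counts from the text; n = bicycles + other vehicles (assumption).
y <- c(16, 9, 10, 13, 19, 20, 18, 17, 35, 55)
n <- y + c(58, 90, 48, 57, 103, 57, 86, 112, 273, 64)

# Evaluate on a small illustrative grid and locate the highest grid point.
grid <- expand.grid(alpha = seq(0.5, 5, by = 0.5), beta = seq(2, 20, by = 2))
grid$log_post <- mapply(log_marginal_post, grid$alpha, grid$beta,
                        MoreArgs = list(y = y, n = n))
grid[which.max(grid$log_post), ]
```

A finer grid, or a grid on a transformed scale, would be needed before drawing conclusions; the coarse grid here only shows how the formula is computed.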