GEOG 2700 Fall 2018 Independent Assessment 2


Added on  2023/06/04


PART 1
1.01
You have observed that the probability of a vehicle accident occurring at a particular intersection is 1/10.
If you drive through this intersection 10 times per week, what is the probability that you will be involved in
an accident in any given week?
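Let p = 1/10 be the probability of an accident on a single pass through the intersection, n = 10 the number of passes per week, and X the number of accidents in a week, so that X is binomially distributed with parameters n and p. The required probability of being involved in at least one accident in a given week is
$P(X > 0) = 1 - P(X = 0) = 1 - \binom{10}{0}\left(\frac{1}{10}\right)^{0}\left(\frac{9}{10}\right)^{10} = 1 - (0.9)^{10} = 0.6513$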
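The complement rule used here can be checked numerically. The following is a minimal sketch in Python using only the standard library; the function name `prob_at_least_one` is an illustrative assumption and does not come from the assessment.

```python
from math import comb

def prob_at_least_one(n: int, p: float) -> float:
    """Return P(X > 0) for X ~ Binomial(n, p), computed as 1 - P(X = 0)."""
    # P(X = 0) = C(n, 0) * p^0 * (1 - p)^n
    p_zero = comb(n, 0) * p**0 * (1 - p) ** n
    return 1 - p_zero

# Ten passes per week, each with accident probability 1/10.
p_accident = prob_at_least_one(n=10, p=1 / 10)
print(round(p_accident, 4))
```

The same function applies to any "at least one event in n independent trials" question by changing `n` and `p`.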
1.02
You have been hired to determine the probability of a lightning-related fire within a local grassland
preserve that covers 10 km². You have observed that lightning strikes occur with equal probability
everywhere within your management area, and that the likelihood of a lightning strike in any given 1 km²
area is 1/20. You have also observed that probability of a lightning strike resulting in a fire is 0.25. What is
the? You can assume that any lightning strike will result in a fire.
Let L denote a lightning strike and F a fire in the grassland. Then $P(L) = \frac{1}{20} = 0.05$ and $P(F \mid L) = 0.25$.
Therefore $P(F \cap L) = 0.05 \times 0.25 = 0.0125$ is the probability that a lightning-related fire occurs in any given 1 km².
The preserve covers 10 km², so n = 10. With X the number of 1 km² areas in which a lightning-related fire occurs, the required probability of a lightning-related fire in the preserve is
$P(X > 0) = 1 - P(X = 0) = 1 - \binom{10}{0}(0.0125)^{0}(0.9875)^{10} = 1 - (0.9875)^{10} = 0.1182$
1.03
Given a normally distributed random variable X with a mean of 10 and standard deviation of 2, what value
represents the lower 60% quantile of this variable?
Given that $X \sim N(10, 2^2)$, the problem requires the value x for which $P(X < x) = 0.6$. Standardizing,
$P\left(Z < \frac{x - 10}{2}\right) = 0.6 = P(Z < 0.25335) \Rightarrow x = 10 + 2 \times 0.25335 = 10.5067 \approx 10.51$
Hence, x = 10.51 represents the lower 60% quantile.
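The quantile lookup can be reproduced without a z table. This is a small sketch using Python's standard `statistics.NormalDist`; the variable names are illustrative assumptions.

```python
from statistics import NormalDist

# X ~ N(mean = 10, sd = 2); the lower 60% quantile is the inverse CDF at 0.60.
X = NormalDist(mu=10, sigma=2)
x_60 = X.inv_cdf(0.60)

# The corresponding standard normal quantile, as read from a z table.
z_60 = NormalDist().inv_cdf(0.60)

print(round(z_60, 5), round(x_60, 2))
```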
1.04
Given a t-distributed random variable X with a mean of -2 and a standard deviation of 0.3, what is the
probability that X ≤ -1?
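Given that X follows a t-distribution with mean -2 and standard deviation 0.3, we need to find $P(X \le -1)$. Standardizing,
$P(X \le -1) = P\left(t \le \frac{-1 + 2}{0.3}\right) = P(t \le 3.33) \approx P(Z \le 3.33) = 0.9996$
where the standard normal distribution is used as an approximation because the degrees of freedom are not given.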
1.05
You have been asked to determine if there is a difference between TRU faculty and students in how far they have to
walk from their parking spot to their campus destination. Describe how you would define a sampling
strategy to collect this information.
Stratified random sampling would be used, with students and faculty as the two strata. Within each stratum, a random sample of people would be asked for the distance they walk from their parking spot to their campus destination. The mean distances of the two strata would then be tested for equality with an appropriate test statistic.
1.06
Why are sample statistics considered to be random variables?
A sample statistic is computed from sample observations, and those observations are themselves values of random variables. The value of the statistic therefore changes from one random sample to the next, and its possible outcomes are described by a probability distribution. Hence, sample statistics are considered to be random variables.
1.07
Define and describe the difference between the sample mean, and the mean of the sampling distribution
of the population mean.
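The sample mean is the arithmetic average of the observations in a single sample; it is one number computed from one sample. The sampling distribution of the population mean is the probability distribution of that statistic across many samples of the same size, with a set of probability rules describing how the sample means vary from sample to sample. Its mean is therefore the average of the sample means over all such samples, not the average of one sample. More generally, the sampling distribution of sample means or of sample standard deviations is the probability distribution of that statistic over repeated samples.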
1.08
What does the central limit theorem tell us about the sampling distribution of the population mean of a
variable that is binomially distributed?
The central limit theorem states that for a large sample size (n > 30), the sampling distribution of the mean of a binomially distributed variable is approximately normal, centred on the sample value of the mean statistic.
1.09
State the equations that define the sampling distribution of the population proportion. Make sure to
define each variable that appears in these equations.
The equations defining the required distribution are $\mu = \hat{p}$ and $\sigma = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$, where $\hat{p}$ is the sample proportion and n is the sample size. The population proportion is distributed with mean μ and standard deviation σ.
1.10
You are the manager of a popular restaurant. You have been asked to estimate the proportion of
customers who have food allergies. To do this, you sample 150 customers at random and find that 17 of
them have food allergies. Determine the 95% confidence interval of the proportion of customers at your
restaurant who have food allergies. Write out all relevant equations and state your confidence interval as:
According to the problem, $n = 150$ and $\hat{p} = \frac{17}{150} = 0.113$. The 95% confidence interval of the proportion of customers is calculated as
$CI = \hat{p} \pm z_{0.025}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.113 \pm 1.96\sqrt{\frac{0.113 \times 0.887}{150}} = [0.063, 0.164]$
Hence, $0.063 \le \pi \le 0.164$, and it is estimated that the proportion of customers with food allergies lies somewhere between 0.063 and 0.164.
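The interval can be recomputed directly from the counts. The sketch below uses only the Python standard library; `proportion_ci` is a hypothetical helper name. It carries the unrounded sample proportion 17/150 through the calculation, so the last digit can differ slightly from a hand calculation with rounded values.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. z_0.025 = 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_ci(successes=17, n=150, confidence=0.95)
print(round(lower, 3), round(upper, 3))
```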
1.11
You have been asked to estimate the cost of living of TRU students who live in off-campus apartments. In
a random sample of 300 students, you find that the mean cost of living is$1,200 per month with a standard
deviation of 50. Determine the 90% confidence interval of the cost of living for TRU students. Write out
all relevant equations and state your confidence interval as:
According to the problem, $n = 300$, $\bar{x} = 1200$, and $s = 50$.
The 90% confidence interval is evaluated using the formula
$CI = \bar{x} \pm t_{0.05}\frac{s}{\sqrt{n}} = 1200 \pm 1.65 \times \frac{50}{\sqrt{300}} = [1195.24, 1204.76]$
where the t-value is taken for 299 degrees of freedom, leaving a probability of 0.90 within the confidence interval. Hence,
$1195.24 \le \mu \le 1204.76$ estimates the population mean with 90% confidence.
1.12
You have been hired to design a field study to determine the wintertime natural gas per household of a
northern community. To ensure that a sufficient gas supply is available, you have been asked to estimate
the quantity of gas required per household to be within ± 20 litres with a confidence of 99%. Assuming
that the household gas usage is normally distributed with a standard deviation of 100 litres, how many
households must you sample to estimate the wintertime household gas supply? Write out all relevant
equations.
Let X denote the household gas usage, which is normally distributed with standard deviation σ = 100 litres. The margin of error is specified as ±20 litres. At the 99% confidence level, $z_{0.005} = 2.576$, so the margin of error satisfies
$z_{0.005}\frac{\sigma}{\sqrt{n}} = 20 \Rightarrow n = \left(\frac{2.576 \times 100}{20}\right)^{2} = 165.9$
Rounding up, 166 households must be sampled to estimate the household gas supply.
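As a check on the sample-size calculation, the sketch below solves the usual margin-of-error relation E = z·σ/√n for n, with z taken as the two-sided critical value for the stated confidence level. It uses only the Python standard library; `required_sample_size` is a hypothetical helper name, and the known-σ normal formula is the assumption being illustrated.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma: float, margin: float, confidence: float) -> int:
    """Smallest n with z * sigma / sqrt(n) <= margin (known-sigma case)."""
    # Two-sided critical value, e.g. z_0.005 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up, because a fractional household cannot be sampled.
    return ceil((z * sigma / margin) ** 2)

n_households = required_sample_size(sigma=100, margin=20, confidence=0.99)
print(n_households)
```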
1.13
What assumptions must be validated to justify the use of a one-sample test on the means?
The variable should be continuous in nature and should follow normal distribution. The data for the variable
should have been collected as a random sample from the population data, and the observations should be
independent of each other.
1.14
You have been hired to evaluate the claim that less than 25% of TRU students hold full-time
employment outside of TRU. To do this, you randomly sample 150 TRU students and find that 30 of them
hold full-time employment outside of TRU.
(1) State what type of statistical hypothesis test you would use to test this assertion.
(2) Use the 6-step procedure demonstrated in class to test the assertion that less than 25% of TRU
students hold full-time employment outside of TRU at a significance level of 5%. Write out each of
the 6-steps, including all relevant equations, and clearly state the outcome of the test.
Size of sample is $n = 150$ and the sample proportion is $\hat{p} = \frac{30}{150} = \frac{1}{5} = 0.2$.
(1) The claim can be tested with a left-tailed alternate hypothesis at the 5% level of significance using a z-test for one
sample proportion.
(2) Step 1: The null hypothesis H0: $p = 0.25$
Step 2: Alternate hypothesis HA: $p < 0.25$
Step 3: Level of significance $\alpha = 5\% = 0.05$
Step 4: Choice of test: One sample z-test.
$z_{stat} = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}} = \frac{0.2 - 0.25}{\sqrt{\frac{0.25 \times 0.75}{150}}} = -1.414$
Step 5: The p-value is calculated as $P(z < -1.414) = 0.0786$
Step 6: The p-value is greater than $\alpha = 0.05$, so the null hypothesis is not rejected. Hence, the
proportion is not significantly less than 0.25.
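The test statistic and left-tail p-value of this one-sample proportion test can be sketched as follows, using only the Python standard library. The function name is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ztest_left(successes: int, n: int, p0: float):
    """Left-tailed one-sample z-test for a proportion (H0: p = p0, HA: p < p0)."""
    p_hat = successes / n
    # The standard error uses the hypothesised proportion p0, as in Step 4.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = NormalDist().cdf(z)  # area to the left of z
    return z, p_value

z_stat, p_value = proportion_ztest_left(successes=30, n=150, p0=0.25)
reject_h0 = p_value < 0.05
print(round(z_stat, 3), round(p_value, 4), reject_h0)
```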
1.15
You have been hired to determine whether the average snow depth at a ski resort exceeds 2 meters. To
do this you measure the snow depth in cm at 30 random locations throughout the resort. These data are
approximately normally distributed, with a mean of 210 cm and a standard deviation of 30 cm.
(1) What type of statistical hypothesis test would you use to test this assertion?
(2) Use the 6-step procedure demonstrated in class to test the assertion that the snow depth is greater
than 2 m at a significance level of 5%. Write out each of the 6-steps, including all relevant
equations, and clearly state the outcome of the test.
Size of sample is n = 30, the sample mean is $\bar{x} = 210$ cm, and the sample standard deviation is $s = 30$ cm.
(1) The claim can be tested with a right-tailed alternate hypothesis at the 5% level of significance using a one sample t-test.
(2) Step 1: The null hypothesis H0: $\mu = 200$
Step 2: Alternate hypothesis HA: $\mu > 200$
Step 3: Level of significance $\alpha = 5\% = 0.05$
Step 4: Choice of test: One sample t-test.
$t_{stat} = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{210 - 200}{30 / \sqrt{30}} = 1.826$
Step 5: The p-value is calculated as $P(t > 1.826) = 0.0391$
Step 6: The p-value is less than $\alpha = 0.05$, so the null hypothesis is rejected. Hence, it is concluded that there
is statistically significant evidence that the snow depth at the ski resort exceeds 2 meters.
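The t statistic in Step 4 can be recomputed with a few lines of Python. The standard library has no t-distribution, so this sketch compares the statistic against a critical value instead of computing the p-value; the constant 1.699 is the usual table value for a right-tailed test at α = 0.05 with 29 degrees of freedom and is an assumption brought in from a t table, not from the assessment.

```python
from math import sqrt

# Right-tail critical value for alpha = 0.05 and 29 degrees of freedom,
# taken from a standard t table (assumption, not stated in the source).
T_CRIT_RIGHT_05_DF29 = 1.699

def one_sample_t(sample_mean: float, mu0: float, s: float, n: int) -> float:
    """t statistic for H0: mu = mu0 using the sample standard deviation s."""
    return (sample_mean - mu0) / (s / sqrt(n))

t_stat = one_sample_t(sample_mean=210, mu0=200, s=30, n=30)
reject_h0 = t_stat > T_CRIT_RIGHT_05_DF29
print(round(t_stat, 3), reject_h0)
```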
1.16
You have been hired to determine whether the average summer time temperature in Kamloops is more
variable in the 20th century (1900-1999) than it was in the 19th century (1800-1899). You have collected the
100 measurements of summer time temperature from each of these time periods, and determined that
the standard deviation of the average summer time temperature in (1800-1899) was 10 °C, while the
average summer time temperature in (1900-1999) was 12°C. Additionally, you have determined that these
samples are both approximately normally distributed.
(1) State what type of statistical hypothesis test you would use to test the assertion that the average
summer time temperature is more variable in the 20th century than it was in the 19th century at a
significance level of 5%.
(2) Use the 6-step procedure demonstrated in class to perform this test. Write out each of the 6-steps,
including all relevant equations, and clearly state the outcome of the test.
Size of both samples is 100, and the sample standard deviations are $s_1 = 10$ and $s_2 = 12$.
(1) The claim can be tested with a two-tailed alternate hypothesis at the 5% level of significance using an F-test for
variances.
(2) Step 1: The null hypothesis H0: $\sigma_1^2 = \sigma_2^2$
Step 2: Alternate hypothesis HA: $\sigma_1^2 \ne \sigma_2^2$
Step 3: Level of significance $\alpha = 5\% = 0.05$
Step 4: Choice of test: F-test.
$F_{stat} = \frac{\hat{\sigma}_2^2}{\hat{\sigma}_1^2} = \frac{\frac{n_2}{n_2 - 1} s_2^2}{\frac{n_1}{n_1 - 1} s_1^2} = \frac{144 \times \frac{100}{99}}{100 \times \frac{100}{99}} = 1.44$
with 99 and 99 degrees of freedom.
Step 5: The two-tailed p-value is calculated as $2 \times P(F > 1.44) = 0.071$
Step 6: The p-value is greater than $\alpha = 0.05$, so the null hypothesis is not rejected. Hence, it is
concluded that there is not enough statistically significant evidence for a difference in the variation of summer
temperature between the two time periods.
1.17
A restaurant has hired you to determine whether a recent change in their menu has increased their
average daily revenue by at least $500. They have provided you with two samples of their daily revenue,
both of which are approximately normally distributed. One sample consists of 65 daily revenue values,
randomly sampled from the period before the change. This sample has a mean of $5,100 with a standard
deviation of $300. The second sample consists of 36 daily revenue values, sampled at random from the
period following the change. This sample has a mean of $5,750, with a standard deviation of $600.
(1) State what type of statistical hypothesis test you would use to test the assertion that the daily
revenue has increased by at least $500 at a significance level of 5%.
(2) Use the 6-step procedure demonstrated in class to perform this test. Write out each of the
6-steps, including all relevant equations, and clearly state the outcome of the test.
Sizes of the samples are $n_1 = 65$ and $n_2 = 36$.
Sample means are $\bar{x}_1 = 5100$ and $\bar{x}_2 = 5750$; standard deviations are $s_1 = 300$ and $s_2 = 600$.
(1) The claim can be tested with a right-tailed alternate hypothesis at the 5% level of significance using a t-test for the
difference between means.
(2) Step 1: The null hypothesis H0: $\mu_2 - \mu_1 = \mu_d = 500$
Step 2: Alternate hypothesis HA: $\mu_2 - \mu_1 > 500$
Step 3: Level of significance $\alpha = 5\% = 0.05$
Step 4: Choice of test: t-test for difference.
$t_{stat} = \frac{\bar{x}_2 - \bar{x}_1 - \mu_d}{SE} = \frac{5750 - 5100 - 500}{106.70} = 1.406$
where the standard error of the difference is
$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{300^2}{65} + \frac{600^2}{36}} = 106.70$
Step 5: The p-value is calculated as $P(t > 1.406) \approx 0.08$
Step 6: The p-value is greater than $\alpha = 0.05$, so the null hypothesis is not rejected. Hence, it is
concluded that there is not enough statistically significant evidence to claim that daily revenue has increased by
at least $500.
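The arithmetic of the standard error and the test statistic is easy to get wrong by hand, so a short check may help. This sketch uses the unpooled standard error written in Step 4 and only the Python standard library; `two_sample_t` is a hypothetical helper name, and no p-value is computed because the standard library has no t-distribution.

```python
from math import sqrt

def two_sample_t(mean1, s1, n1, mean2, s2, n2, hypothesised_diff=0.0):
    """Return (SE, t) for H0: mu2 - mu1 = hypothesised_diff, unpooled variances."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    t = (mean2 - mean1 - hypothesised_diff) / se
    return se, t

# Sample 1: before the menu change; sample 2: after the change.
se, t_stat = two_sample_t(
    mean1=5100, s1=300, n1=65,
    mean2=5750, s2=600, n2=36,
    hypothesised_diff=500,
)
print(round(se, 2), round(t_stat, 3))
```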
1.18
The city centre neighbourhood is changing as people increasingly choose to live downtown to be closer to
their places of employment. A study from 1979 reveals that from a random sample of 500 residential
properties, 470 of them were multi-family residences. From the 2010 census data, you determine that
from a random sample of 760 residential properties, 605 of them were multi-family residences. You would
like to evaluate the assertion that the proportion of multi-family residences has decreased by at least 10%
between 1979 and 2010.
(1) State what type of statistical hypothesis you would use to test the assertion that the proportion of
multi-family residences has decreased by at least 10% between 1979 and 2010 at a significance level
of 5%.
(2) Use the 6-step procedure demonstrated in class to perform this test. Write out each of the 6-steps,
including all relevant equations, and clearly state the outcome of the test.
Size of the first sample is $n_1 = 500$ with sample proportion $\hat{p}_1 = \frac{470}{500} = 0.94$, and size of the second sample is $n_2 = 760$ with sample proportion $\hat{p}_2 = \frac{605}{760} = 0.796$.
(1) The claim can be tested with a right-tailed alternate hypothesis at the 5% level of significance using a z-test for the
difference of two sample proportions.
(2) Step 1: The null hypothesis H0: $p_1 - p_2 = p_d = 0.1$
Step 2: Alternate hypothesis HA: $p_1 - p_2 > 0.1$
Step 3: Level of significance $\alpha = 5\% = 0.05$
Step 4: Choice of test: z-test.
$z_{stat} = \frac{\hat{p}_1 - \hat{p}_2 - p_d}{SE} = \frac{0.94 - 0.796 - 0.1}{0.0181} = 2.43$
where the standard error of the difference is
$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = \sqrt{\frac{0.94 \times 0.06}{500} + \frac{0.796 \times 0.204}{760}} = 0.0181$
Step 5: The p-value is calculated as $P(z > 2.43) = 0.0075$
Step 6: The p-value is less than $\alpha = 0.05$, so the null hypothesis is rejected. Hence, the proportion of multi-family
residences has decreased significantly by at least 10%.
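The standard error, z statistic, and right-tail p-value for a difference of two proportions against a hypothesised difference can be sketched as below, using only the Python standard library. The function name is an illustrative assumption, and the unpooled standard error is the one written in Step 4.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest_right(x1, n1, x2, n2, diff0):
    """Right-tailed z-test for H0: p1 - p2 = diff0 against HA: p1 - p2 > diff0."""
    p1, p2 = x1 / n1, x2 / n2
    # Each sample contributes its own variance term with its own sample size.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2 - diff0) / se
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return se, z, p_value

# 1979: 470 of 500 multi-family; 2010: 605 of 760 multi-family.
se, z_stat, p_value = two_proportion_ztest_right(470, 500, 605, 760, diff0=0.1)
print(round(se, 4), round(z_stat, 2), round(p_value, 4))
```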
1.19
In order to perform parametric statistical tests, you must first demonstrate that the variable(s) that will be
tested are approximately normally distributed. Of the tests we have learned in class, which type of
statistical test could you use to evaluate whether or not the assumption that the variable(s) are
approximately normally distributed is valid?
The statistical tests that could be used to evaluate the normality assumption of the variable are the Shapiro-Wilk
test and the Kolmogorov-Smirnov test.
1.20
One of the assumptions required to apply a t-test on the sample mean is that the sample data are
independent. Of the tests we have learned in class, which type of test could you use to evaluate this
assumption?
A correlation test or a scatter plot can be used to check that the sample data are independent.
1.21
Why are non-parametric tests preferable for evaluating hypotheses regarding Likert scale variables?
Likert scale responses are ordinal in nature, so the numbers associated with the options are not meaningful as
measurements. Hence, for sample sizes less than 30 the data will not be normal in nature. However, for interval-scale
Likert data with more than 30 observations, parametric tests can sometimes be used.
1.22
You have been hired by a client to evaluate the claim that Kamloops’ air quality does not meet Federal air
quality standards. To do this you measure the amount of air pollution present every day for 2 years.
Unfortunately, these data are highly non-normal and cannot be transformed into a nearly normal variable.
Of the tests we have learned in class, which type of statistical test could you perform using these data to
support an assertion that the air quality does not meet the air quality standards?
For a non-normal variable, non-parametric tests such as the Wilcoxon signed-rank test, the Mann-Whitney U test, and
the Kruskal-Wallis test are used.
1.23
You want to determine if men are represented as less approachable than women in advertisements
appearing in a popular magazine. To do this, you acquire a random sample of 12 magazines from the past
2 years. You then identify all the image-based advertisements in the magazine and rank the facial
expressions of men and women depicted in these images according to the following 5-point scale: (1) shy;
(2) welcoming; (3) neutral; (4) stern; (5) aggressive. These images provide a sample of 120 images of men
and 300 images of women, each ranked according to the 5-point scale. Of the tests we have learned in
class, which type of statistical test could you perform using these data to support the assertion that men
are represented as less approachable than women in advertisements in this magazine?
We can use the Wilcoxon rank-sum test, also known as the Mann-Whitney test, to support the assertion that men are
represented as less approachable than women in advertisements.
1.24
Flood frequency analysis requires that we assume that the occurrence of a flood in a given year is
independent of previous or future flood occurrences and can be modelled as a binomial random variable.
You have been asked to perform a flood frequency analysis on dismal creek. Given a historical record of
measurements from dismal creek, consisting of 105 years of data indicating the presence/absence of a
flood in each year, what type of statistical test could you perform to support the assertion that the
occurrence of floods in dismal creek follow a binomial distribution?
A chi-square goodness-of-fit test can be used to support the claim that the occurrence of floods in dismal creek follows a
binomial distribution.
1.25
Compare and contrast the strengths and weaknesses of nearest neighbor, bilinear interpolation, and
inverse distance weighted averaging for spatial interpolation.
Nearest neighbour interpolation is the simplest approach to interpolation. Instead of calculating an
average under some weighting criterion, this method simply finds the nearest measured point and assigns its value.
Each interpolation point is compared with every point of the data set to find the closest one. The method is
typically applied to a relatively small number of irregularly spaced points, but can also be used with a larger
number of points.
For bilinear interpolation, the value of each cell is determined from the four nearest original cells. The
averaging is linear in the horizontal directions. This is good for general smoothing, but the estimates at
local peaks and valleys are typically flattened.
Inverse distance weighted (IDW) averaging never gives estimates outside the range of the measurements.
Prediction errors are not quantified. The best IDW results are obtained when the sampling is dense enough to
capture the local variation that you want to simulate.
1.26
Given the following air temperature measurements and their measurement times, use linear interpolation
to estimate the air temperature at 11:30am
Time                   11:05am   11:40am
Air Temperature (°C)   18        19
For linear interpolation between two points, the interpolated value is $F = \lambda f(x_2) + (1 - \lambda) f(x_1)$, where λ
denotes the fraction of the interval from $x_1$ to $x_2$ that lies before the interpolating point. Here, $\lambda = \frac{25}{35}$
and F is the air temperature at the interpolating point. Hence, $F = \frac{25}{35} \times 19 + \frac{10}{35} \times 18 = 18.71$ °C.
1.27
X coordinate 1.2 2 7 5
Y coordinate 1 3 2 1
Cd Concentration 0.3 0.5 0.2 0.7
Use nearest neighbour interpolation to estimate the cadmium concentration at the location (x=5,y=2).
The Euclidean distances from the point (5, 2) are calculated in the following table.
X coordinate 1.2 2 7 5
Y coordinate 1 3 2 1
Distance 3.93 3.16 2 1
The nearest neighbour is (5, 1), with the minimum distance D = 1 unit. Hence the cadmium concentration at the
location (x=5, y=2) is 0.7 using the nearest neighbour interpolation method.
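The distance table and the choice of nearest point can be reproduced with a short sketch in Python (standard library only). The list name `samples` and the function name `nearest_neighbour` are illustrative assumptions; the coordinates and concentrations are those given in the question.

```python
from math import hypot

# ((x, y), cadmium concentration) for the four measured locations.
samples = [((1.2, 1), 0.3), ((2, 3), 0.5), ((7, 2), 0.2), ((5, 1), 0.7)]

def nearest_neighbour(target, points):
    """Return the (coords, value) pair whose coords are closest to target."""
    tx, ty = target
    return min(points, key=lambda s: hypot(s[0][0] - tx, s[0][1] - ty))

coords, cd_nearest = nearest_neighbour((5, 2), samples)

# Euclidean distances from (5, 2), in the same order as the table.
distances = [round(hypot(x - 5, y - 2), 2) for (x, y), _ in samples]
print(distances, coords, cd_nearest)
```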
1.28
X coordinate 1.2 2 7 5
Y coordinate 1 3 2 1
Cd Concentration 0.3 0.5 0.2 0.7
Use inverse distance interpolation to estimate the cadmium concentration at the location (x=5,y=2).
The inverses of the Euclidean distances from the point (5, 2), calculated with power p = 1, are used as weights
in the following table.
X coordinate 1.2 2 7 5
Y coordinate 1 3 2 1
Weight 0.25 0.32 0.5 1
Hence the cadmium concentration at the location (x=5, y=2) is calculated as
$\frac{0.25 \times 0.3 + 0.32 \times 0.5 + 0.5 \times 0.2 + 1 \times 0.7}{0.25 + 0.32 + 0.5 + 1} = 0.5$
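The weighted average can be reproduced with a short function. This is a minimal sketch of inverse distance weighting in Python (standard library only); the names `idw` and `samples` are illustrative assumptions. It uses unrounded weights, so the result differs from the hand calculation only in later decimal places.

```python
from math import hypot

# ((x, y), cadmium concentration) for the four measured locations.
samples = [((1.2, 1), 0.3), ((2, 3), 0.5), ((7, 2), 0.2), ((5, 1), 0.7)]

def idw(target, points, power=1):
    """Inverse distance weighted average of the values in points at target."""
    tx, ty = target
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in points:
        d = hypot(x - tx, y - ty)
        if d == 0:
            return value  # target coincides with a measurement
        w = 1 / d**power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total

cd_idw = idw((5, 2), samples, power=1)
print(round(cd_idw, 2))
```

Raising `power` gives nearby points more influence, which moves the estimate toward the nearest neighbour value.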
PART 2
The file mean2s.csv contains two columns of data named T1 and T2 representing temperature
measurements at two different spatial locations. These data will be used for questions 2.01-2.03
Question 2.01
Create histograms illustrating the distributions of the two temperature variables (T1 and T2). Correctly
label the x and y axes and give your plot a title.
Histograms of T1 and T2 are drawn as below.
Question 2.02
Using the 6-step procedure demonstrated in class, perform a statistical test using these data to test the
assertion that the variance of T1 is different than the variance of T2 at a significance of 5%. Write out each
of the 6-steps, including all relevant equations, and clearly state the outcome of the test.
Step 1: The null hypothesis H0: $\sigma_{T1}^2 = \sigma_{T2}^2$
Step 2: Alternate hypothesis HA: $\sigma_{T1}^2 \ne \sigma_{T2}^2$
Step 3: Level of significance $\alpha = 5\% = 0.05$
Step 4: Choice of test: F-test.
$F_{stat} = \frac{100.18}{89.43} = 1.12$, the ratio of the larger sample variance to the smaller,
with 499 and 499 degrees of freedom.
Step 5: The two-tailed p-value is calculated as $2 \times P(F > 1.12) = 0.2052$
Step 6: The p-value is greater than $\alpha = 0.05$, so the null hypothesis is not rejected. Hence, it is
concluded that there is not enough statistically significant evidence for a difference in the variation of
temperature between the two spatial locations.
Question 2.03
Using the 6-step procedure demonstrated in class, perform a statistical test on the assertion that the mean
of T2 is greater than the mean of T1 at a significance of 5%.
Step 1: The null hypothesis H0: $\mu_2 = \mu_1$
Step 2: Alternate hypothesis HA: $\mu_2 > \mu_1$
Step 3: Level of significance $\alpha = 5\% = 0.05$
Step 4: Choice of test: t-test for difference.
$t_{stat} = \frac{\bar{x}_2 - \bar{x}_1}{SE} = \frac{20.254 - 19.03}{0.616} \approx 1.986$
where the standard error of the difference is
$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{100.18}{500} + \frac{89.43}{500}} = 0.616$
Step 5: The p-value is calculated as $P(t > 1.986) = 0.0236$
Step 6: The p-value is less than $\alpha = 0.05$, so the null hypothesis is rejected at the 5% level. Hence, it is concluded
that there is statistically significant evidence that the average temperature at T2 is
greater than that at the T1 location.
The file prop2s.csv contains two columns of data named N1 and N2 representing the skill level of skiers
at two different ski resorts. These data will be used for questions 2.04-2.05.
Question 2.04
Create box plots of the variables representing the relative frequency of the skill level variables N1 and N2.
Correctly label the x and y axes and give your plot a title.
Box plots of the variables have been represented below.
Question 2.05
Using the 6-step procedure demonstrated in class, perform a statistical test on the assertion that the
proportion of expert level skiers at resort N1 is less than the proportion of expert level skiers at N2 at a
significance of 5%.
We consider level = 4 skiers as expert. The valid sample size at N1 is $n_1 = 200$ with sample proportion
$\hat{p}_1 = \frac{49}{200} = 0.245$, and at N2 it is $n_2 = 176$ with sample proportion $\hat{p}_2 = \frac{61}{176} = 0.35$.
(1) The claim can be tested with a left-tailed alternate hypothesis at the 5% level of significance using a z-test for the
difference of two sample proportions.
(2) Step 1: The null hypothesis H0: $p_1 - p_2 = p_d = 0$
Step 2: Alternate hypothesis HA: $p_1 - p_2 < 0$
Step 3: Level of significance $\alpha = 5\% = 0.05$
Step 4: Choice of test: z-test.
$z_{stat} = \frac{\hat{p}_1 - \hat{p}_2 - p_d}{SE} = \frac{0.245 - 0.35 - 0}{0.047} = -2.23$
where the standard error of the difference is
$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = \sqrt{\frac{0.245 \times 0.755}{200} + \frac{0.35 \times 0.65}{176}} = 0.047$
Step 5: The p-value is calculated as $P(z < -2.23) = 0.0129$
using the R command pnorm(-2.23, lower.tail=TRUE)
Step 6: The p-value is less than $\alpha = 0.05$, so the null hypothesis is rejected. Hence, the proportion of level 4
expert skiers is significantly lower at resort N1 than at N2.
The file nonpar.csv contains 3 columns of data excerpted from a survey on skier satisfaction at a ski
resort. Columns 1&2 (Sat00 & Sat15) contain the response (1-strongly disagree, 2-disagree, 3-neutral, 4-
agree, 5-strongly agree) to the statement “I am satisfied with the ski conditions today” given by skiers in
2000 and 2015 respectively. These data will be used for question 2.06-2.08
Question 2.06
Calculate the 5-number summary of the variable Sat00
The 5-number summary of the variable Sat00 is in the following table.
Question 2.07
Calculate the interquartile range of the variable Sat00
The interquartile range was calculated in R as Q3 - Q1 = 1 for Sat00.
Question 2.08
Calculate the inner fences of the distribution of the variable Sat00. State how many outliers, if any, there
are in this dataset.
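The lower inner fence is $Q_1 - 1.5 \times IQR = 3 - 1.5 \times 1 = 1.5$. With $Q_3 = Q_1 + IQR = 4$, the upper inner fence is $Q_3 + 1.5 \times IQR = 4 + 1.5 \times 1 = 5.5$.
A single outlier value was observed for Satisfaction in 2000.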
Question 2.09
Using the 6-step procedure demonstrated in class, perform a signs test to test the assertion that skiers are
satisfied with the ski conditions in 2000 at a significance of 5%.
Step 1: The null hypothesis for skiers' satisfaction is $H_0: \pi = 4$
Step 2: The alternate hypothesis is $H_A: \pi < 4$.
Step 3: Level of significance $\alpha = 5\% = 0.05$
Step 4: Choice of test: one-tailed signs test.
The signs statistic B = 38 is the number of positively satisfied subjects; under the null hypothesis B is binomially
distributed with p = 0.5 and n = 75.
Step 5: The statistic value is B = 38, with the p-value calculated as p = 0.59 using the R command
pbinom(38, 75, 0.5)
Step 6: The p-value is greater than $\alpha = 0.05$, so the null hypothesis is not rejected at the 5% level.
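The binomial probability in Step 5 can be checked without R. The sketch below is a Python standard-library analogue of `pbinom(38, 75, 0.5)`, that is, the cumulative probability P(B ≤ 38); `binom_cdf` is a hypothetical helper name.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(B <= k) for B ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# B = 38 positively satisfied skiers out of n = 75, with p = 0.5 under H0.
p_value = binom_cdf(38, 75, 0.5)
reject_h0 = p_value < 0.05
print(round(p_value, 2), reject_h0)
```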
Question 2.10
Using the 6-step procedure demonstrated in class, perform a Mann-Whitney test to test the assertion that
skiers are less satisfied with the ski conditions in 2015 than they were in 2000 at a significance level of 5%.
In 2000, 38 skiers were satisfied, and this number fell to 21 in 2015. The medians of the two
samples were evaluated as $\eta_2 = 3$ and $\eta_1 = 4$.
Step 1: Hence, the null hypothesis is H0: $\eta_2 = \eta_1$
Step 2: And the left-tailed alternate hypothesis is HA: $\eta_2 < \eta_1$
Step 3: Level of significance is $\alpha = 5\% = 0.05$
Step 4: Choice of test: Mann-Whitney test for difference in satisfaction for 2000 and 2015.
We now compute the sum of the ranks of the two samples and rephrase the hypotheses in terms of the test
variable W as
H0: $W_2 = \mu_W$
HA: $W_2 < \mu_W$
The statistic W is approximately normally distributed, and its mean and standard deviation are evaluated as
follows:
$\mu_W = \frac{n_2 (n_2 + n_1 + 1)}{2} = \frac{75 (75 + 75 + 1)}{2} = 5662.5$
and
$\sigma_W^2 = \frac{n_2 n_1 (n_2 + n_1 + 1)}{12} = \frac{75 \times 75 \times (75 + 75 + 1)}{12} = 70{,}781.25 \Rightarrow \sigma_W = 266.05$
Now we tabulate the data and calculate the sum of the ranks as below.
Response   Sat00   Sat15   Average Rank
1          20      11      10.5
2          21      13      31
3          34      30      66.5
4          0       20      116
5          0       1       145.5
The sum of the ranks for 2015 is
$W = 11 \times 10.5 + 13 \times 31 + 30 \times 66.5 + 20 \times 116 + 1 \times 145.5 = 4979$
Step 5: Hence, $z_{stat} = \frac{4979 - 5662.5}{266.05} = -2.57$
The p-value is calculated as $P(z < -2.57) = 0.0051$
using the R command pnorm(-2.57, lower.tail=TRUE)
Step 6: The p-value is less than $\alpha = 0.05$, so the null hypothesis is rejected at the 5% level. Hence, it is concluded
that there is statistically significant evidence that skiers were more satisfied in 2000
than in 2015.
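The normal approximation used in this test can be sketched end to end in Python (standard library only). The counts and average ranks are copied from the table in this answer as given; the variable names are illustrative assumptions, and `NormalDist().cdf` plays the role of R's `pnorm(..., lower.tail=TRUE)`.

```python
from math import sqrt
from statistics import NormalDist

# Sat15 counts per response (1 to 5) and the average rank of each response,
# taken from the table above as given.
counts_2015 = [11, 13, 30, 20, 1]
average_ranks = [10.5, 31, 66.5, 116, 145.5]
n1 = n2 = 75  # sample sizes for 2000 and 2015

# Rank sum for the 2015 sample.
w = sum(c * r for c, r in zip(counts_2015, average_ranks))

# Mean and standard deviation of W under the null hypothesis.
mu_w = n2 * (n2 + n1 + 1) / 2
sigma_w = sqrt(n2 * n1 * (n2 + n1 + 1) / 12)

z_stat = (w - mu_w) / sigma_w
p_value = NormalDist().cdf(z_stat)  # left-tail probability
print(w, mu_w, round(sigma_w, 2), round(z_stat, 2), round(p_value, 4))
```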