Statistical Analysis: Tourism Operator Surveys 2013-2017 Data Analysis

Added on  2019/09/22

Homework Assignment
AI Summary
This assignment involves a comprehensive data analysis of tourism operator surveys conducted in 2013 and 2017. The analysis begins by importing and preparing the datasets, including data type inspection and conversion of specific variables into categorical formats. Descriptive statistics, such as summaries, are generated to understand the data distributions. The assignment then proceeds to perform statistical tests, specifically focusing on the 'Optimistic' variable, to determine if there's a significant change in the level of optimism among tourism operators over the years. The analysis includes one-sample and two-proportion z-tests, along with a comparison of p-values to assess statistical significance. Furthermore, the solution employs random sampling and binary variable creation to simulate agreement levels and calculate means, followed by t-tests to compare the means of the two datasets. The results of these tests are interpreted to provide insights into changes in sentiment among tourism operators over time, with conclusions drawn based on p-values and confidence intervals.
## load the required libraries
library(dplyr)
library(magrittr)
##import the dataset into R workspace
t13 <- read.csv("TourismOperators2013.csv")
t17 <- read.csv("TourismOperators2017.csv")
#get the data type
str(t13)
str(t17)
#convert the 'Optimistic' and 'ClimateChangeView' variables into categorical variables (factors)
t13$Optimistic <- as.factor(t13$Optimistic)
t13$ClimateChangeView <- as.factor(t13$ClimateChangeView)
t17$Optimistic <- as.factor(t17$Optimistic)
t17$ClimateChangeView <- as.factor(t17$ClimateChangeView)
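One caveat with factor conversion matters whenever such a column is later turned back into numbers: `as.numeric()` applied to a factor returns the internal level codes, not the original labels. A minimal sketch with a made-up vector (the `ratings` values are hypothetical, not survey data):

```r
# Hypothetical ratings on the 1-10 scale; several scale points are unused
ratings <- c(3, 7, 10, 7, 3)
f <- as.factor(ratings)

# Level codes: position of each value within levels(f), which are "3", "7", "10"
codes <- as.numeric(f)

# Original values: go through the character labels first
values <- as.numeric(as.character(f))

codes
values
```

The two results agree only when every scale point from 1 upward appears in the data, so converting through `as.character()` is the safer habit.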
##summary of the dataset
summary(t13)
summary(t17)
## part Two
##
t13 %>%
na.omit() %>%
group_by(Optimistic) %>%
summarise(
proportion = n()/nrow(na.omit(t13)) # denominator: complete cases only
) -> t13.optimistic
t13.optimistic
t17 %>%
na.omit() %>%
group_by(Optimistic) %>%
summarise(
proportion = n()/nrow(na.omit(t17)) # denominator: complete cases of the 2017 data
) -> t17.optimistic
t17.optimistic
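As a cross-check on grouped proportions like the ones above, base R's `table()` and `prop.table()` give the same kind of summary in two calls. The sketch below uses a small made-up vector (`optimistic` is hypothetical, not the survey column) to show that missing values are dropped before the shares are computed, so the shares sum to one:

```r
# Hypothetical responses on the 1-10 scale, with one missing value
optimistic <- c(8, 6, 3, 8, NA, 10, 6, 8)

# table() ignores NA by default; prop.table() divides by the total count
counts <- table(optimistic)
props <- prop.table(counts)

props
sum(props)
```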
# Part (a)
##(i)
#conducting the one sample prop.test
res <- prop.test(x = 5, n = 10, p = 0.7, alternative = "less", correct = FALSE)
#printing the results
res
#getting p-value
res$p.value
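To make clear what the one-sided test computes, the sketch below rebuilds the p-value by hand from the z statistic, using the same inputs as the call above. Without the continuity correction, `prop.test()` reports the square of this z value as its chi-squared statistic, so the two p-values should agree. The names `p_hat`, `p_manual`, and `p_proptest` are introduced here for illustration only.

```r
x <- 5      # number of "successes"
n <- 10     # sample size
p0 <- 0.7   # proportion under the null hypothesis

# z statistic for a one-sample proportion test
p_hat <- x / n
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# lower-tail p-value, matching alternative = "less"
p_manual <- pnorm(z)

# same test via prop.test(); warnings are suppressed because the expected
# counts are small here, which R flags as a possibly poor approximation
p_proptest <- suppressWarnings(
  prop.test(x = x, n = n, p = p0, alternative = "less", correct = FALSE)
)$p.value

all.equal(p_manual, p_proptest)
```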
#### (ii)
## check optimistic feature of GBR in 2013
##randomly create 1 million agreements
# as.character() is needed first: as.numeric() on a factor returns level codes
smpl.13 <- as.numeric(as.character(
  sample(t13.optimistic$Optimistic,
         size = 1000000,
         replace = TRUE,
         prob = t13.optimistic$proportion)))
## convert smpl.13 to a binary variable
# assign 0 if the rating is 5 or below, and 1 otherwise
smpl.13[smpl.13 <= 5] <- 0
smpl.13[smpl.13 > 5] <- 1
## mean of smpl.13 (share of optimistic responses)
x <- mean(smpl.13)
## count the 1's
y <- mean(smpl.13)*length(smpl.13)
y
p_value= seq(0.7, 0.7, length.out = 10)
p_value
##prop.test
res2 <- prop.test(x = y, n = 1000000, p = 0.7, alternative = "less", correct = FALSE)
res$p.value > res2$p.value
# From randomly generating 1 million levels of agreement with the statement "I am optimistic about the
# future of the GBR" on a scale from 1 (strongly disagree) to 10 (strongly agree), based on the
# proportions observed in 2013, we found that around 61% of the tourism operators were generally
# optimistic about the future of the GBR in 2013, which is less than 70%.
## The p-value from part (i) is greater than the p-value from part (ii); both are judged at the 5 percent significance level.
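The simulated share of optimistic responses can be checked against an exact value, because sampling ratings with fixed probabilities and then thresholding at 5 simply estimates the total probability of ratings 6 to 10. The sketch below assumes a made-up probability vector (`props` is hypothetical and chosen only so the exact share is 0.61; it is not the survey table) and a smaller sample size to keep it quick:

```r
set.seed(1)

rating <- 1:10
# Hypothetical proportions for ratings 1..10 (they sum to 1)
props <- c(0.02, 0.03, 0.05, 0.10, 0.19, 0.15, 0.18, 0.14, 0.08, 0.06)

# Exact share of ratings above 5
exact_share <- sum(props[rating > 5])

# Simulated share: draw ratings, convert to 0/1, take the mean
draws <- sample(rating, size = 100000, replace = TRUE, prob = props)
sim_share <- mean(draws > 5)

exact_share
sim_share
```

With a large number of draws the two figures differ only by sampling noise, which is also why a test based on a million simulated responses yields a far smaller p-value than one based on ten.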
######## Part (b)
##(i)
# two proportion z test with prop.test
res3 <- prop.test(x = c(5, 5), n = c(10, 10))
#get p value
res3$p.value
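For reference, the two-proportion z-test can be written out by hand using the pooled proportion. The sketch below uses made-up counts (`x` and `n` are hypothetical) and compares the result with `prop.test()` called with `correct = FALSE`; note that `prop.test()` applies a continuity correction by default, so the default call can give a slightly different p-value.

```r
# Hypothetical counts of optimistic operators in two survey years
x <- c(61, 66)
n <- c(100, 100)

p1 <- x[1] / n[1]
p2 <- x[2] / n[2]
p_pooled <- sum(x) / sum(n)

# pooled two-proportion z statistic and two-sided p-value
z <- (p1 - p2) / sqrt(p_pooled * (1 - p_pooled) * (1 / n[1] + 1 / n[2]))
p_manual <- 2 * pnorm(-abs(z))

# same test without continuity correction
p_proptest <- prop.test(x = x, n = n, correct = FALSE)$p.value

all.equal(p_manual, p_proptest)
```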
#### (ii)
## check optimistic feature of GBR in 2017
##randomly produce 1 million agreements
# as.character() is needed first: as.numeric() on a factor returns level codes
smpl.17 <- as.numeric(as.character(
  sample(t17.optimistic$Optimistic,
         size = 1000000,
         replace = TRUE,
         prob = t17.optimistic$proportion)
))
## convert smpl.17 to a binary variable
# assign 0 if the rating is 5 or below, and 1 otherwise
smpl.17[smpl.17 <= 5] <- 0
smpl.17[smpl.17 > 5] <- 1
##two proportion z test
mean(smpl.17)
z= mean(smpl.17)*length(smpl.17)
res5= prop.test(x=c(y,z), n= c(1000000, 1000000))
##get p-value
res5$p.value
##compare p value
res3$p.value> res5$p.value
##compare the means with t-test
t.test(smpl.13, smpl.17)
# Based on the result, at the 95% confidence level there is a significant difference
# (p-value = 2.2e-16) between the two means. Here you should reject the null hypothesis
# that the two means are equal, because the p-value is smaller than 0.05.
# The confidence interval puts the difference between the means between 0.01708372 and
# 0.01979828. The output also reports the estimated sample means, the t statistic, and the
# degrees of freedom of the t-distribution. Note that Welch's t-test is a t-test that does
# not assume equal variances.
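To show what the Welch test reports, the sketch below recomputes the t statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with `t.test()`. The two 0/1 samples are simulated with assumed success rates (0.61 and 0.63 are illustrative values, not results from the survey), and the variable names are introduced here for illustration only.

```r
set.seed(42)

# Hypothetical binary samples (1 = optimistic, 0 = not)
a <- rbinom(2000, size = 1, prob = 0.61)
b <- rbinom(2000, size = 1, prob = 0.63)

# squared standard errors of the two sample means
va <- var(a) / length(a)
vb <- var(b) / length(b)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_manual <- (mean(a) - mean(b)) / sqrt(va + vb)
df_manual <- (va + vb)^2 /
  (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

# t.test() uses the Welch version by default (var.equal = FALSE)
fit <- t.test(a, b)

all.equal(unname(fit$statistic), t_manual)
all.equal(unname(fit$parameter), df_manual)
```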