Tukey Multiple Comparisons Of Means

Added on  2022/09/15

Running head: DATA VISUALISATION AND EXPLORATORY ANALYSIS
DATA VISUALISATION AND EXPLORATORY ANALYSIS
Name of the Student
Name of the University
Author Note

Part 1: Visualization:
In this task, the arrivals data for 27 countries, given in the “World Bank international
arrivals islands” file on Moodle, are first pre-processed because many variables contain
missing values. These missing values are coded as “.” and “” in the data file and are
identified as such in R. The summary statistics of the variables, as computed in R, are
given below.
summary(worlddata)
    country        year          pop           areakm2            gdpnom      flights...WB      hotels
 Min.   : 1   Min.   : 0   9800   :  5   Min.   :   26   327000000 :  3   1135   :  2   95     :  3
 1st Qu.: 7   1st Qu.: 3   52000  :  3   1st Qu.:  298   523000000 :  3   36648  :  2   100    :  2
 Median :14   Median : 6   52100  :  3   Median :  459   101000000 :  2   5006   :  2   101    :  2
 Mean   :14   Mean   : 6   83000  :  3   Mean   : 2095   1211000000:  2   10363  :  1   102    :  2
 3rd Qu.:21   3rd Qu.: 9   87000  :  3   3rd Qu.: 2040   182000000 :  2   10555  :  1   103    :  2
 Max.   :27   Max.   :12   (Other):309   Max.   :28400   (Other)   :290   (Other):150   (Other): 50
                           NA's   : 25                   NA's      : 49   NA's   :193   NA's   :290

   hotrooms        receipt        ovnarriv       dayvisit        arram          arreur         arraus
 1386   :  2   17000000 :  5   19000  :  6   20000  :  4   14000  :  6   3000   :  7   529    :  2
 1394   :  2   113000000:  3   5000   :  6   19000  :  3   6000   :  6   100    :  5   10969  :  1
 6048   :  2   15000000 :  3   1100   :  5   557000 :  2   100    :  5   200    :  4   118    :  1
 1036   :  1   4000000  :  3   130000 :  4   597000 :  2   1100   :  3   4000   :  4   127    :  1
 10497  :  1   57000000 :  3   15000  :  4   7000   :  2   1300   :  3   21000  :  3   129    :  1
 (Other): 77   (Other)  :279   (Other):289   (Other): 77   (Other): 96   (Other): 96   (Other): 59
 NA's   :266   NA's     : 55   NA's   : 37   NA's   :261   NA's   :232   NA's   :232   NA's   :286
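The missing-value codes described earlier can be declared when the file is read. The sketch
below is a minimal illustration on a small made-up CSV string, not the original file; the
names csv_text and demo are hypothetical, while read.csv and its na.strings argument are
base R.

```r
# Made-up stand-in for the data file: "." and "" both mark missing values.
csv_text <- "country,year,dayvisit
1,0,20000
1,1,.
2,0,
"

# Declare both codes as NA so the column is read as numeric rather than text.
demo <- read.csv(text = csv_text, na.strings = c(".", ""))

# Count the missing values per column.
print(colSums(is.na(demo)))
```

Declaring the codes at import keeps numeric columns numeric, which makes the later
interpolation and averaging steps simpler.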
The summary statistics show that there are 351 instances in total and that many values are
missing in the variables dayvisit, arram, arreur, arraus, hotrooms, hotels and flights...WB.
Deleting instances row-wise is therefore not advisable, as too little of the data would be
left for analysis and the accuracy of the insights would be greatly compromised. Instead, the
missing values are replaced in two steps. First, most of the missing values are filled by
linear interpolation between the non-missing instances. Then the remaining missing instances
are filled by nearest-neighbour interpolation, carried first to last and last to first. Hence,
all 351 instances are kept for the data analysis.
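A minimal sketch of such a two-step fill, using only base R, is given below. It is one
possible reading of the procedure, not necessarily the code used for this report: the
function name fill_missing is hypothetical, and whether the fill was applied per country or
over the whole column is not stated in the text, so the sketch works on a single numeric
vector.

```r
# Hypothetical helper: fill NAs in a numeric vector in two steps.
fill_missing <- function(y) {
  idx <- seq_along(y)
  ok <- !is.na(y)
  if (sum(ok) == 0) return(y)                       # nothing to interpolate from
  if (sum(ok) == 1) return(rep(y[ok], length(y)))   # a single known value is carried everywhere

  # Step 1: linear interpolation between known points (rule = 1 leaves the ends as NA).
  out <- approx(idx[ok], y[ok], xout = idx, rule = 1)$y

  # Step 2: carry the nearest known value into the leading and trailing gaps.
  first <- min(idx[ok])
  last <- max(idx[ok])
  out[idx < first] <- y[first]
  out[idx > last] <- y[last]
  out
}

filled <- fill_missing(c(NA, 10, NA, NA, 40, NA))
print(filled)
```

In the example call, the interior gaps are filled on the straight line between 10 and 40,
and the two end gaps take the nearest known value.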
Two stories about the world data are now proposed: one shows the distribution of the economy
(GDP) across the countries, and the other shows the distribution of the number of visits,
that is, the tourism popularity of the countries.
These stories answer two questions:
1) On average, which country has the strongest economy in terms of gross domestic product?
2) Which country is the most popular for tourism, as indicated by the average number of
yearly visits to the country?
Using the R aggregate function, the average yearly GDP of each country and the average
yearly visits in days to each country are collected in a data frame for the given period of
13 years. These two measures are then displayed as bar charts, as shown below.
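The aggregation step can be sketched as follows. The data frame worlddata_demo holds made-up
numbers for three countries, and avgdata is a hypothetical name; the column names gdpnom and
dayvisit come from the summary above, but which visit column was actually averaged for the
report is an assumption.

```r
# Made-up data: three countries, two years each.
worlddata_demo <- data.frame(
  country  = rep(1:3, each = 2),
  gdpnom   = c(100, 200, 50, 70, 900, 1100),
  dayvisit = c(10, 30, 5, 15, 40, 60)
)

# Average both measures per country in a single call.
avgdata <- aggregate(cbind(gdpnom, dayvisit) ~ country,
                     data = worlddata_demo, FUN = mean)
print(avgdata)

# One bar per country, labelled by country index as in the report's charts.
barplot(avgdata$gdpnom, names.arg = avgdata$country,
        xlab = "Country index", ylab = "Average GDP")
barplot(avgdata$dayvisit, names.arg = avgdata$country,
        xlab = "Country index", ylab = "Average yearly visits")
```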
The chart shows an uneven distribution of average GDP across the countries, which are
presented on the x axis by the numbers 1 to 27. The economic situations of the 27 countries
therefore differ greatly from each other, and only a few countries can be considered to have
a rich economy on average. The country indexed 25, Singapore, has the largest economy in
terms of average GDP, which is far higher than that of the other countries.

Tourism popularity appears to be fairly evenly distributed, with the exception of countries
such as Mauritius and Solomon Islands, which have the lowest popularity, and Bahrain and
Singapore, which have very high popularity, as indicated by the days of visits. The chart
also makes clear that Singapore is the most popular country in terms of days of visits, just
slightly ahead of Bahrain.
Part 2: Persuasion
In this part, the visits to Singapore, the economically richest country, from three
continents (America, Europe and Australia) are compared by means of a box plot, followed by
a one-way ANOVA.
Box-plot:
From the box plot it is clear that the outliers of visits for the three groups (continents)
do not overlap, so there is a higher chance of a significant difference between the visits
from the three continents.
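A box plot of this kind needs the three arrival columns in long form, with one row per year
and continent. The sketch below shows one way to build such a table and draw the plot. All
numbers are made up, the names singapore_demo and dfsingapore_demo are hypothetical, and the
way the original data frame was constructed is an assumption; only the column names arram,
arreur and arraus come from the summary above.

```r
# Made-up yearly arrivals for one country over 13 years (year 0 to 12).
singapore_demo <- data.frame(
  year   = 0:12,
  arram  = 400000 + 5000 * (0:12),
  arreur = 1000000 + 8000 * (0:12),
  arraus = 20000 + 1000 * (0:12)
)

# Stack the three continent columns into long form: one row per year and continent.
dfsingapore_demo <- data.frame(
  visits  = c(singapore_demo$arram, singapore_demo$arreur, singapore_demo$arraus),
  country = factor(rep(c("America", "Europe", "Australia"),
                       each = nrow(singapore_demo)))
)

# One box per continent.
boxplot(visits ~ country, data = dfsingapore_demo,
        xlab = "Continent of origin", ylab = "Visits")
```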
One-way ANOVA result:
summary(resanova)
            Df    Sum Sq   Mean Sq F value   Pr(>F)
country      2 6.332e+12 3.166e+12   54.48 1.29e-11 ***
Residuals   36 2.092e+12 5.812e+10
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
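As a quick consistency check on the table above, the F value is the ratio of the two mean
squares, and the p-value is the upper tail of the F distribution with 2 and 36 degrees of
freedom. The variable names in this sketch are hypothetical; the numbers are copied from the
table.

```r
# Values copied from the ANOVA table.
ms_between <- 3.166e12   # Mean Sq for country
ms_within  <- 5.812e10   # Mean Sq for Residuals

# F statistic as the ratio of between-group to within-group mean squares.
f_value <- ms_between / ms_within

# Upper-tail probability of the F distribution with 2 and 36 degrees of freedom.
p_value <- pf(f_value, df1 = 2, df2 = 36, lower.tail = FALSE)

print(f_value)
print(p_value)
```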
The one-way ANOVA confirms that the average visits of at least one group differ
significantly, as the p-value (1.29e-11) is far below 0.05, the significance level chosen
for the test. To find which groups have significantly different mean visits, the post hoc
Tukey test is performed.
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = visits ~ country, data = dfsingapore)

$country
                       diff       lwr       upr   p adj
Australia-America -379837.2 -610967.9 -148706.5 8.2e-04
Europe-America     599017.1  367886.4  830147.8 7.0e-07
Europe-Australia   978854.3  747723.6 1209985.0 0.0e+00
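The width of these intervals can be cross-checked from the ANOVA table. For equal group
sizes, the Tukey half-width is the studentized range quantile times sqrt(MSE / n). The
sketch assumes 13 observations per continent (39 observations in total, one per year), which
is consistent with the 36 residual degrees of freedom but is not stated explicitly; the
variable names are hypothetical.

```r
mse         <- 5.812e10   # residual mean square from the ANOVA table
n_per_group <- 13         # assumed: one observation per year and continent

# 95% quantile of the studentized range for 3 means and 36 residual df.
q_crit <- qtukey(0.95, nmeans = 3, df = 36)

# Half-width of each pairwise interval when group sizes are equal.
half_width <- q_crit * sqrt(mse / n_per_group)

# Half-width implied by the reported Europe-America interval.
reported_half_width <- (830147.8 - 367886.4) / 2

print(c(computed = half_width, reported = reported_half_width))
```

The two values should agree up to the rounding of the mean square in the printed table.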
The post hoc test shows that the average visits to Singapore over the 13-year span differ
significantly between every pair of the three continents, as all adjusted p-values are
below 0.05. It can therefore be concluded that visitors to Singapore are not evenly
distributed across America, Europe and Australia: the average number of visits is much
higher from Europe than from America, which in turn is higher than from Australia.
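For reference, the ANOVA and the post hoc test can be reproduced in outline with base R's
aov and TukeyHSD. The sketch below runs on a small made-up data set, so its numbers are not
those of the report; the names dfsingapore_demo, resanova_demo and tukey_demo are
hypothetical, while the formula visits ~ country follows the Fit line in the output above.

```r
# Made-up visits for three continents, five observations each.
dfsingapore_demo <- data.frame(
  visits = c(410, 400, 395, 405, 390,        # America
             1000, 990, 1010, 1005, 995,     # Europe
             20, 25, 15, 30, 10),            # Australia
  country = factor(rep(c("America", "Europe", "Australia"), each = 5))
)

# One-way ANOVA: does mean visits differ between the groups?
resanova_demo <- aov(visits ~ country, data = dfsingapore_demo)
print(summary(resanova_demo))

# Tukey HSD: which pairs of groups differ, at a 95% family-wise confidence level?
tukey_demo <- TukeyHSD(resanova_demo, conf.level = 0.95)
print(tukey_demo)
```

Because factor levels are ordered alphabetically, the pairwise rows appear in the same order
as in the report: Australia-America, Europe-America, Europe-Australia.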