Statistical Methods ICA Assignment - CIS4027, Middlesbrough Uni


Added on  2022/08/28

Homework Assignment
AI Summary
This document presents a comprehensive solution to a Statistical Methods assignment (CIS4027) from Middlesbrough University, covering various statistical concepts. The assignment includes solutions to questions involving thematic analysis of online comments, probability calculations (conditional probability, Bayes Theorem), identification of independent and dependent events, and explanations of different measurement scales (nominal, ordinal, ratio, interval). The solution also provides calculations for Z-values, analysis of employee status and salary data, and the application of correlational design for a research study. The final questions involve hypothesis testing, including the formulation of null and alternative hypotheses, and the identification of Type I and Type II errors.
Statistical Methods
Question 1
The article of interest for the thematic analysis was collected from
https://www.theguardian.com/money/shortcuts/2020/jan/06/finland-is-planning-a-four-day-week-is-this-the-secret-of-happiness#comments (Andre, 2020). The sample of 20 comments for
this study is given below:
1. I would go for it. But I think it would be 37/40 hours in 4 days rather than a 20% cut all
around.
2. Hmm... Didn’t a UK political party recently propose this?
3. It is the secret to happiness if your boss does not expect you to fit 5 days into 4 or answer
work emails from home.
4. Ok if it is 30 hours over 4 days. If one does the current 37.5 over 4, then - if you're like
me - the quality of one's work would suffer in that extra couple of hours each day.
5. It is better that a 5-day week. That is just a fact.
6. What a fantastic idea. Watch, as even more work - of higher and higher skill levels - is
lost to the budding third world.
7. In order to stay mentally healthy, the system should reduce to 4 days out of 7 our
exposure to work environments and keep such exposure to the minimum.
8. I am paid an hourly wage, so no thanks. My goal is to increase the number of hours I
work in a week...
9. Work places are the first source of mental illness.
10. This only works if you don't have Jeremy Corbyn as your Prime Minister apparently,
otherwise its antisemitic Marxism.
11. Lived in Helsinki for 30years, being self-employed I could not afford to work a four day
week. Sounds healthy but in practice would not work for me.
12. After all we are not on this planet for a long time but for a good time. It's about
increasing happiness not increasing money which we will probably never enjoy.
13. The evidence is very selective - sure, if you work in an office you may be able to
complete your work in four days, everywhere else, from retail, to manufacturing, to bus
driving you cannot.
14. No surprise really: the Scandi countries have been demonstrating how life could be
across the western world, if only we'd stop electing egregious chumps into No 10 and the
White House.
15. Incredibly important and perhaps ineluctable. Really exciting to see Finland taking the
initiative on this.
16. If you can earn the same or more income for your company in 4 days as what you
currently do in 5, then what's the problem? If you can't then you are looking eventually a
large pay cut
17. Interesting that MEPs who work nothing like these number of hours think everyone else
should be.
18. We could work a 4 day week now, but corporate profit increases are all that matter in this
world.
19. I do a four-day week, and spend Fridays with my son; it's brilliant!
20. It depends on the nature of the job. A heart surgeon is not going to be 40% more
productive by working less days!
The following themes can be identified from the sample comments above:
1. Politics, which appears four times.
2. Health, which appears three times.
3. Money, which appears five times.
Question 2
Politics as a theme in this context reveals how the adoption of a four-day working week is
heavily reliant on the politics of a country. The comments attribute the lack of adoption of this
working model to the leadership elected in the United Kingdom and the United States of
America. This indicates that the political leadership elected in a country affects working
models and environments. The theme of politics is recognizable from the wording used, as in
comment number 2, and from emotional language, as in comment number 14.
The theme of health in the context of the article implies that health is a factor to consider when
adopting a working model. The comments suggest that the adoption of a four-day working week is
associated with better health for the employee. This theme is recognizable from emotive
language, as used in comment number 7.
The theme of money in this context captures both pay for employees and profit for employers.
From the comments there is a feeling that money, being the main incentive for
working, plays a big role in the development of working models and subsequently the adoption
of a four-day working week in the United Kingdom. The comments point at the employees’
desire to maintain the same pay despite the reduction in working days while expressing the fear
of employers introducing pay cuts to maintain profit margins. The theme of money is
recognizable from the wording used, as in comment number 16, and from emotive language, as in
comment number 1.
Question 3
Let E represent the event “England will win the World Cup in 2022”. An event F1 for which
F1 and E are independent of each other is an event whose occurrence does not change the
probability of England winning the World Cup in 2022, and vice versa; that is,
P(E ∩ F1) = P(E)P(F1). Event F1 could be, for instance: “The England Cricket team training
every week”.
Similarly, an event F2 for which F2 and E are dependent on each other is an event whose
occurrence changes the probability of England winning the World Cup in 2022, or whose own
probability is changed by England winning the World Cup in 2022. Event F2 could be, for
instance: “The England Football team qualifying for the 2022 World Cup”.
Question 4
Consider two events; E and F
The conditional probability of occurrence of event E, given the occurrence of event F is then
given by (Barbara & Susan, 2014):
$$P(E \mid F) = \frac{P(E \cap F)}{P(F)}$$
The conditional probability of occurrence of event F, given the occurrence of event E is then
given by:
$$P(F \mid E) = \frac{P(E \cap F)}{P(E)}$$
Therefore, we note that:
$$P(E \mid F)\,P(F) = P(E \cap F) = P(F \mid E)\,P(E)$$
Hence, replacing this in the first equation, we get the Bayes Theorem as follows:
$$P(E \mid F) = \frac{P(F \mid E)\,P(E)}{P(F)}$$
Bayes Theorem finds application in cases where there exists prior information on events that are
associated with the event of interest. This prior information is incorporated in the computation of the
probability of occurrence of the event of interest. A scenario would be the probability of England
winning the 2022 World Cup given they win their opening group match at the tournament.
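As a minimal sketch of how the formula is applied, the R snippet below computes P(E|F) from P(F|E), P(E) and P(F). The function name `bayes_posterior` and all three input probabilities are hypothetical, chosen only for illustration; they are not taken from the assignment or from any real data.

```r
# Bayes Theorem: P(E|F) = P(F|E) * P(E) / P(F)
# bayes_posterior is a hypothetical helper name used only for this sketch.
bayes_posterior <- function(p_f_given_e, p_e, p_f) {
  stopifnot(p_f > 0)            # conditioning on F requires P(F) > 0
  p_f_given_e * p_e / p_f
}

# Hypothetical inputs (assumptions, not values from the assignment):
p_e         <- 0.10  # P(E): England wins the tournament
p_f_given_e <- 0.80  # P(F|E): England won the opening match, given they win the tournament
p_f         <- 0.50  # P(F): England wins the opening match

p_e_given_f <- bayes_posterior(p_f_given_e, p_e, p_f)
print(p_e_given_f)
```

With these assumed inputs, knowing that the opening match was won raises the probability of E above its prior value of 0.10.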
Question 5
Nominal Scale: this is a scale of measurement where the differentiating factor between objects
is their name. This scale is mainly used for variables that are categorical in nature. Examples
include Marital Status, which may be ‘Married’, ‘Single’ or ‘Divorced’, and Gender, which may
be ‘Male’ or ‘Female’.
Ordinal Scale: this is a scale of measurement where the differentiating factor between objects is
their rank. This measurement scale is concerned with the order of the objects of
interest. Examples include a political contest, where parties are ranked as 1st, 2nd, and so on
depending on the number of seats they win in an election, and 5-point Likert scales used in
surveys, for example for product performance evaluation.
Ratio Scale: this is a scale of measurement where the differentiating factor between objects is
the magnitude. The difference in magnitude between the objects under this scale is meaningful,
with a magnitude of 0 representing absence of the attribute of interest. Examples include mass of
individuals in kilograms and salaries of employees in British pounds.
Interval Scale: this is a scale of measurement similar to the ratio scale; however, for this scale a
magnitude of zero does not indicate an absence of the attribute. Differences between values are
meaningful, but ratios are not. Examples include the
Celsius temperature scale and the Fahrenheit temperature scale.
Question 6
The probability of having gold given the host unwraps silver is given by:
$$P(G \mid H_S) = \frac{P(H_S \mid G)\,P(G)}{P(H_S)}$$
The probability of picking each medal is 1/4. The probability of the host unwrapping silver while
gold is not on the table (gold is in hand) is 1/3. While gold is still on the table (bronze or
wood is in hand) it is 1/2, and if silver is in hand it is 0. By the law of total probability:
$$P(H_S) = \frac{1}{4}\left(\frac{1}{3} + 0 + \frac{1}{2} + \frac{1}{2}\right) = \frac{1}{3}$$
Therefore:
$$P(G \mid H_S) = \frac{1/3 \times 1/4}{1/3} = \frac{1}{4} = 0.25$$
The probability of having bronze given the host unwraps silver is given by:
$$P(B \mid H_S) = \frac{P(H_S \mid B)\,P(B)}{P(H_S)} = \frac{1/2 \times 1/4}{1/3} = \frac{3}{8} = 0.375$$
The probability of having wood given the host unwraps silver is given by:
$$P(W \mid H_S) = \frac{P(H_S \mid W)\,P(W)}{P(H_S)} = \frac{1/2 \times 1/4}{1/3} = \frac{3}{8} = 0.375$$
The probability that the medal in hand is gold is therefore 0.25, so the gold medal is one of the
two medals still wrapped on the table with probability 0.75, that is 0.375 for each of them. Since
0.375 is higher than 0.25, swapping the
medal in hand with one of the remaining on the table would increase the chances of finding the
gold medal.
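The Bayes computation above can be checked numerically. The sketch below is one way to do it in base R, assuming (as the answer does) a prior of 1/4 for each medal in hand and host probabilities of 1/3, 1/2 and 1/2 for unwrapping silver when gold, bronze or wood is in hand; the value 0 for silver in hand is an added assumption that follows from the host being unable to unwrap a medal the player holds. It prints both the products P(H_S|X)P(X) and those products normalised by P(H_S).

```r
# Prior probability of holding each medal (uniform pick among four)
medals <- c("gold", "silver", "bronze", "wood")
prior  <- setNames(rep(1 / 4, 4), medals)

# P(host unwraps silver | medal in hand); silver = 0 is an assumption:
# the host cannot unwrap silver if the player is holding it.
lik <- c(gold = 1 / 3, silver = 0, bronze = 1 / 2, wood = 1 / 2)

joint     <- prior * lik      # P(H_S | X) * P(X) for each medal X
p_hs      <- sum(joint)       # law of total probability: P(H_S)
posterior <- joint / p_hs     # P(X | H_S), sums to 1

print(round(joint, 4))
print(p_hs)
print(round(posterior, 4))
```

Under these assumptions the ordering is the same before and after normalising: holding bronze or wood is more likely than holding gold, which is why swapping is favoured.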
Question 7
The analysis of the frequency of the categories in the Employee Status variable produced the
results in Table 1: Frequency (Employee Status Variable) below. The analysis results show that
in the year 2010 a majority of the employees at the White House fell under the “Employee”
status of employment with only a single employee falling under “Employee (part-time)”.
Table 1: Frequency (Employee Status Variable)
|Employee Status      | Frequency|
|:--------------------|---------:|
|Detailee | 31|
|Employee | 437|
|Employee (part-time) | 1|
The analysis of the measures of central tendency and variability for the salary variable produced
the results in Table 2 below. The measures of central tendency show that the average
annual salary is $82,721.34, with the median salary being $66,300. The measures of variability
show that the upper quartile is $113,000, the lower quartile is $45,900, the
range is $179,700 (max = $179,700, min = $0), and the standard deviation is
$41,589.43.
Table 2: Measures of Central Tendency and Variability (Salary Variable)
| | min| Q1| median| Q3| max| mean| sd| n| missing|
|:--|---:|-----:|------:|------:|------:|--------:|--------:|---:|-------:|
| | 0| 45900| 66300| 113000| 179700| 82721.34| 41589.43| 469| 0|
Question 8
The formula for the Z-value is given as:
$$z = \frac{x - \mu}{\sigma}$$
Therefore, for this case, with mean = 3 and standard deviation = 2, the z value for x = 0 is given
as:
$$z = \frac{0 - 3}{2} = -1.5$$
Hence, P (x < 0) = P(z < -1.5) = 0.06681 (from normal distribution table).
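The table lookup can be reproduced in R as a quick check. This is a minimal sketch using the base function `pnorm`, with the mean, standard deviation and x value taken from the question; the variable names are illustrative only.

```r
# Values from the question
mu    <- 3   # mean
sigma <- 2   # standard deviation
x     <- 0   # value of interest

z <- (x - mu) / sigma          # standardise x
p <- pnorm(z)                  # P(Z < z) under the standard normal

# Equivalent call without standardising by hand
p_direct <- pnorm(x, mean = mu, sd = sigma)

print(z)
print(round(p, 5))
```

Both calls should agree, since `pnorm(x, mean, sd)` standardises internally.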
Question 9
A correlational design can be applied for the study in determining whether the herb can treat
insomnia. In this research design, the herb is going to be made available to a randomly selected
group of 50 volunteers, while the remaining 50 are not going to be given the herb. A study period
of up to 30 days is going to be considered during which the volunteers are going to be asked to
record the amount of sleep, in hours, they have per day (which would be a measure of insomnia).
The data on the average amount of sleep for each of the volunteers will then be checked against
whether the herb was made available to them or not. This research design will assume that all the
100 volunteers experienced insomnia.
Null Hypothesis (H0): Using the herb does not lead to increase in the amount of sleep.
Alternative Hypothesis (H1): Using the herb leads to increase in the amount of sleep.
Type I Error: this would be the error of rejecting H0 and concluding that use of the herb leads to
an increase in the amount of sleep, while in reality the use of the herb does not lead to an
increase in the amount of sleep.
Type II Error: this would be the error of failing to reject H0, that is, concluding that the use of
the herb does not lead to an increase in the amount of sleep, when in reality it does.
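One possible way to test these hypotheses once the sleep data are collected is a one-sided two-sample t-test. This choice of test is an assumption; the answer above does not name one. The sketch below uses simulated data only: the group means (6.0 and 5.5 hours) and the standard deviation (1 hour) are hypothetical placeholders, not results from any study.

```r
set.seed(1)  # make the simulated example reproducible

# Simulated average nightly sleep in hours (hypothetical values, not real data)
herb    <- rnorm(50, mean = 6.0, sd = 1)  # 50 volunteers given the herb
control <- rnorm(50, mean = 5.5, sd = 1)  # 50 volunteers not given the herb

# H0: mean(herb) <= mean(control); H1: mean(herb) > mean(control)
# t.test defaults to the Welch test, which does not assume equal variances.
res <- t.test(herb, control, alternative = "greater")

print(res$statistic)
print(res$p.value)

alpha <- 0.05
reject_h0 <- res$p.value < alpha  # rejecting a true H0 would be a Type I error
print(reject_h0)
```

A Type II error in this setup would correspond to `reject_h0` being FALSE when the herb truly increases sleep.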
Question 10
Hypothesis
H0: The reduction in speed is not statistically significant.
H1: The reduction in speed is statistically significant.
Mathematically this would be, taking the speed using current technology as μ0 and the speed
using new technology as μ1:
H0: μ0 - μ1 = 0
H1: μ0 - μ1 ≠ 0
The test statistic is given by:
$$z = \frac{\mu_0 - \mu_1}{\sigma / \sqrt{n}}$$
Hence, the test statistic is given as:
$$z = \frac{2.0 - 1.8}{0.5 / \sqrt{100}} = 4$$
From the normal distribution table, P(Z < 4) ≈ 0.99997 (the value tabulated at z = 3.99), so the
two-sided p-value is approximately 2(1 - 0.99997) = 0.00006. Considering an α
level of significance = 0.05, 0.00006 < 0.05, thus we reject the H0: μ0 - μ1 = 0 and conclude that
the reduction in speed is statistically significant.
The additional assumption needed for the Z-test is that the samples must be
independent of each other.
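The test statistic and the normal table lookup can be reproduced in base R. The sketch below uses the values from the answer (2.0, 1.8, σ = 0.5, n = 100) and reports both the cumulative probability P(Z < z) and the two-sided p-value that is compared with α; the variable names are illustrative only.

```r
# Values from the answer
mu0   <- 2.0   # speed using current technology
mu1   <- 1.8   # speed using new technology
sigma <- 0.5   # standard deviation
n     <- 100   # sample size

z <- (mu0 - mu1) / (sigma / sqrt(n))   # test statistic

cum_prob    <- pnorm(z)                # P(Z < z), the normal-table value
p_two_sided <- 2 * pnorm(-abs(z))      # two-sided p-value for H1: mu0 - mu1 != 0

alpha  <- 0.05
reject <- p_two_sided < alpha          # TRUE means H0 is rejected

print(z)
print(round(cum_prob, 5))
print(p_two_sided)
print(reject)
```

Note that the decision rule compares the p-value, not the cumulative probability, with α.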
Question 11
Logistic regression is an important tool for evaluating the effect of one or more explanatory
variables (which can be numerical or categorical) on a categorical response variable. It allows
a relationship to be modelled in instances where the response variable is not
numerical. The fitted logistic regression is as given in Table 3: Logistic Regression Results
below. The intercept is -0.8695; this implies that if the budget is 0, the log-odds of a movie
winning an Oscar are -0.8695, which corresponds to odds of exp(-0.8695) ≈ 0.42. The coefficient
for the Budget variable is 0.001837; this implies
that a unit increase in the budget of a movie results in a 0.001837 increase in the log-odds of the
movie winning an Oscar. Considering the significance level for the study is α = 0.05, for
significance the z value needs to be less than -1.96 or greater than 1.96. The z-value for the
intercept = -3.695 is less than -1.96, hence it is significant. This is also seen from the p-value =
0.00022, which is less than 0.05. The z-value of the Budget = 0.506 is neither less than -1.96 nor
greater than 1.96, hence it is not significant. This is also seen from the p-value = 0.61318, which
is greater than 0.05. We therefore fail to reject the null hypothesis that Budget has no effect on
whether a movie wins an Oscar.
Table 3: Logistic Regression Results
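Coefficients from `glm` with `family = "binomial"` are reported on the log-odds scale. As a small sketch, the snippet below converts the two reported estimates into odds, an odds ratio and a probability using base R; the variable names are illustrative, and the two numbers are copied from the text above rather than refitted from the data.

```r
# Estimates reported in the text (log-odds scale)
b0 <- -0.8695    # intercept
b1 <- 0.001837   # Budget coefficient

odds_at_zero <- exp(b0)     # odds of winning an Oscar when Budget = 0
prob_at_zero <- plogis(b0)  # same quantity as a probability: exp(b0) / (1 + exp(b0))
odds_ratio   <- exp(b1)     # multiplicative change in odds per unit increase in Budget

print(round(odds_at_zero, 4))
print(round(prob_at_zero, 4))
print(round(odds_ratio, 6))
```

An odds ratio this close to 1 is consistent with the Budget coefficient not being statistically significant.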
Question 12
In order to apply linear regression, the response variable will need to be changed in the problem.
The problem will need to focus on a variable that is numerical such as number of Oscars won by
a movie production company versus the total budget for the movies that won the Oscars for the
movie production company.
Question 13
If the fit of the linear model is poor, we have the option of increasing the number of variables in
the model and re-evaluating the fit. Replacing the explanatory variable is also a possible solution
for poor model fit. However, in instances where a linear model performs poorly, then
considering a nonlinear model would provide a viable alternative.
References
Andre, S 2020, News, viewed 6 January 2020,
<https://www.theguardian.com/money/shortcuts/2020/jan/06/finland-is-planning-a-four-day-week-is-this-the-secret-of-happiness#comments>
Barbara, I & Susan, D 2014, Introductory Statistics, 1st edn, OpenStax CNX, New York.
Appendix: R Code
#Packages
library(readr)
library(readxl)
library(mosaic)
#Question 7
#Importing Data
X2010_White_House_Staff <-
read_excel("D:/FileStorage/Docs/2010_White_House_Staff.xlsx")
X2010_White_House_Staff$`Employee Status`<-
factor(X2010_White_House_Staff$`Employee Status`)
#Descriptive Statistics:
#Employee Status
knitr::kable(summary(X2010_White_House_Staff$`Employee Status`))
#Salary
knitr::kable(favstats(X2010_White_House_Staff$Salary))
#Question 11
#Importing Data
boxOffice <- read_csv("D:/FileStorage/Docs/boxOffice.csv",
col_types = cols(Oscar = col_logical()))
#Modeling
LogModel <- glm(Oscar~ Budget, family = "binomial", data = boxOffice)
summary(LogModel)