York University PSYC 6131A&B - ANOVA Assignment 1 - Fall 2019

Added on  2022/10/08

Homework Assignment
AI Summary
This assignment solution addresses several data analysis and visualization tasks using R. It begins with a barplot of journal volumes over time, followed by a scatterplot with a linear trend line and a lowess line to analyze the relationship between two variables. It then analyzes Google Ngram Viewer data to compare trends in psychological science and related fields. It also includes a detailed data description and boxplot analysis of a dataset, comparing sports interest and weight between males and females, and discusses a visualization of public opinion on payment methods for public roads. Finally, it performs a binomial probability test to determine, based on the p-value, whether a chess player should adjust her rating upward. All code and outputs are included.
Univariate Analysis
Name:
Institution
1. Bar plots
Year<-c(1890, 1895, 1900, 1905, 1910, 1915, 1920, 1925, 1930, 1935, 1940)
Journal.volume<-c(1,5,5,6,9,12,13,15,21,21,27)
boring<-data.frame(Year,Journal.volume)
# The second positional argument of barplot() is width, so Year is passed only as the bar labels.
barplot(Journal.volume, names.arg = Year, col = "steelblue",
        ylab = "Journal Volume", xlab = "Years")
2. Scatter Plot
X<-c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7)
Y<-c(1, 2, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7)
plot(X,Y, ylab = "Variable Y",xlab = "Variable X")
abline(lm(Y~X))
lines(lowess(X,Y),col="red")
As the plot shows, the lowess curve (red) follows the points more closely than the linear
trend line: it draws a smooth line through the data, which helps reveal the shape of the
relationship between the variables.
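One rough way to back up the visual comparison is to compare the residual sum of squares (RSS) of the two fits. The sketch below reuses the X and Y vectors from the scatter plot; the names linear_fit, smooth_fit, rss_linear, and rss_lowess are illustrative and not part of the original solution. A lower RSS only indicates a closer fit to these points, since lowess is the more flexible of the two methods.

```r
# Same X and Y vectors as in the scatter plot above.
X <- c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7)
Y <- c(1, 2, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7)

linear_fit <- lm(Y ~ X)        # straight-line trend
smooth_fit <- lowess(X, Y)     # returns fitted points ordered by x

# X is already sorted, so smooth_fit$y lines up with Y element by element.
rss_linear <- sum(residuals(linear_fit)^2)
rss_lowess <- sum((Y - smooth_fit$y)^2)

cat("RSS, linear fit:", round(rss_linear, 2), "\n")
cat("RSS, lowess fit:", round(rss_lowess, 2), "\n")
```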
3. Google Ngram Viewer
a)
b) The graph above shows that behavior, cognitive, and neuroscience all start at
0, whereas psychological science has a higher starting point. Psychological
science also shows little change and remains essentially flat across the years.
By contrast, both neuroscience and cognitive science grow from 1975 onward.
c) To avoid drawing overly broad conclusions from a chart, it is advisable to label
lines individually and highlight the chart's most important characteristics. For
instance, in the chart above each trend line carries its own label, and the
increases and decreases in the lines are clearly defined.
d) The graph shows that hypnotism has the highest value across the years, whereas
phrenology has the lowest. All the variables start above 0, and all of them
decrease from 1975.
4. NeopliQ
a) Data description
library(Hmisc)  # provides describe(), which produces the output format shown below
NeopliQ<-read.csv(file.choose(), header=TRUE)
describe(NeopliQ)
11 Variables 100 Observations
-------------------------------------------------------------------------------------------------------------------------
sex
n missing distinct
100 0 2
Value f m
Frequency 50 50
Proportion 0.5 0.5
-------------------------------------------------------------------------------------------------------------------------
neur
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
100 0 54 0.999 64.17 21.35 35.80 40.70 52.00 64.50 74.00 88.10 96.05
lowest : 11 24 32 36 37, highest: 96 97 98 100 101
-------------------------------------------------------------------------------------------------------------------------
extr
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
100 0 49 0.999 124.8 21.41 96.0 100.9 113.0 123.5 136.5 151.3 158.1
lowest : 84 85 89 96 97, highest: 158 160 162 163 169
-------------------------------------------------------------------------------------------------------------------------
open
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
100 0 50 0.999 126.6 17.74 100.9 106.8 117.0 125.0 136.5 146.0 149.1
lowest : 85 90 97 98 99, highest: 152 155 157 166 172
-------------------------------------------------------------------------------------------------------------------------
agre
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
100 0 53 0.999 125.6 20.26 97.95 102.90 113.00 127.00 139.00 147.10 153.10
lowest : 73 85 93 95 97, highest: 153 155 157 158 160
-------------------------------------------------------------------------------------------------------------------------
cons
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
100 0 50 0.999 135.3 22.15 105.0 109.9 120.0 135.5 149.0 159.0 168.3
lowest : 89 93 100 104 105, highest: 168 175 179 180 181
-------------------------------------------------------------------------------------------------------------------------
iq1
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
100 0 48 0.999 99.07 17.43 68.9 78.8 90.0 100.0 108.2 119.0 122.0
lowest : 54 55 63 64 67, highest: 121 122 125 130 137
-------------------------------------------------------------------------------------------------------------------------
iq2
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
100 0 50 0.999 100.2 18.96 70.00 78.00 90.75 101.50 111.00 120.00 127.05
lowest : 42 55 61 69 70, highest: 125 127 128 130 137
-------------------------------------------------------------------------------------------------------------------------
height
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
100 0 40 0.999 173.9 14.23 155 156 166 173 182 193 196
lowest : 143 153 154 155 156, highest: 194 196 201 203 206
-------------------------------------------------------------------------------------------------------------------------
weight
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
100 0 45 0.999 73.54 15.05 54.0 55.9 63.0 73.0 83.0 89.0 95.1
lowest : 53 54 55 56 57, highest: 97 98 103 109 113
-------------------------------------------------------------------------------------------------------------------------
sports
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
100 0 10 0.978 6.06 2.565 2.95 3.00 4.75 6.00 8.00 9.00 10.00
Value 1 2 3 4 5 6 7 8 9 10
Frequency 4 1 11 9 10 24 13 12 10 6
Proportion 0.04 0.01 0.11 0.09 0.10 0.24 0.13 0.12 0.10 0.06
-------------------------------------------------------------------------------------------------------------------------
b) All variables have 100 observations with 0 missing values.
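The observation and missing-value counts can also be checked directly rather than read off the describe() output. The sketch below uses a small made-up data frame (toy) standing in for NeopliQ, with one NA seeded on purpose so the check has something to find; the column values are hypothetical.

```r
# Hypothetical data standing in for NeopliQ; one NA is seeded deliberately.
toy <- data.frame(
  sex    = c("f", "m", "f", "m"),
  weight = c(58, 81, NA, 77),
  sports = c(6, 7, 4, 9)
)

n_obs     <- nrow(toy)             # number of observations (rows)
n_missing <- colSums(is.na(toy))   # missing values per variable

print(n_obs)
print(n_missing)
```

Run against the real data frame, a vector of zeros from colSums(is.na(NeopliQ)) would confirm the statement in b).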
c) Subset
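# sports and sex are columns of NeopliQ, so subset the data frame and then take the column.
msports<-subset(NeopliQ, sex=="m")$sports
fsports<-subset(NeopliQ, sex=="f")$sports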
d) Box plots
sports1<-data.frame(msports,fsports)
boxplot(sports1,col="blueviolet",main = "Side by Side Boxplots")
e) The boxplots suggest that males have more interest in sports than females:
males have a median of approximately 7, whereas females have a median of
approximately 6. Both groups recorded a maximum of 10, but males had a
minimum of 3 whereas females had a minimum of 1.
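Values read off a boxplot are approximate, so it can help to compute the group summaries numerically. The sketch below uses a made-up data frame (toy) whose values were chosen only to mirror the pattern described in e); it is not the NeopliQ data, and the name group_summary is illustrative.

```r
# Hypothetical sports-interest scores, five per group, for illustration only.
toy <- data.frame(
  sex    = rep(c("m", "f"), each = 5),
  sports = c(3, 6, 7, 9, 10, 1, 4, 6, 8, 10)
)

# Split the scores by sex and compute min, median, and max for each group.
group_summary <- sapply(
  split(toy$sports, toy$sex),
  function(v) c(min = min(v), median = median(v), max = max(v))
)

print(group_summary)
```

The same call with NeopliQ$sports and NeopliQ$sex would give the exact medians and extremes behind the boxplots.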
f) Weight box plot
boxplot(weight~sex, data = NeopliQ, col = "mediumorchid",
        main = "Side by Side Boxplot for Weight")
g) Males weigh more than females overall: males have a median weight of
approximately 80, whereas females have a median of approximately 65. Notably,
the male group includes an extremely high weight (an outlier) above 100. Both
the male and female distributions appear skewed to the right.
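The points a boxplot draws as outliers are those lying more than 1.5 times the hinge spread beyond the hinges, and boxplot.stats() exposes that rule directly. The sketch below applies it to made-up weights (toy_weight), not to the NeopliQ data; the names bp, hinges, and fence_upper are illustrative.

```r
# Hypothetical weights for illustration only; not taken from the NeopliQ file.
toy_weight <- c(60, 65, 70, 75, 80, 82, 85, 88, 90, 120)

bp <- boxplot.stats(toy_weight)    # default coef = 1.5, as used by boxplot()

hinges      <- bp$stats[c(2, 4)]   # lower and upper hinges
fence_upper <- hinges[2] + 1.5 * diff(hinges)

print(fence_upper)   # values above this are flagged as outliers
print(bp$out)        # the flagged values
```

Calling boxplot.stats() on the male weights from the real data would list the exact outlier value that the plot shows above 100.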
5. The graph shows the results of a public opinion poll on which forms of payment
Americans prefer for public roads. It is a good visualization because it has
several characteristics of an effective chart, including a chart title (paying for
new road construction), data labels, and figures (new toll 41%, increased gas tax
18%, and no new roads 41%).
6. Binomial probability
N = 15
X = 10
P = 0.3333
Null hypothesis: p <= 0.3333
Alternative hypothesis: p > 0.3333
binom.test(10, 15, 0.3333, alternative = "greater")
Exact binomial test
data: 10 and 15
number of successes = 10, number of trials = 15, p-value = 0.008498
alternative hypothesis: true probability of success is greater than 0.3333
95 percent confidence interval:
0.4225563 1.0000000
sample estimates:
probability of success
0.6666667
Since the p-value (0.008498) is less than 0.05, we reject the null hypothesis and conclude
that the chess player should adjust her rating upward.
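The one-sided p-value reported by binom.test() is the upper-tail binomial probability P(X >= 10) under the null value of p, so it can be reproduced by hand. The sketch below uses the same N, X, and P as above; the names p_sum, p_tail, and p_test are illustrative.

```r
n  <- 15       # number of trials
x  <- 10       # observed successes
p0 <- 0.3333   # success probability under the null hypothesis

# Upper-tail probability P(X >= 10), computed three equivalent ways.
p_sum  <- sum(dbinom(x:n, size = n, prob = p0))
p_tail <- pbinom(x - 1, size = n, prob = p0, lower.tail = FALSE)
p_test <- binom.test(x, n, p0, alternative = "greater")$p.value

print(c(p_sum = p_sum, p_tail = p_tail, p_test = p_test))
```

All three values should agree with the p-value in the test output above, which shows where the number comes from.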