Statistical Analysis Assignment: Descriptive Stats & Correlation


Added on  2023/04/21

Homework Assignment
AI Summary
This assignment solution covers a range of statistical concepts and techniques. It includes calculating measures of central tendency (mean, median, mode, and range) for a given dataset, analyzing the number of letters received per day using descriptive statistics, and constructing a cumulative frequency curve to determine quartiles and the number of villages exceeding a certain population threshold. Further, the assignment involves calculating variance, standard deviation, and mean deviation for temperature data. A comparison of two datasets is conducted using mean, median, and range. The impact of a charge on car occupancy is evaluated using mean calculations. Correlation analysis is performed on tyre data, including plotting a scatter diagram, determining the line of best fit, and interpreting the slope and intercept. Spearman's rank correlation coefficient is calculated and interpreted for judge rankings. Finally, a t-test is conducted to determine if there is a significant difference in leaf thickness between north and south-facing walls, and a conclusion is drawn based on the p-value.
Question 1
a.
Given a set of data (4, 6, 2, 4, 8, 2, 1, 8, -3, -5, 2, -8, 3, 6, 8, 10), the mean is
$\bar{x} = \frac{\sum x}{n} = \frac{4+6+2+4+8+2+1+8-3-5+2-8+3+6+8+10}{16} = \frac{48}{16} = 3$

b.
Mode
x      f
-8     1
-5     1
-3     1
1      1
2      3
3      1
4      2
6      2
8      3
10     1
Mode = 2 and 8

c.
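Since n = 16 is even, the median is the average of the 8th and 9th ordered values (3 and 4):
Median $= \frac{x_{(n/2)} + x_{(n/2+1)}}{2} = \frac{3+4}{2} = 3.5$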
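
The three results for Question 1 can be cross-checked with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative and not part of the assignment.

```python
from statistics import mean, median, multimode

data = [4, 6, 2, 4, 8, 2, 1, 8, -3, -5, 2, -8, 3, 6, 8, 10]

data_mean = mean(data)                 # sum of the values divided by n
data_modes = sorted(multimode(data))   # every value tied for the highest frequency
data_median = median(data)             # average of the two middle values when n is even

print(data_mean, data_modes, data_median)
```

`multimode` is used rather than `mode` because this dataset has two values (2 and 8) tied for the highest frequency.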

Question 2

Number of letters    Number of days    Cumulative number of days
0                    6                 6
1                    19                25
2                    13                38
3                    9                 47
4                    6                 53
5                    0                 53
6                    0                 53
7                    2                 55
a.
Modal number of letters received per day=1 letter
b. Median number of letters received per day: since n = 55 is odd, the median is the $\frac{n+1}{2} = \frac{55+1}{2} = 28$th term,
hence
median = 2 letters

c.
Mean number of letters per day $= \frac{\sum fx}{\sum f} = \frac{110}{55} = 2$

d.
Range of letters received=7-0=7
e.
Interquartile range
Q1 is at position $\frac{n+1}{4} = \frac{56}{4} = 14$, i.e. the 14th term, which corresponds to 1 letter.
Q3 is at position $\frac{3(n+1)}{4} = 3 \times 14 = 42$, i.e. the 42nd term, which corresponds to 3 letters.
Interquartile range=Q3-Q1=3-1=2
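
As a cross-check of Question 2, the frequency table can be expanded into one observation per day and the positional rules above applied directly. This is a sketch under the same (n+1)/4 convention used in the working. The helper `term` and the variable names are illustrative, and the integer division is exact only because the positions happen to be whole numbers for n = 55.

```python
letters = [0, 1, 2, 3, 4, 5, 6, 7]
days = [6, 19, 13, 9, 6, 0, 0, 2]

# Expand the frequency table into one observation per day (already in order).
observations = [x for x, f in zip(letters, days) for _ in range(f)]
n = len(observations)

def term(position):
    """Return the value at a 1-based position in the ordered data."""
    return observations[position - 1]

mode = letters[days.index(max(days))]                  # value with the largest frequency
mean = sum(x * f for x, f in zip(letters, days)) / n   # sum(fx) / sum(f)
median = term((n + 1) // 2)                            # 28th term
q1 = term((n + 1) // 4)                                # 14th term
q3 = term(3 * (n + 1) // 4)                            # 42nd term
iqr = q3 - q1
value_range = max(observations) - min(observations)

print(mode, median, mean, value_range, q1, q3, iqr)
```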

Question 3

Range of x    f     cf
50-100        7     7
100-150       24    31
150-200       29    60
200-250       18    78
250-300       12    90
[Figure: Cumulative frequency curve; x-axis from 0 to 350, cumulative frequency axis from 0 to 100]
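a. The position of the median is the average of the (N/2)th and (N/2 + 1)th terms:
$\frac{\frac{90}{2} + \left(\frac{90}{2} + 1\right)}{2} = 45.5$th term

From the graph, this corresponds to a median of about Q2 = 180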

b.
For the lower quartile, N1 $= \frac{N+1}{4} = \frac{91}{4} = 22.75$th term,
with a corresponding value of Q1 = 110
c. For the upper quartile, N3 $= \frac{3(N+1)}{4} = \frac{3 \times 91}{4} = 68.25$th term,
with a corresponding value of Q3 = 215

d.
Interquartile range=Q3-Q1=215-110=105
e.
Villages with more than 260 villagers:
From the graph, the cumulative frequency at x = 260 is 79.

The number of villages with more than 260 villagers = 90 - 79 = 11 villages.

Question 4

a.

With mean u = 147/7 = 21, the deviations from the mean are:

x      x-u    (x-u)^2
21     0      0
23     2      4
24     3      9
19     -2     4
19     -2     4
20     -1     1
21     0      0
Sum           22
b.
Population standard deviation (STD):
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\frac{22}{7}} = 1.77281$
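
The sum of squared deviations and the population standard deviation can be checked with the standard library. This is a sketch only: the list name `values` is illustrative, and `pvariance` and `pstdev` are the population (divide by n) versions, matching the formula used above.

```python
from statistics import mean, pstdev, pvariance

values = [21, 23, 24, 19, 19, 20, 21]

mu = mean(values)                                    # 147 / 7
squared_deviations = [(x - mu) ** 2 for x in values]
variance = pvariance(values)                         # sum of squared deviations / n
std_dev = pstdev(values)                             # square root of the variance

print(mu, sum(squared_deviations), variance, std_dev)
```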

Question 5

Comparison of Tom and Sam’s performance

          Sam    Tom
          30     36
          24     20
          48     25
          13     39
          25     20
Mean      28     28
Median    25     25
Range     35     19
Question 6
a.
Mean before the charge was introduced =1.5176
b.
Mean after the charge was introduced =1.8008
c.
Introducing the charges increased the average number of occupants per car.
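
The two means come from the occupancy frequency table for this question (x = occupants per car, f1 = cars before the charge, f2 = cars after). A minimal sketch of the grouped-mean calculation follows; the function name `weighted_mean` and the list names are illustrative.

```python
occupants = [1, 2, 3, 4, 5, 6]
cars_before = [658, 275, 86, 25, 4, 1]   # f1
cars_after = [450, 388, 125, 47, 8, 1]   # f2

def weighted_mean(values, frequencies):
    """Mean of a frequency table: sum(f * x) / sum(f)."""
    return sum(x * f for x, f in zip(values, frequencies)) / sum(frequencies)

mean_before = weighted_mean(occupants, cars_before)   # 1592 / 1049
mean_after = weighted_mean(occupants, cars_after)     # 1835 / 1019

print(round(mean_before, 4), round(mean_after, 4))
```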
x       f1      f2      xf1     xf2
1       658     450     658     450
2       275     388     550     776
3       86      125     258     375
4       25      47      100     188
5       4       8       20      40
6       1       1       6       6
Sum     1049    1019    1592    1835
Mean                    1.517636    1.800785

Question 7

a.
b.
The arrow shows the coordinates of point M
c.
The scatter diagram shows a strong negative correlation between the number of miles and
the depth of tread in mm.

d.
Line of best fit is shown in the graph
e.
From the regression line,
y = -0.1128x + 7.8326

To become illegal, y=1.6mm,

Substituting in the equation,

1.6 = -0.1128x + 7.8326

x = 55.25, i.e. about 55,250 miles (x is measured in thousands of miles)

f.
For 20 thousand miles, x=20
Substituting in the equation,

y = -0.1128*20 + 7.8326=5.5766mm
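
Both substitutions use the same fitted line, once solved for x and once evaluated at a given x. A small sketch, assuming x is in thousands of miles and y is tread depth in mm; the function names are illustrative.

```python
SLOPE = -0.1128      # change in depth (mm) per thousand miles, from the fitted line
INTERCEPT = 7.8326   # fitted depth (mm) at x = 0

def depth_at(thousand_miles):
    """Evaluate y = mx + c for a given distance."""
    return SLOPE * thousand_miles + INTERCEPT

def distance_for(depth_mm):
    """Rearrange y = mx + c to x = (y - c) / m."""
    return (depth_mm - INTERCEPT) / SLOPE

limit_distance = distance_for(1.6)   # distance at which the depth falls to 1.6 mm
depth_after_20 = depth_at(20)        # depth after 20 thousand miles

print(round(limit_distance, 2), round(depth_after_20, 4))
```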

Question 8

a.
[Figure: A scatter graph of cost (€) against distance (miles); x-axis: X in miles (1 to 11), y-axis: Y in € (0 to 14); fitted line f(x) = 0.758375333530981 x + 3.84974799881411, R² = 0.997903345253215]


b.
There is a strong positive correlation between the two variables
c.
Coordinates of the mean point
$\bar{X} = \frac{\sum x}{n} = \frac{60}{10} = 6$ and $\bar{Y} = \frac{\sum y}{n} = \frac{84}{10} = 8.4$
The mean point (6, 8.4) is shown with an arrow in the graph
d.
The line of best fit is shown in the graph
e.
The line of best fit is of the form
y = 0.7584x + 3.8497

When y=12,

Substituting in the equation,

12= 0.7584x + 3.8497

Hence x=10.7467 miles

f.
When x=1, substituting in the estimated equation,
y= 0.7584*1 + 3.8497=4.6081 €

g.
The equation of the line is in the form of y=mx+c
Compared to

Y= 0.7584x + 3.8497

M=0.7584

C=3.8497

h.
In this context, m = 0.7584 is the increase in cost (€) for each additional mile travelled.
c = 3.8497 is the fixed cost (€) incurred when no distance is covered.

Question 9

a.
The Spearman’s rank correlation coefficient between the two oral tests is +0.76.
This shows that the scores in the two oral tests are positively and strongly related to one
another.

b.
The Spearman’s rank correlation coefficient between the two oral tests is -0.06.
This shows that there is a very weak negative relationship, close to no correlation, between the scores in these two tests.

c.

Judge 1    Rank1    Judge 2    Rank2    d     d^2
4          3        4          3        0     0
5          2        6          1        1     1
6          1        5          2        -1    1
2          5        3          4        1     1
1          6        1          6        0     0
3          4        2          5        -1    1
Sum                                           4
$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 4}{6(36 - 1)} = 0.8857$
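
The coefficient can be recomputed directly from the two rank columns using the shortcut formula, which is valid here because there are no tied ranks. This is a sketch; the list names are illustrative.

```python
judge1_ranks = [3, 2, 1, 5, 6, 4]
judge2_ranks = [3, 1, 2, 4, 6, 5]

n = len(judge1_ranks)
d_squared = [(a - b) ** 2 for a, b in zip(judge1_ranks, judge2_ranks)]

# Spearman's shortcut formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
r_s = 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))

print(sum(d_squared), round(r_s, 4))
```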

d.
The rankings given by the two judges are strongly and positively correlated.
Question 10

Summary of the statistics

        South       North
        300         180
        270         210
        220         220
        250         160
        210         210
        250         190
        290         160
        190         180
        220         200
        270         210
Mean    247         192
STD     36.22461    21.49935
n       10          10
Hypothesis
H0: μs = μn
H1: μs ≠ μn
Under the null hypothesis,

$t = \frac{\mu_s - \mu_n}{\sqrt{\frac{S_s^2}{n_s} + \frac{S_n^2}{n_n}}} = \frac{247 - 192}{\sqrt{\frac{36.225^2}{10} + \frac{21.499^2}{10}}} = 4.1289$

P(|t| ≥ 4.1289) = 2 × (0.000446) from t-tables = 0.000892
The significance level is 5% = 0.05.

Since the p-value (0.000892) is less than 0.05, we reject the null hypothesis: there is a significant difference in leaf thickness between the south- and north-facing walls.
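
The test statistic can be recomputed from the raw measurements using only the standard library. This is a sketch of an unequal-variance (Welch-style) calculation, matching the form of the t formula above. The function name `welch_t` is illustrative, and the degrees-of-freedom value it returns is an added assumption, since the working does not state which degrees of freedom were used for the table lookup.

```python
from math import sqrt
from statistics import mean, stdev

south = [300, 270, 220, 250, 210, 250, 290, 190, 220, 270]
north = [180, 210, 220, 160, 210, 190, 160, 180, 200, 210]

def welch_t(a, b):
    """Return (t, df) for two independent samples without pooling variances."""
    va = stdev(a) ** 2 / len(a)   # sample variance of a divided by its size
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t_stat, df = welch_t(south, north)
print(round(t_stat, 4), round(df, 2))
```

The p-value itself still needs a t-table or a statistics package, so it is left out of the sketch.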