Statistical Methods and Data Analysis: A Comprehensive Project

Added on 2025/04/08

Table of Contents
Introduction
Question 1
Question 2
Conclusion
References

Introduction
This project covers several statistical concepts, including histograms, scatter plots,
regression analysis, the coefficient of correlation, the coefficient of determination and the
adjusted coefficient of determination. The project is divided into two parts. The first part
deals with histograms, scatter plots, quartiles, the mean, the median and the mode, which are
used to answer the questions asked in Question 1. The second part deals with regression
analysis, examining the relationship between the independent variable and the dependent
variable.

Question 1
a)
Several methods can be used to collect the data needed for analysis. Some of these approaches
are explained below:
Direct Interview: This is one of the most effective methods, because the interviewer can
observe the responses of the selected population through their gestures and physical
behavior. The interviewer and the interviewee meet face to face, which helps both parties
communicate exactly what they mean and resolve any problems on the spot without delay.
Phone Interview: A phone interview is resorted to when a larger area of investigation must be
covered and the cost of face-to-face interviews is very high. It allows a large population to
be interviewed while saving cost and time.
Written Questionnaire: Under this method, the interviewer prepares a list of questions to be
answered by different persons. This helps the interviewer analyze the views of different
people on the same set of questions, and respondents can give their views without being
influenced by the presence of an interviewer.
The written questionnaire is preferred over the other approaches because it covers a larger
population and is the cheapest means of communication. It reaches each and every candidate
and generates results more quickly. Questionnaires generally use objective questions, as
these can be answered easily. The method also ensures that the views of the selected
population are free from interviewer bias.
b)
The two types of sampling technique are as follows:
• Probability method of sampling
• Non-probability method of sampling
The probability method relies on random selection and works to ensure bias-free sampling. In
the probability method, each element of the data group has an equal chance of being selected.
In other words, probability sampling gives the same weight to each unit of data and aims to
make the selected sample representative of the data considered. Non-probability sampling
works on other principles and leads to the selection of the data judged most important. In the
non-probability method, each element of the data group does not have an equal chance of
selection. In other words, it provides a biased basis of selection, because different weights
are given to each set of data according to its significance.
Probability sampling can be further classified into four types:
• Simple Random Sampling
In this method, the sample is selected without giving any weight to the importance of the
data, so every data unit has the same chance of selection.
• Stratified Sampling
In this method, a large population of data is divided into smaller sets (strata) and the
sample is drawn from each set, so that every set is represented in the sample.
• Cluster Sampling
It is a balanced method of sampling in which the data population is divided into small sets
called clusters. Random sampling is then used to select a cluster as the sample, so that
every cluster has the same chance of selection (Stattrek, 2019).
• Systematic Random Sampling
As the name suggests, it is a more structured form of random sampling, used to select an
appropriate sample from a large data population. In this method, data units are selected
starting from a random initial point and then at a fixed periodic interval.
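The four probability sampling methods described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the population of 100 student
IDs, the grouping into four classes of 25, the sample sizes and the function names are all
hypothetical and do not come from the assignment data.

```python
import random


def simple_random_sample(population, k, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(population, k)


def stratified_sample(strata, k_per_stratum, rng):
    """Draw a simple random sample of k_per_stratum units from each stratum."""
    return {name: rng.sample(units, k_per_stratum) for name, units in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


def systematic_sample(population, step, rng):
    """Start at a random point in the first `step` units, then take every step-th unit."""
    start = rng.randrange(step)
    return population[start::step]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 student IDs, grouped into four classes of 25.
students = list(range(1, 101))
classes = {f"class_{i}": students[i * 25:(i + 1) * 25] for i in range(4)}

srs = simple_random_sample(students, 10, rng)
strat = stratified_sample(classes, 3, rng)
clus = cluster_sample(classes, 1, rng)
syst = systematic_sample(students, 10, rng)

print("Simple random:", srs)
print("Stratified:", strat)
print("Cluster:", list(clus))
print("Systematic:", syst)
```

Note that the stratified sample takes a few units from every class, whereas the cluster sample
keeps every unit of the single class that was drawn.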
Of the sampling techniques mentioned above, cluster sampling is the best suited. The reasons
for choosing cluster sampling are as follows:
• From an economic point of view, it is more economical than the other approaches to
sampling.
• It provides a better understanding and is more application-oriented than the other
sampling techniques.
• Before a cluster is selected as a sample, the data must be divided into small clusters,
which helps to gain an in-depth understanding of the data population.
In the given circumstances, the entire population is divided into clusters based on specified
criteria. The data are then organized according to the groups formed.
c)
A dependent variable is the quantity that is measured or evaluated in an experiment; its
value is established by the test. Dependent variables depend, directly or indirectly, on the
independent variables, so a change in an independent variable also changes the dependent
variable. An independent variable, in turn, is varied to check how it affects the dependent
variable. For instance, practice is an independent variable and perfection is a dependent
variable: a person who practices more will show more perfection, so the level of perfection
depends on practice (Thoughtco, 2019).
In the given case, study time is the independent variable and the marks obtained by the
students are the dependent variable, because students who dedicate more time to their studies
score more marks. However, the marks obtained are not directly proportional to study time,
since many other factors also affect them, such as the IQ level of the students, the study
environment and the analytical capacity of different students.
d)
As mentioned in part a), the written questionnaire is the best method for investigating any
particular criterion. It has many features which make it more suitable than the other
methods. However, there are several issues with the written questionnaire, which are
discussed below:
• In some cases, responses come in very slowly. Moreover, some respondents do not take it
very seriously and may not answer the questionnaire.
• Another issue is the lack of proper conversation between the investigator and the
respondent. Occasionally, the investigator intends to ask one thing and the respondent
answers something different or answers in the wrong manner.
• The interviewer cannot be present at all times, so the respondent's answers may not align
with the requirements and purpose of the questionnaire (Stattrek, 2019).
• The quality of the responses depends on the seriousness of the respondents, and
irresponsible behavior by respondents reduces the quality of the research conclusions.
• A high level of analytical skill and time is required to correctly understand the
responses provided by the respondents.
e)
Intervals   Data Frequency   Relative Data Frequency   Cumulative Relative Frequency of Data
20-30       1                0.01                      0.01
30-40       8                0.08                      0.09
40-50       16               0.16                      0.25
50-60       20               0.2                       0.45
60-70       20               0.2                       0.65
70-80       17               0.17                      0.82
80-90       12               0.12                      0.94
90-100      6                0.06                      1
Total       100              1
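The relative and cumulative relative frequencies in the table above can be reproduced from the
class frequencies alone. The following is a minimal sketch; the variable names are
illustrative and not part of the original assignment.

```python
from itertools import accumulate

intervals = ["20-30", "30-40", "40-50", "50-60", "60-70", "70-80", "80-90", "90-100"]
frequencies = [1, 8, 16, 20, 20, 17, 12, 6]

total = sum(frequencies)

# Relative frequency: share of all observations that fall in each class.
relative = [f / total for f in frequencies]

# Cumulative relative frequency: running total of the frequencies divided by the grand total.
cumulative = [c / total for c in accumulate(frequencies)]

for interval, f, r, c in zip(intervals, frequencies, relative, cumulative):
    print(f"{interval:>7}  {f:>3}  {r:.2f}  {c:.2f}")
print(f"{'Total':>7}  {total:>3}  {sum(relative):.2f}")
```

Dividing the running integer total by the grand total, rather than summing the rounded
relative frequencies, keeps the last cumulative value at exactly 1.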

[Figure: Frequency Histogram. X-axis: Different Class Intervals (20-30 to 90-100). Y-axis: Frequencies. Bar values: 1, 8, 16, 20, 20, 17, 12, 6.]
[Figure: Relative Frequency Histogram. X-axis: Different Class Intervals (20-30 to 90-100). Y-axis: Frequencies. Bar values: 0.01, 0.08, 0.16, 0.2, 0.2, 0.17, 0.12, 0.06.]
[Figure: Cumulative Relative Frequency Histogram. X-axis: Different Class Intervals (20-30 to 90-100). Y-axis: Frequencies.]
The X-axis is generally taken as the independent (explanatory) variable.
The Y-axis is generally taken as the dependent (response) variable.
The explanatory variable is the one used to explain changes in the response variable. The
outcome of the response variable depends directly on the explanatory variable, so a change in
the explanatory variable also results in a change in the response variable.
f)
Data Intervals   Data Frequency
20-30            1
30-40            8
40-50            16
50-60            20
60-70            20
70-80            17
80-90            12
90-100           6
Total            100
[Figure: Scatter Plot Diagram. X-axis: Data Interval (0 to 9). Y-axis: Data Frequency (0 to 25).]
g)
Intervals   Mid Point (x)   Frequency
20-30       25              1
30-40       35              8
40-50       45              16
50-60       55              20
60-70       65              20
70-80       75              17
80-90       85              12
90-100      95              6
Total                       100
Summary Output

Regression Statistics
Multiple R          0.233778
R Square            0.054652
Adjusted R Square   -0.13442
Standard Error      5.975545
Observations        7

ANOVA
             df   SS       MS      F      Significance F
Regression   1    10.32    10.32   0.29   0.61
Residual     5    178.54   35.71
Total        6    188.86

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   18.09          7.68             2.36     0.07      -1.65       37.83       -1.65         37.83
25          -0.06          0.11             -0.54    0.61      -0.35       0.23        -0.35         0.23
[Figure: Scatter plot of Frequencies (Y-axis, 0 to 25) against Mid Point (X-axis, 20 to 100), with fitted trend line y = 0.069x + 8.357, R² = 0.058.]
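A trend line of this kind can be obtained by ordinary least squares. The sketch below fits
frequency against class midpoint using all eight classes from the table; it assumes this is
how the chart's trend line was produced, and the variable names are illustrative.

```python
midpoints = [25, 35, 45, 55, 65, 75, 85, 95]
frequencies = [1, 8, 16, 20, 20, 17, 12, 6]

n = len(midpoints)
mean_x = sum(midpoints) / n
mean_y = sum(frequencies) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in midpoints)
syy = sum((y - mean_y) ** 2 for y in frequencies)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(midpoints, frequencies))

slope = sxy / sxx                      # change in frequency per unit of midpoint
intercept = mean_y - slope * mean_x    # fitted frequency at x = 0
r_squared = sxy ** 2 / (sxx * syy)     # coefficient of determination

print(f"y = {slope:.3f}x + {intercept:.3f}")
print(f"R squared = {r_squared:.4f}")
```

With these eight points the slope and intercept agree with the trend line y = 0.069x + 8.357
shown on the chart. The low coefficient of determination reflects the fact that the
frequencies rise and then fall across the classes, which a straight line cannot capture.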
h)
Class Intervals   Class Frequency   Cumulative Class Frequency   x    fx     x*x    fx*x
20-30             1                 1                            25   25     625    625
30-40             8                 9                            35   280    1225   9800
40-50             16                25                           45   720    2025   32400
50-60             20                45                           55   1100   3025   60500
60-70             20                65                           65   1300   4225   84500
70-80             17                82                           75   1275   5625   95625
80-90             12                94                           85   1020   7225   86700
90-100            6                 100                          95   570    9025   54150
Total             100                                                 6290          424300
Mean of the given data = 6290 / 100 = 62.9
Median of the given data: n/2 (n even) = 50
Median class range = 60-70
Mid-point = 65
$\text{Variance} = \frac{(100)(424300) - (6290)(6290)}{(100)(99)} = 289.4848485$
$\text{Standard Deviation} = \sqrt{289.4848485} = 17.01425427$
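The mean, variance and standard deviation above follow directly from the column totals of the
grouped table. A minimal sketch of the arithmetic, using the class midpoints as x and the
sample form of the variance with n - 1 in the denominator (variable names are illustrative):

```python
import math

midpoints = [25, 35, 45, 55, 65, 75, 85, 95]
frequencies = [1, 8, 16, 20, 20, 17, 12, 6]

n = sum(frequencies)                                               # total observations
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))        # column total of fx
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))   # column total of fx*x

mean = sum_fx / n
variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
std_dev = math.sqrt(variance)

print(f"Sum of fx   = {sum_fx}")
print(f"Sum of fx*x = {sum_fx2}")
print(f"Mean        = {mean}")
print(f"Variance    = {variance:.7f}")
print(f"Std. dev.   = {std_dev:.8f}")
```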
$\text{Quartiles} = L_{q1} + \left( \frac{\sum f / 4 - F_{q-1}}{f_{q1}} \right) \times C_{q1}$