# Benchmark-descriptive statistics PDF

Added on - 03 Oct 2021

Benchmark-Descriptive Statistics Project
Student Name
University Name
DESCRIPTIVE STATISTICS PROJECT
Part A
Description of Dataset
The sample dataset was provided by the department of transportation and
statistics. It represents “reporting carrier on-time performance” and has four variables.
day_of_month and day_of_week are discrete variables that indicate the day of the month and
the day of the week on which the data were collected, respectively. dep_delay and
arr_delay are continuous variables that give the difference, in minutes, between
scheduled and actual departure and arrival times, respectively. An early arrival or departure is
denoted by a negative sign. From the data, it is evident that in 2015, five hundred
observations of arrival and departure delay were taken on the thirteenth day of a given month
and the second day of the week. The data are normally distributed, with an average
departure delay of -5.296 minutes and an average arrival delay of -3.138 minutes.
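The sign convention for dep_delay and arr_delay can be illustrated with a minimal Python sketch. It assumes a delay is computed as the actual time minus the scheduled time, in minutes, which is consistent with early events being negative. The function name delay_minutes and the timestamps are hypothetical and are not taken from the dataset.

```python
from datetime import datetime


def delay_minutes(scheduled: datetime, actual: datetime) -> float:
    """Return actual minus scheduled, in minutes (assumed convention).

    A negative result means the flight left or arrived early.
    """
    return (actual - scheduled).total_seconds() / 60.0


# Illustrative timestamps only; these are not observations from the dataset.
scheduled_departure = datetime(2015, 1, 13, 10, 0)
actual_departure = datetime(2015, 1, 13, 9, 55)

# The flight left five minutes early, so the delay is negative.
print(delay_minutes(scheduled_departure, actual_departure))
```

Under this convention, the negative averages reported above mean that flights in the sample left and arrived slightly ahead of schedule on average.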
The dataset is to be used for statistical analysis: determining
the measures of central tendency and variation, plotting the data with
an appropriate graph, and using an appropriate probability distribution to estimate parameters.
Additionally, the dataset will be used to examine the viability of statistical sources, the
significance of randomness in statistical inference, and the difference
between empirical and theoretical probability and the impact they have on probability
calculations.
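As a rough sketch of how the measures of central tendency and variation could be computed outside a spreadsheet, the Python example below uses the standard library statistics module. The function name summarize and the sample values are hypothetical. The sketch assumes the sample (n - 1) forms of the variance and standard deviation, which may or may not match the formulas used to produce the figures in this report.

```python
import statistics


def summarize(values):
    """Return common measures of center and variation for a sample."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample (n - 1) variance and standard deviation: an assumption.
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
        "range": max(values) - min(values),
    }


# Made-up delays in minutes for illustration; not the project's data.
illustrative_delays = [-8.0, -5.0, -3.0, 0.0, 2.0, -10.0, -4.0]

for name, value in summarize(illustrative_delays).items():
    print(f"{name}: {value:.3f}")
```

Comparing the mean with the median is a quick check for skew or outliers: when the two are close, the mean is a reasonable summary of the center.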
Part B
The table of descriptive statistics developed in Excel is shown below and is used
to answer questions 1 and 2.
1. To find the appropriate measure of center and explain why it is most appropriate.
The commonly used measures of center are the mean and the median. In this
case, the two variables under consideration, dep_delay and arr_delay, are somewhat
uniformly distributed and have no observable outliers, so the most appropriate
measure of center is the mean. The mean of dep_delay is -5.296, while the mean of
arr_delay is -3.138.
2. To find the most appropriate measure of variation for the data.
The commonly used measures of variation are the standard deviation, the variance, and the
range. For the same reason as in question 1, the most appropriate measure of
variation for these variables is the standard deviation. The standard deviation
of dep_delay is 9.21, while that of arr_delay is 9.66.
3. To find the graph that most appropriately describes the sample data provided.