Running head: DESCRIPTIVE STATISTICS MODEL

Benchmark-Descriptive Statistics Project

Student Name

University Name

Part A

Description of Dataset

The sample dataset was provided by the department of transportation and statistics. It represents “reporting carrier on-time performance.” It contains four variables. The variables day_of_month and day_of_week are discrete and indicate, respectively, the day of the month and the day of the week on which the data were collected. The variables dep_delay and arr_delay are continuous and indicate the difference, in minutes, between the scheduled and actual departure and arrival times, respectively. An early departure or arrival is denoted by a negative sign. From the data, it is evident that in the year 2015, five hundred observations of departure and arrival delay were taken on the thirteenth day of a given month and the second day of a given week in the month. The data are normally distributed, with an average departure delay of -5.296 minutes and an average arrival delay of -3.138 minutes.
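
The sign convention described above can be illustrated with a minimal Python sketch. The function name delay_minutes, the timestamp format, and the example times are assumptions made for illustration only; they are not taken from the dataset.

```python
from datetime import datetime

def delay_minutes(scheduled: str, actual: str) -> float:
    """Return actual minus scheduled time in minutes (negative means early)."""
    fmt = "%Y-%m-%d %H:%M"  # assumed timestamp format, for illustration only
    delta = datetime.strptime(actual, fmt) - datetime.strptime(scheduled, fmt)
    return delta.total_seconds() / 60

# Hypothetical flight that left 5 minutes ahead of schedule
print(delay_minutes("2015-01-13 14:30", "2015-01-13 14:25"))  # -5.0

# Hypothetical flight that left 12 minutes behind schedule
print(delay_minutes("2015-01-13 14:30", "2015-01-13 14:42"))  # 12.0
```

A negative result corresponds to an early departure or arrival, which is why the average delays reported for this dataset are below zero.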

The dataset is to be used for statistical analysis, including determining the measures of central tendency and variation, plotting the data with an appropriate graph, and using an appropriate probability distribution to estimate parameters. Additionally, the dataset will be used to examine sources of statistical variability, the significance of randomness in statistical inference, and the difference between empirical and theoretical probability and the impact that difference has on probability calculations.

Part B

Answers to Questions

The table of descriptive statistics, generated in Excel, is shown below and is used to answer questions 1 and 2.

1. To find the appropriate measure of center and explain why it is the most appropriate.

The commonly used measures of center are the mean and the median. In this case, the two variables under consideration, dep_delay and arr_delay, are somewhat uniformly distributed and have no observable outliers, so the most appropriate measure of center is the mean. Because the mean uses every observation, it is the more informative measure when no extreme values distort it. The mean for dep_delay is -5.296 minutes, while the mean for arr_delay is -3.138 minutes.
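
The reasoning above, that the mean is suitable only when no outliers are present, can be checked with a short Python sketch. The values below are a small hypothetical sample, not the project's 500 observations, and the helper name center_summary is an assumption introduced for illustration.

```python
import statistics

def center_summary(values):
    """Return (mean, median) for a list of delays in minutes."""
    return statistics.mean(values), statistics.median(values)

# Hypothetical delays in minutes (NOT the project's data)
sample = [-12, -8, -6, -5, -5, -4, -3, -1, 2, 4]

mean_clean, median_clean = center_summary(sample)
print(mean_clean, median_clean)  # -3.8 -4.5

# Add one hypothetical extreme delay of 180 minutes
mean_out, median_out = center_summary(sample + [180])
print(round(mean_out, 2), median_out)  # 12.91 -4
```

Without the extreme value, the mean and median are close together, so the mean summarizes the sample well. A single large delay moves the mean by more than 16 minutes while the median barely changes, which is the situation in which the median would be preferred.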

2. To find the most appropriate measure of variation for the data.

The commonly used measures of variation are the standard deviation, the variance, and the range. For the same reason as in question 1, the most appropriate measure of variation for these variables is the standard deviation. The standard deviation for dep_delay is 9.21 minutes, while that for arr_delay is 9.66 minutes.
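
The three measures of variation named above can be computed with a minimal Python sketch. It assumes the sample (n - 1) formulas and uses a small hypothetical sample rather than the project's data; the helper name variation_summary is also an assumption.

```python
import statistics

def variation_summary(values):
    """Return the range, sample variance, and sample standard deviation."""
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # divides by n - 1
        "stdev": statistics.stdev(values),        # square root of the sample variance
    }

# Hypothetical delays in minutes (NOT the project's data)
sample = [-12, -8, -6, -5, -5, -4, -3, -1, 2, 4]

summary = variation_summary(sample)
print(summary["range"])               # 16
print(round(summary["variance"], 2))  # 21.73
print(round(summary["stdev"], 2))     # 4.66
```

The standard deviation is expressed in the same units as the data (minutes), which makes it easier to interpret alongside the mean than the variance, while the range depends only on the two most extreme observations.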

3. To find the graph that most appropriately describes the sample data provided.

