
Exploration of Ice Cream Data

   

Added on  2023-03-30

Ice Cream Data 1
EXPLORATION OF ICE CREAM DATA
by [Name]
Course
Professor’s Name
Institution
Location of Institution
Date

Exploration of Ice Cream Data
Introduction
The dataset was obtained from Kadiyala's (1970) article titled “Testing for the
Independence of Regression Disturbances”. The researcher focused on investigating the
independence of errors in a multiple OLS regression model, but the independence assumption
is examined only after the normality of the residuals has been established. The data contain
five variables, all measured on a scale: the period (weeks), consumption of ice cream
(pints per capita), weekly price (dollars per pint), consumer income (dollars per week) and the
average weekly temperature (°F) (Kadiyala 1970). The sample size is thirty weeks. The
main aim of the essay is to explore the data, specifically the normality of the errors. The
essay begins with a brief introduction of the dataset, followed by a description of the data
(original data and summary statistics). The next section discusses the normality
assumption and is followed by the conclusion.
Description of Ice Cream Data
The original data are presented in Table 1.
Table 1: Ice Cream Consumption Raw Data
Week  Consumption (pints per capita)  Price ($ per pint)  Income ($ per week)  Temperature (average °F)
1 0.386 0.27 78 41
2 0.374 0.282 79 56
3 0.393 0.277 81 63
4 0.425 0.28 80 68
5 0.406 0.272 76 69
6 0.344 0.262 78 65
7 0.327 0.275 82 61
8 0.288 0.267 79 47
9 0.269 0.265 76 32
10 0.256 0.277 79 24
11 0.286 0.282 82 28
12 0.298 0.27 85 26
13 0.329 0.272 86 32
14 0.318 0.287 83 40
15 0.381 0.277 84 55
16 0.381 0.287 82 63
17 0.47 0.28 80 72
18 0.443 0.277 78 72
19 0.386 0.277 84 67
20 0.342 0.277 86 60
21 0.319 0.292 85 44
22 0.307 0.287 87 40
23 0.284 0.277 94 32
24 0.326 0.285 92 27
25 0.309 0.282 95 28
26 0.359 0.265 96 33
27 0.376 0.265 94 41
28 0.416 0.265 96 52
29 0.437 0.268 91 64
30 0.548 0.26 90 71
Source: Kadiyala (1970)
Table 2 shows the descriptive statistics for the response variable (consumption) and the
explanatory variables (price, income and temperature).
Table 2: Descriptive Statistics for Ice cream data
Statistic Consumption Price Income Temperature
Min 0.2560 0.2600 76 24
Median 0.3515 0.2770 83.50 49.50
Mean 0.3594 0.2753 84.60 49.10
Max 0.5480 0.2920 96.00 72.00
Std. Dev. 0.0658 0.0083 6.2456 16.4219
Table 2 summarises each variable separately, so the values in a given row do not
necessarily occur in the same week. The minimum consumption is 0.2560 pints per capita, the
minimum price is $0.26 per pint, the minimum income is $76 per week and the minimum average
temperature is 24°F. The median consumption is 0.3515 pints per capita, with a median price
of $0.2770 per pint, a median income of $83.50 per week and a median average temperature of
49.50°F. The mean consumption is 0.3594 pints per capita, with a mean price of $0.2753 per
pint, a mean income of $84.60 per week and a mean average temperature of 49.10°F (Bun and
Harrison 2018). The maximum values are 0.5480 pints per capita for consumption, $0.2920 per
pint for price, $96.00 per week for income and 72°F for temperature. The standard deviations
are 0.0658 pints per capita for consumption, $0.0083 per pint for price, $6.2456 per week for
income and 16.42°F for temperature.
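The figures in Table 2 can be reproduced from the raw data in Table 1. The following is a minimal sketch using only the Python standard library; the variable names and the helper `describe` are illustrative and do not come from the original analysis. The sample standard deviation (divisor n - 1) is assumed.

```python
import statistics

# Columns transcribed from Table 1 (weeks 1 to 30).
consumption = [0.386, 0.374, 0.393, 0.425, 0.406, 0.344, 0.327, 0.288, 0.269, 0.256,
               0.286, 0.298, 0.329, 0.318, 0.381, 0.381, 0.470, 0.443, 0.386, 0.342,
               0.319, 0.307, 0.284, 0.326, 0.309, 0.359, 0.376, 0.416, 0.437, 0.548]
price = [0.270, 0.282, 0.277, 0.280, 0.272, 0.262, 0.275, 0.267, 0.265, 0.277,
         0.282, 0.270, 0.272, 0.287, 0.277, 0.287, 0.280, 0.277, 0.277, 0.277,
         0.292, 0.287, 0.277, 0.285, 0.282, 0.265, 0.265, 0.265, 0.268, 0.260]
income = [78, 79, 81, 80, 76, 78, 82, 79, 76, 79,
          82, 85, 86, 83, 84, 82, 80, 78, 84, 86,
          85, 87, 94, 92, 95, 96, 94, 96, 91, 90]
temperature = [41, 56, 63, 68, 69, 65, 61, 47, 32, 24,
               28, 26, 32, 40, 55, 63, 72, 72, 67, 60,
               44, 40, 32, 27, 28, 33, 41, 52, 64, 71]


def describe(values):
    """Return the five summary statistics reported in Table 2 (hypothetical helper)."""
    return {
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


columns = {"Consumption": consumption, "Price": price,
           "Income": income, "Temperature": temperature}
for name, values in columns.items():
    stats = describe(values)
    print(name, {key: round(value, 4) for key, value in stats.items()})

# Each statistic is computed per variable, so the minima need not share a week:
week_of_min_consumption = consumption.index(min(consumption)) + 1
print("Week of minimum consumption:", week_of_min_consumption,
      "price that week:", price[week_of_min_consumption - 1])
```

The last two lines show that the week with the lowest consumption is not the week with the lowest price, which is why each row of Table 2 should be read one variable at a time.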
Discussion
The data were mainly collected to test the behaviour of the residuals, that is, to test the
first assumption of multiple OLS regression (errors are normally distributed with zero mean
and constant variance). It is important to define the contextual meaning of errors, since
the term refers to two concepts relevant to the ordinary least squares regression model (Ernst
and Albers 2017). In the context of regression, errors are the differences between the
actual values of the dependent variable and the values given by the true regression model
for the entire population. The errors of a regression model cannot be observed directly
because the population parameters of the true regression model are unknown. However, the
characteristics of the errors can be investigated through the residuals of a regression
model estimated from sample data.
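The distinction between errors and residuals can be written out in symbols. The notation below is an assumption introduced for illustration (the essay itself does not define symbols): y is consumption, P is price, I is income, T is temperature, the Greek letters are the unknown population parameters, and the b's are their OLS estimates from the thirty observations.

```latex
% Population model: the error \varepsilon_i is unobservable
y_i = \beta_0 + \beta_1 P_i + \beta_2 I_i + \beta_3 T_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)

% Estimated model: the residual e_i is computed from the sample
\hat{y}_i = b_0 + b_1 P_i + b_2 I_i + b_3 T_i,
\qquad e_i = y_i - \hat{y}_i, \quad i = 1, \dots, 30
```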
The residuals are the differences between the actual values of the dependent variable
and the values predicted (within sample) by the estimated regression model. The assumption
that the regression errors follow an approximately normal distribution is an important
aspect of multiple OLS regression. When the assumption holds, it is possible to make
inferences concerning the regression parameters in the population of origin even if
the sample size is small. The inferences are usually based on confidence intervals or
significance tests (Zhu et al. 2015). Conversely, when the sample size is small and the
normality assumption is violated, these inferences cannot be trusted, because the violation
degrades the efficiency of the estimator. In a technical sense, when the errors follow a
normal distribution, OLS is the most efficient among all unbiased estimators. With
non-normal errors, OLS is the most efficient only among linear unbiased estimators.
Further, errors that do not follow a normal distribution imply that, in small samples,
the estimated Student's t and F statistics do not exactly follow the t and F distributions.
Moreover, normally distributed errors are not the only requirement for the regression
estimators to be efficient and unbiased. However, for this essay only the normality of the
errors is considered.
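One way the normality of the residuals might be examined is sketched below. It fits consumption on price, income and temperature by OLS, computes the residuals, and then evaluates the Jarque-Bera statistic from their skewness and kurtosis. The choice of the Jarque-Bera test is an assumption made for illustration, since the essay does not state which test it uses, and the function names (`solve`, `ols_residuals`, `jarque_bera`) are hypothetical. Only the Python standard library is used.

```python
# Data transcribed from Table 1 (weeks 1 to 30).
consumption = [0.386, 0.374, 0.393, 0.425, 0.406, 0.344, 0.327, 0.288, 0.269, 0.256,
               0.286, 0.298, 0.329, 0.318, 0.381, 0.381, 0.470, 0.443, 0.386, 0.342,
               0.319, 0.307, 0.284, 0.326, 0.309, 0.359, 0.376, 0.416, 0.437, 0.548]
price = [0.270, 0.282, 0.277, 0.280, 0.272, 0.262, 0.275, 0.267, 0.265, 0.277,
         0.282, 0.270, 0.272, 0.287, 0.277, 0.287, 0.280, 0.277, 0.277, 0.277,
         0.292, 0.287, 0.277, 0.285, 0.282, 0.265, 0.265, 0.265, 0.268, 0.260]
income = [78, 79, 81, 80, 76, 78, 82, 79, 76, 79,
          82, 85, 86, 83, 84, 82, 80, 78, 84, 86,
          85, 87, 94, 92, 95, 96, 94, 96, 91, 90]
temperature = [41, 56, 63, 68, 69, 65, 61, 47, 32, 24,
               28, 26, 32, 40, 55, 63, 72, 72, 67, 60,
               44, 40, 32, 27, 28, 33, 41, 52, 64, 71]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols_residuals(y, *regressors):
    """Fit y on an intercept plus the regressors via the normal equations."""
    rows = [[1.0, *values] for values in zip(*regressors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return beta, [yi - fi for yi, fi in zip(y, fitted)]


def jarque_bera(res):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(res)
    mean = sum(res) / n
    m2 = sum((e - mean) ** 2 for e in res) / n
    m3 = sum((e - mean) ** 3 for e in res) / n
    m4 = sum((e - mean) ** 4 for e in res) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


beta, residuals = ols_residuals(consumption, price, income, temperature)
jb = jarque_bera(residuals)
print("Coefficients (intercept, price, income, temperature):", beta)
print("Jarque-Bera statistic:", jb)
```

Under normality the Jarque-Bera statistic is asymptotically chi-square distributed with two degrees of freedom, so a value above roughly 5.99 would suggest non-normal residuals at the 5% level. With only thirty observations this approximation is rough, so the result is best read alongside a histogram or a normal Q-Q plot of the residuals.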
