Assignment on Regression Analysis PDF

Added on 2021/05/12

Multiple Time Series Modeling Using the SAS VARMAX Procedure
Chapter 1: Introduction
Introduction
Ordinary Regression Models
Regression Models in Time Series Analysis
Time Series Models
Which Time Series Features to Model
Parameterized Models for Time Series
Introduction
This chapter outlines the intentions of this book. First, it briefly describes the special problems that emerge when time series data has to be analyzed by means of regression models, as compared with applications of regression models in simpler situations. Explained next are the extra features that are necessary for performing analysis on multivariate time series.
Ordinary Regression Models
The subject of regression models is included in most introductory statistical courses. The basic formulation using just a single right-hand side variable is as follows:
$y_i = \alpha + \beta x_i + \varepsilon_i$
Here is a simple example using this regression model with teenagers’ heights as the right side variable and weight as the left side variable. This data is clearly not a time series data set, but a cross-sectional data set. Program 1.1 shows the SAS code for this analysis for the well-known data set CLASS in the SASHELP library. The model specifies that the weight is considered as a linear function of the height. This relationship is not exact, and the differences between the observed weight and the weight as predicted by the regression are the errors $\varepsilon_i$.
Program 1.1: A Simple Application of PROC REG for Cross-Sectional Data
PROC REG DATA=SASHELP.CLASS;
   MODEL WEIGHT=HEIGHT;
RUN;
The resulting regression is then presented as a table that shows the estimated parameter values, their standard deviations, and results of the t-tests. Also, various plots are presented by PROC REG, using the SAS Output Delivery System (ODS) GRAPHICS facilities. An example is the regression plot in Figure 1.1, which is useful when only a single right side variable is used.
Figure 1.1: Regression Plot for Cross-Sectional Data Generated by Program 1.1
This analysis relies on many statistical assumptions in order to ensure that the estimation method is efficient and that the printed standard deviations and p-values are correct. Moreover, the relationship between the variables must be linear. In this situation, the x-variable, which is the height of the teenager, is assumed to be fixed in advance so that the assumption of exogeneity is met.
These assumptions are formulated for the residuals $\varepsilon_i$. In short, the error terms must be independent and identically distributed. Also, they are often assumed to have a Gaussian distribution. In the example in Program 1.1, these assumptions are met; at least, it is not clear that they are violated. In the context of this book, the most important assumption is independence. The further assumption $E[\varepsilon \mid X] = 0$ (which is, of course, dubious in this example) is irrelevant if the model by assumption describes only the conditional distribution of the y’s, given the x’s.
A clear dependence will be present if the data set could include twins; otherwise, human beings are individuals. The data set consists of teenagers. One might think that gender, race, or age could influence the relationship between height and weight. If this information is available, it can be tested; age and gender are in fact variables in the data set. If such effects are of some importance for the estimated relationship, they could lead to dependence among the residuals. For example, most of the girls could have weights below the regression line, while the observed weights for most of the boys are above the line. However, this effect is difficult to establish with such a small data set.
For non–time series, correlation among the observations is usually not a serious problem. Most often, it is simply a problem of a missing variable, like gender in this example, which could be easily solved by adding a variable to the model.
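The missing-variable remedy can be sketched in SAS. This is only an illustration, assuming that the character variable SEX in SASHELP.CLASS is an adequate stand-in for gender. PROC GLM is used here instead of PROC REG because its CLASS statement accepts a categorical variable directly; that choice is an assumption of this sketch, not something the text prescribes.

```sas
/* Sketch: add gender to the height-weight regression of Program 1.1. */
PROC GLM DATA=SASHELP.CLASS;
   CLASS SEX;                             /* SEX is categorical (F/M) */
   MODEL WEIGHT=HEIGHT SEX / SOLUTION;    /* SOLUTION prints the parameter estimates */
RUN;
QUIT;
```

With only 19 observations in SASHELP.CLASS, the test for the SEX effect has little power, which is in line with the remark that the effect is difficult to establish.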
Regression Models in Time Series Analysis
The most important problem when regression models are applied in time series analysis is that the residuals are usually far from independent. The analysis in Program 1.2 gives an example. The code is attempting to explain the log-transformed level of a price index (the variable LP) for more than 150 years in Denmark using the log-transformed wage index, LW, as the right side variable.
Program 1.2: A Simple Application of PROC REG for Time Series Data
PROC REG DATA=SASMTS.WAGEPRICE;
   MODEL LP=LW;
RUN;
The regression plot (Figure 1.2) clearly shows that something is wrong, because the observations seem to vary systematically around the line, not randomly as in the previous example, Figure 1.1. It seems that the observations move along a curve that “flutters” around the regression line. Wages and prices usually increase over time, so the observations are nearly ordered by years from the left to the right. The twisting around the regression line shows that the prices for many consecutive years could be high compared to the wages. But in other periods of consecutive years, prices could be smaller when compared to the wages.
In statistical terms, this finding tells you that the residuals are highly autocorrelated.
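One conventional way to quantify this autocorrelation is the Durbin-Watson statistic, which PROC REG prints when the DW option is added to the MODEL statement. The sketch below simply reruns Program 1.2 with that option. It assumes that the observations in SASMTS.WAGEPRICE are stored in time order, which the statistic requires.

```sas
/* Sketch: Program 1.2 with a Durbin-Watson test for residual autocorrelation. */
PROC REG DATA=SASMTS.WAGEPRICE;
   MODEL LP=LW / DW;    /* DW requests the Durbin-Watson statistic */
RUN;
QUIT;
```

A value near 2 is consistent with uncorrelated residuals, while a value near 0 indicates strong positive autocorrelation.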
Figure 1.2: Regression Plot for Time Series Data Generated by Program 1.2
Economically, the model makes sense, and the dependencies among error terms that are close in time are understandable. However, the whole idea of taking the level of the wages as input to a model for the level of the prices is doubtful. Why not vice versa? Economically, one could argue that prices affect wages when workers want to be compensated for an increasing price level. On the other hand, higher wages to some extent increase the price level because the production of goods becomes more expensive. This situation calls for a two-dimensional model whereby both the wage and the price are used as left side variables. Moreover, the dependence is not necessarily immediate but could include lags. Such models form the basis for this book.
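As a rough preview of such a model, the sketch below treats both series as left side variables in PROC VARMAX. The lag order P=2 is an arbitrary choice made for illustration and is not taken from the text. The sketch also ignores the question of whether the two series should be differenced before modeling.

```sas
/* Sketch: a two-dimensional autoregressive model for prices and wages. */
PROC VARMAX DATA=SASMTS.WAGEPRICE;
   MODEL LP LW / P=2;   /* both series on the left side; two lags of each on the right */
RUN;
```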
Time Series Models
A formal definition of time series is that it is a sequence of observations $x_1, x_2, \ldots, x_T$. The observations, the x’s, can be one-dimensional, leading to a one-dimensional time series. The x’s can also consist of observations for many variables, leading to a multidimensional series, which is the main feature of this book. Models for one-dimensional series are the subject of another SAS book (Brocklebank and Dickey, 2003). Therefore, univariate time series models are addressed in this book only as a part of the models for multidimensional time series.
In formal mathematical terms, the x’s in a multidimensional time series are column vectors. But precise mathematical notations are avoided, and the models are presented without mathematical details. For a precise treatment, see a theoretical textbook (for instance, Lütkepohl, 1993) or the SAS Online Help.
The time index is always denoted as t. The time index is a notation for, say, consecutive years, quarters, months, or even the time of day. The series are assumed to be equidistant. This means that the time span between observations is the same, and months are considered as having the same length. In the SAS procedures, the time index is often assumed to be a valid SAS date variable (or perhaps a datetime variable) with a suitable format. For more information about handling SAS datetime variables, formats, and other specific subjects for time series, see Morgan (2006) or Milhøj (2013). Typical examples of time series are used in this book, as listed in the “About This Book” section.
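A minimal sketch of an equidistant monthly time index stored as a SAS date variable is given here. The data set name WORK.MONTHS, the variable name DATE, and the starting date are all hypothetical and serve only as illustration.

```sas
/* Sketch: build twelve consecutive monthly dates with a date format. */
DATA WORK.MONTHS;
   DO I = 0 TO 11;
      DATE = INTNX('MONTH', '01JAN2020'D, I);   /* first day of each month */
      OUTPUT;
   END;
   FORMAT DATE MONYY7.;                         /* displays, for example, JAN2020 */
   DROP I;
RUN;
```

The months differ in calendar length, but the index advances by exactly one interval per observation, which is what equidistant means in this context.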
The number of observations, denoted T, is usually assumed to be rather large because the underlying statistical theory relies on asymptotics. That is, the results are valid only for a large number of observations. Moreover, some of the models are rather involved; they contain many parameters, and the estimation algorithms are iterative processes. For these reasons, the number of observations has to be large in order for the estimation to succeed and to obtain reliable estimates.
Which Time Series Features to Model
The dynamics of many time series can change quickly. The stock market is an extreme example, wherein changes happen in milliseconds. But also many series, such as sales series, develop rapidly, so the sampling frequency has to be months instead of years in order to capture the interesting features of the series. In order to increase the number of observations, you can set the sampling frequency as a short span of time, like a month instead of a year. However, the complication with increasing the number of observations by using a shorter frequency is that many time series include some type of seasonality.
Seasonality is often a nuisance because the model must account for seasonality in some way by including parameters. But seasonality is mostly handled in an intuitive way. So the most interesting part of the model is formulated as a model for a seasonally adjusted series. One possibility is to consider a seasonally adjusted series as the basis for analysis. This adjustment could be performed by PROC X12 as described by Milhøj (2013). From a statistical point of view, you will often prefer to model the original series when the seasonal adjustment could influence the model structure that has to be estimated by the time series model. Preferably, the parameters for the seasonal part of the model and for the structural part of the model are estimated simultaneously.
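One way to estimate seasonal and structural parameters in a single step is to add seasonal dummy variables to the model, which PROC VARMAX supports through the NSEASON= option of the MODEL statement. The sketch below is only an assumption about how this might look for monthly data. The data set WORK.SALES, the series Y1 and Y2, the date variable DATE, and the lag order P=1 are all hypothetical.

```sas
/* Sketch: seasonal dummies and autoregressive parameters estimated together. */
PROC VARMAX DATA=WORK.SALES;
   ID DATE INTERVAL=MONTH;          /* DATE is assumed to be a monthly SAS date */
   MODEL Y1 Y2 / P=1 NSEASON=12;    /* 12 seasonal periods, that is, 11 dummies */
RUN;
```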
Time series methods can have the form of numerical algorithms that try to describe the most important features of the development in the series. Exponential smoothing in order to forecast a time series, seasonal adjustment methods, and models for unobserved components are examples of these methods that do not rely on parameterized statistical models. These methods and how to perform such analyses with SAS are the subject of another book (Milhøj, 2013).
Parameterized Models for Time Series
This book focuses on parameterized models. Such models provide the basis for precise statistical inference, making them very useful for estimating important parameters and testing interesting hypotheses. But a specified parameterized model is based on many assumptions, and the testing of the model fit could be a rather complicated task. Moreover, the models, the estimation, and the testing rely on advanced statistical and probability theory. In this presentation, the practical analysis using PROC VARMAX in SAS is the focus, and the underlying theory is referred to only loosely. The text is intended as a textbook for the application of PROC VARMAX; for a precise treatment of the theory underlying the statistical models, see a theoretical textbook, such as Lütkepohl (1993).
The models are formulated as generalizations of the simple regression model, where the dependence among observations at different points in time and the dynamics are included. Regression models can be seen as a form of causality but also just as a correlation. In both situations, the model can be used for forecasting if the independent variables are assumed to be fixed. The regression model then states the conditional distribution of the left side variable, assuming fixed values of the right side variable.
Many time series include correlations between consecutive observations. This is natural, as, for example, high sales of a product because of good economic terms last for several quarters. This correlation inside a time series is called autocorrelation. In an elementary regression model, this is often presented as a problem. But in time series models, this is turned into an opportunity, for example, to forecast the time series, assuming that the autocorrelation is persistent. The idea is to predict a future observation by using the expected value in the conditional distribution when conditioned on already observed values.
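As a worked illustration of this idea, assume a first-order autoregressive model. This model is an assumption made here for the sake of the example and is not introduced at this point in the text. The symbols $\mu$ (the mean level) and $\varphi$ (the autoregressive parameter) are likewise introduced only for this sketch, and the error terms $\varepsilon_t$ are taken to be independent with mean zero.

```latex
\begin{align*}
x_t - \mu &= \varphi\,(x_{t-1} - \mu) + \varepsilon_t \\
E[x_{T+1} \mid x_T, x_{T-1}, \ldots] &= \mu + \varphi\,(x_T - \mu)
\end{align*}
```

For example, with $\mu = 100$, $\varphi = 0.8$, and a last observed value $x_T = 110$, the one-step forecast is $100 + 0.8 \cdot 10 = 108$. The forecast lies above the mean because the last observation did, which is exactly how persistent autocorrelation is exploited.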
In order for you to model time series, you often assume that the series have a stable structure; that is, the structure is the same for the whole observation period. Mathematically, this is called stationarity. Many versions of the subject of stationarity exist in probability theory. But for this book, it suffices to note that expected values, variances, and autocovariances must be constant over time.
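Written out for a one-dimensional series, these three requirements take the following form. The symbols $\mu$ and $\gamma_k$ are notation chosen for this sketch and do not come from the text.

```latex
\begin{align*}
E[x_t] &= \mu && \text{for all } t \\
\operatorname{Var}(x_t) &= \gamma_0 && \text{for all } t \\
\operatorname{Cov}(x_t, x_{t-k}) &= \gamma_k && \text{for all } t \text{ and every lag } k
\end{align*}
```

The key point is that none of the right-hand sides depends on $t$; the autocovariance depends only on the distance $k$ between the two observations.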
Simultaneous dependence between two different variables can also be used for forecasting in situations where one of the variables is reported first and then could be seen as a leading indicator. An example would be when the actual sales quantity for a month is reported early while the revenue for the same month is a bit harder to compute and so is known only later.
When you are using parameterized time series models in some situations, you can establish causality because the dependence can be directed in time, making it natural to believe that the cause comes before the effect. This is the case if lagged values of one variable are used as the right side variable for the actual value of another variable that is used as the left-hand variable. See an example in Chapter 11.
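A minimal sketch of a regression on a lagged right side variable, reusing the wage and price series from Program 1.2, could look as follows. The data set name WORK.WAGEPRICE_LAG and the variable name LW1 are hypothetical, the single lag is an arbitrary choice, and the code assumes that the observations are stored in time order. It illustrates only the mechanics of lagging and is not the analysis of Chapter 11.

```sas
/* Sketch: last year's wage level as the right side variable for this year's price level. */
DATA WORK.WAGEPRICE_LAG;
   SET SASMTS.WAGEPRICE;
   LW1 = LAG(LW);          /* value of LW from the previous observation */
RUN;

PROC REG DATA=WORK.WAGEPRICE_LAG;
   MODEL LP=LW1;           /* the first observation is dropped because LW1 is missing */
RUN;
QUIT;
```

As with Program 1.2, the residuals of such a regression are likely to be autocorrelated, so the printed standard deviations should not be trusted without further modeling.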