Assignment on Regression Analysis PDF

Added on 2021-05-12

Multiple Time Series Modeling Using the SAS VARMAX Procedure

Chapter 2: Regression Analysis for Time Series Data

Introduction

This chapter presents a simple, naive example of an ordinary regression using time series data. The results from this analysis can lead to unrealistic assumptions. Even when some of the errors are eliminated by the application of more refined techniques, the conclusion is doubtful. In practice, many regression models for time series data produce similar results. This chapter presents an analysis that is obviously in error in order to set the scene for properly modeling the dynamics of time series in later chapters.

The Data Series

The example in this chapter uses quarterly data for the milk production in the United States, measured in millions of pounds, as the dependent variable and the number of milk cows as the independent variable. This regression can be understood as a calculation of the milk production per cow in the form of the estimated regression coefficient. Quarterly dummies are applied in the regression because the relation might be affected by weather conditions. The data set includes data from 1998Q1 to 2012Q4, giving a total of T = 60 observations. The series are plotted by the code in Program 2.1.
Program 2.1: Plotting the Two Time Series in an Overlaid Plot

    PROC SGPLOT DATA=SASMTS.QUARTERLY_MILK;
      /* Milk production on the left axis */
      SERIES Y=PRODUCTION X=DATE / MARKERS MARKERATTRS=(SYMBOL=CIRCLE COLOR=BLUE);
      /* Number of cows on the right (second) axis */
      SERIES Y=COWS X=DATE / MARKERS MARKERATTRS=(SYMBOL=TRIANGLE COLOR=RED) Y2AXIS;
    RUN;

Figure 2.1 shows that the series for milk production has a clear seasonal pattern, while seasonality is seemingly absent from the series for the number of cows. Moreover, milk production is clearly trending upward, while the number of cows varies in a cyclic way.

Figure 2.1: Plots of the Time Series of Milk Production and the Number of Cows in the United States

Durbin-Watson Test Using PROC REG

For this data set, you apply PROC REG, using production, denoted $y$, as the dependent variable and the number of cows, denoted $x$, as the independent variable. In mathematical terms, the model is written as follows:

$$y_t = \alpha + \beta x_t + \delta_1 q_{1t} + \delta_2 q_{2t} + \delta_3 q_{3t} + \varepsilon_t$$

The parameterization includes the dummy variables $q_1$, $q_2$, and $q_3$ for the first three quarters, leaving the intercept, $\alpha$, as the value for the fourth quarter. These dummies are defined by letting, for example, $q_{1t} = 1$ for the first quarter and $q_{1t} = 0$ for the remaining quarters. The parameter $\beta$ could in naive terms be interpreted as the milk production per cow or, more precisely, taking the units of measurement into account, the milk production measured in millions of pounds for one thousand cows.

The code in Program 2.2 estimates this naive model using PROC REG.

Program 2.2: Durbin-Watson Test Using PROC REG

    PROC REG DATA=SASMTS.QUARTERLY_MILK PLOTS=ALL;
      /* DWPROB requests the Durbin-Watson statistic and its p-value */
      MODEL PRODUCTION=COWS Q1 Q2 Q3 / DWPROB;
      ID DATE;
      /* Joint test that all three seasonal dummies are zero */
      TEST Q1=Q2=Q3=0;
    RUN;

In regression models that are estimated by ordinary least squares (OLS), a crucial assumption is that the remainder terms, $\varepsilon_t$, should be uncorrelated. Usually, this assumption is not as obvious for time series data as it is for other types of data sets.
In this example, high production in one quarter could well continue into the next quarter because the actual cows are the same for some years.

Definition of the Durbin-Watson Test Statistic

The Durbin-Watson test statistic is defined by the following:

$$DW = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}$$

This test statistic is closely related to the first-order autocorrelation of the residuals. The first-order autocorrelation is defined as the correlation coefficient, $\mathrm{corr}(\varepsilon_t, \varepsilon_{t-1})$, between a term $\varepsilon_t$ and the previous term $\varepsilon_{t-1}$. In time series, a usual assumption is that the variance of the residuals $\varepsilon_t$ is constant and that the relation expressed by the autocorrelation is constant. In other words, the variance and the autocorrelation are both assumed to be independent of the time index $t$. Similarly, the lag $k$ autocorrelation is defined by $\mathrm{corr}(\varepsilon_t, \varepsilon_{t-k})$.

For the residuals, $e_t$, which always sum to zero for residuals from a regression model with an intercept, the first-order autocorrelation is estimated by the following:

$$r_1 = \frac{\sum_{t=2}^{T} e_t e_{t-1}}{\sum_{t=1}^{T} e_t^2}$$

By these formulas, the following approximate relation exists between the Durbin-Watson test statistic and the estimated first-order autocorrelation:

$$DW \approx 2(1 - r_1)$$

By definition, the Durbin-Watson statistic is bound to the interval from 0 to 4. If the test statistic equals 2, the residuals are independent, or at least they show no first-order autocorrelation. If the value is close to 4, the residuals have a negative autocorrelation, while values of the Durbin-Watson test statistic close to 0 indicate a positive autocorrelation.

The distribution of the Durbin-Watson test is not explicitly known. Usually, an approximation is applied in the form of tables including a "gray zone" of nondecisive values. These tables allow for different numbers of independent variables in the model. This approximation is useful for short time series of, say, up to 30 observations.
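As a rough illustration outside SAS, the two formulas above can be evaluated directly. The Python sketch below is not part of the original chapter: the function names and the residual values are hypothetical, chosen only to show how DW and the estimated first-order autocorrelation relate for residuals that sum to zero.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def lag1_autocorrelation(residuals):
    """Estimated first-order autocorrelation r1 for residuals that sum to zero."""
    num = sum(residuals[t] * residuals[t - 1]
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residuals (made up for illustration, not the milk data).
# They sum to zero and move slowly, so the autocorrelation is positive.
e = [0.1, 0.6, 0.9, 0.7, 0.2, -0.3, -0.8, -0.9, -0.4, -0.1]

dw = durbin_watson(e)
r1 = lag1_autocorrelation(e)
print(f"DW = {dw:.3f}, r1 = {r1:.3f}, 2(1 - r1) = {2 * (1 - r1):.3f}")
```

The two quantities are close but not identical. Expanding the square in the numerator shows that the gap is exactly $(e_1^2 + e_T^2) / \sum_{t=1}^{T} e_t^2$, which becomes negligible as the series gets longer.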
For longer time series, a calculation of the p-value by the asymptotic distribution of the first-order autocorrelation gives an acceptable approximation.

The Durbin-Watson test tests only against the possibility of first-order autocorrelation in the residuals. For quarterly data, a fourth-order autocorrelation could be expected as well. But in the present setup, where quarterly dummies are included in the model, this situation is unlikely. More importantly, second-order autocorrelation can be present even if there is no first-order autocorrelation. So acceptance of a model by the Durbin-Watson test statistic is, strictly speaking, not reason enough to conclude that no autocorrelation exists. On the other hand, a significant Durbin-Watson test statistic can point toward model deficits other than first-order autocorrelation. So the test statistic is often just a simple way to see whether something is wrong with the model. The test statistic is often used together with other similar tests for problems like heteroscedasticity and non-normality as crude indicators for the model fit.

Procedure Output

The option DWPROB to the MODEL statement gives the Durbin-Watson test statistic and the p-value for the test. This is the classical way to test for autocorrelation in residuals of regression models. Moreover, the first-order autocorrelation is printed. These test results are given in Output 2.1. In this situation, the autocorrelation problem is huge. The Durbin-Watson statistic, DW = .044, is close to its lower bound, which is 0, and the autocorrelation, $r_1$ = .936, is close to its upper bound, which is 1. For this particular time series data, the test leads to a p-value very close to zero, and the hypothesis of independent residuals is clearly rejected.

Output 2.1: The Durbin-Watson Test

The conclusion is that OLS estimation is inefficient because the estimation should, preferably, be corrected for the autocorrelation.
However, the estimates obtained by least squares in spite of the autocorrelation retain the attractive quality of being unbiased. So the estimated numbers for the regression coefficients are often not much disturbed by residual autocorrelation. The real problem arises when testing is performed, as the printed standard deviations for the estimated regression coefficients and all p-values are misleading. An intuitive way of explaining this situation is that the positive autocorrelation means that the observations are drawn from far fewer than 60 independent sources of information because the autocorrelation makes consecutive observations look alike. The printed test results for the regression parameters are, for this reason, in error (Output 2.2). The same has to be said about the test for all seasonal dummies being zero (Output 2.3). This test is printed by the TEST statement in Program 2.2.

Output 2.2: Parameter Estimates from Ordinary Least Squares Estimation

Output 2.3: Simultaneous Test for Seasonality

Cochrane-Orcutt Estimation

Such problems are often seen when you are analyzing time series data using OLS by PROC REG. PROC REG offers no obvious solution to correct these errors. PROC REG focuses on cross-sectional data sets for which variable selection, identification of outliers, and influential data points are the main issues. But by using simple preprocessing in a DATA step, you might be able to analyze the data in a more correct way, even when using PROC REG.

The classical way in econometrics is to allow for autocorrelated residuals by applying Cochrane-Orcutt estimation. The idea is to transform the series by taking into account the estimated first-order autocorrelation for the residuals. This number, $\phi_1$ = .936, is printed in Output 2.1.
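Before turning to the details of the transformation, the earlier remark that the 60 observations behave like far fewer independent ones can be made concrete with a common rule of thumb. The Python sketch below is an illustration only and not part of the original chapter: it assumes the large-sample formula $T(1-\phi)/(1+\phi)$ for the effective sample size behind the mean of a first-order autoregressive series. The exact correction for a regression coefficient also depends on the regressors, so the result should be read as an order of magnitude. The function name is hypothetical.

```python
def effective_sample_size(n, phi):
    """Rule-of-thumb count of independent observations for an AR(1) series.

    Assumes the large-sample variance of the sample mean is inflated by
    the factor (1 + phi) / (1 - phi) relative to independent data.
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    return n * (1.0 - phi) / (1.0 + phi)


n_obs = 60       # number of quarters in the data set
phi_hat = 0.936  # first-order autocorrelation reported in Output 2.1

n_eff = effective_sample_size(n_obs, phi_hat)
print(f"effective sample size: {n_eff:.2f}")
```

Under this assumption the 60 quarters carry roughly as much information as about two independent observations, which is why the printed standard deviations and p-values from OLS cannot be trusted here.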
The method relies on the assumption that the residuals have the form of a first-order autoregressive, AR(1), model:

$$\varepsilon_t = \phi_1 \varepsilon_{t-1} + \zeta_t$$

where the remainder terms $\zeta_t$ are assumed to be independent and identically distributed. A series of this form has a first-order autocorrelation that equals $\phi_1$. In Chapter 6, this model is extended in many ways to a very useful class for time series data.

The regression model is then transformed in the following way:

$$
\begin{aligned}
\tilde{y}_t &= y_t - \phi_1 y_{t-1} \\
&= \alpha + \beta x_t + \delta_1 q_{1t} + \delta_2 q_{2t} + \delta_3 q_{3t} + \varepsilon_t - \phi_1 (\alpha + \beta x_{t-1} + \delta_1 q_{1,t-1} + \delta_2 q_{2,t-1} + \delta_3 q_{3,t-1} + \varepsilon_{t-1}) \\
&= \alpha(1 - \phi_1) + \beta \tilde{x}_t + \delta_1 \tilde{q}_{1t} + \delta_2 \tilde{q}_{2t} + \delta_3 \tilde{q}_{3t} + \zeta_t
\end{aligned}
$$

where

$$\tilde{y}_t = y_t - \phi_1 y_{t-1}, \qquad \tilde{x}_t = x_t - \phi_1 x_{t-1}, \qquad \zeta_t = \varepsilon_t - \phi_1 \varepsilon_{t-1}$$

and

$$\tilde{q}_{it} = q_{it} - \phi_1 q_{i,t-1}$$

Note that the intercept of the transformed model is $\alpha(1 - \phi_1)$ rather than $\alpha$ itself, while the slope $\beta$ and the seasonal parameters $\delta_i$ are unchanged.

The manipulation of the data is easily coded as a DATA step followed by an application of PROC REG (Program 2.3). Note that the LAG function returns the lagged value of the series. In other words, for example, LAG(COWS) equals the number of cows in the previous quarter.

Program 2.3: Cochrane-Orcutt Estimation by a DATA Step and PROC REG

    DATA CO_TRANSFORM;
      SET SASMTS.QUARTERLY_MILK;
      /* Quasi-difference every variable with the estimated autocorrelation .936 */
      Y=PRODUCTION-0.936*LAG(PRODUCTION);
      X=COWS-0.936*LAG(COWS);
      QQ1=Q1-0.936*LAG(Q1);
      QQ2=Q2-0.936*LAG(Q2);
      QQ3=Q3-0.936*LAG(Q3);
    RUN;

    PROC REG DATA=CO_TRANSFORM PLOTS=ALL;
      MODEL Y=X QQ1 QQ2 QQ3 / DW DWPROB;
      ID DATE;
      TEST QQ1=QQ2=QQ3=0;
    RUN;

The estimated coefficient of the number of cows has changed from 17.997 to 7.796 (Output 2.4), and the standard deviation for the parameter estimates is much smaller than in Output 2.2. The seasonal dummies are now significant, meaning that a seasonality exists in the production of milk per cow, which is intuitive.

Output 2.4: Parameter Estimates by Cochrane-Orcutt Estimation

The autocorrelation problem is fixed according to the Durbin-Watson test statistic (Output 2.5).
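The transformation carried out in the DATA step of Program 2.3 can be mirrored outside SAS as a rough sketch. The Python code below is not part of the original chapter: the helper name `quasi_difference` and the eight data values are hypothetical, and only the coefficient .936 comes from the text. It shows what each transformed variable looks like and why one observation is lost.

```python
def quasi_difference(series, phi):
    """Return series[t] - phi * series[t - 1] for every t that has a predecessor.

    The first observation has no lagged value, so the result is one
    element shorter than the input.
    """
    return [series[t] - phi * series[t - 1] for t in range(1, len(series))]


PHI = 0.936  # estimated first-order autocorrelation from Output 2.1

# Hypothetical values for eight quarters (made up; not the actual milk data).
production = [100.0, 104.0, 103.0, 99.0, 102.0, 106.0, 105.0, 101.0]
q1 = [1, 0, 0, 0, 1, 0, 0, 0]  # dummy for the first quarter

y_tilde = quasi_difference(production, PHI)
qq1 = quasi_difference(q1, PHI)

print(len(production), len(y_tilde))
print([round(v, 3) for v in qq1])
```

After the transformation the first-quarter dummy is no longer a 0/1 variable: it equals 1 in the first quarter, -.936 in the quarter that follows, and 0 otherwise. In SAS the first row is not dropped by the DATA step itself; LAG returns a missing value there, and PROC REG then leaves that row out of the estimation.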
The method reduces the number of observations in the analysis by one, as is clearly stated in Output 2.5, because the definition of the variables in the DATA step excludes the first observation, which cannot be defined because it has no lagged value in the data set.

Output 2.5: Durbin-Watson Test for the Residuals of Cochrane-Orcutt Estimation

Such large changes in parameter estimates are usually not seen with Cochrane-Orcutt estimation when values of the first-order autocorrelation are around, say, .5. But in this case, $\phi_1$ = .936. This value is very close to the upper limit +1, which corresponds to a unit root. When the value $\phi_1 = 1$ is applied, it makes the whole model more dynamic in handling the quarterly changes in the two time series. This is the subject of Chapter 4, where the example is continued.

Conclusion

This chapter demonstrates the shortcomings of regression models when estimated by OLS for time series data that has autocorrelated errors. The old-fashioned tool for mending the problems, the Cochrane-Orcutt estimation algorithm, works, but it is not the final solution to the problems. Nowadays, more efficient procedures exist for full maximum likelihood estimation of all parameters in models for time series data. For modeling multiple time series, SAS offers many other procedures that are designed especially for time series, such as PROC AUTOREG, which is a straightforward extension of PROC REG. The AUTOREG procedure will be considered in the next few chapters, but the rest of the book will concentrate on the much more specialized procedure, PROC VARMAX, which includes up-to-date models for the dynamics of multiple time series.

© 2020 O'Reilly Media, Inc.
