Multiple Time Series Modeling Using the SAS VARMAX Procedure

Chapter 1: Introduction

Introduction

This chapter outlines the intentions of this book. First, it briefly describes the special problems that emerge when time series data has to be analyzed by means of regression models, as compared with applications of regression models in simpler situations. Explained next are the extra features that are necessary for performing regression analysis on multivariate time series.

Ordinary Regression Models

The subject of regression models is included in most introductory statistical courses. The basic formulation using just a single right-hand side variable is as follows:

y_i = α + βx_i + ε_i

Here is a simple example using this regression model with teenagers' heights as the right side variable and the weight as the left side variable. This data is clearly not a time series data set, but a cross-sectional data set. Program 1.1 shows the SAS code for this analysis for the well-known data set CLASS in the SASHELP library. The model specifies that the weight is considered as a linear function of the height. This relationship is not exact, and the observed differences between the observed weight and the weight as predicted by the regression line form the errors ε_i.

Program 1.1: A Simple Application of PROC REG for Cross-Sectional Data

PROC REG DATA=SASHELP.CLASS;
   MODEL WEIGHT=HEIGHT;
RUN;

The resulting regression is then presented as a table that shows the estimated parameter values, their standard deviations, and the results of the t-tests. Also, various plots are presented by PROC REG, using the SAS Output Delivery System (ODS) GRAPHICS facilities. An example is the regression plot in Figure 1.1, which is useful when only a single right side variable is used.

Figure 1.1: Regression Plot for Cross-Sectional Data Generated by Program 1.1

This analysis relies on many statistical assumptions in order to ensure that the estimation method is efficient and that the printed standard deviations and p-values are correct. Moreover, the relationship between the two variables must be linear. In this situation, the x-variable, which is the height of the teenager, is assumed to be fixed in advance so that the assumption of exogeneity is met. These assumptions are formulated for the residuals ε_i. In short, the error terms must be independent and identically distributed. Also, they are often assumed to have a Gaussian distribution. In the example in Program 1.1, these assumptions are met; at least, it is not clear that they are violated. In the context of this book, the most important assumption is independence. The further assumption E[ε|X] = 0 (which is, of course, dubious in this example) is irrelevant if the model by assumption describes only the conditional distribution of the y's, given the x's. A clear dependence would be present if the data set included twins; otherwise, human beings are individuals.

The data set consists of teenagers. One might think that gender, race, or age could influence the relationship between height and weight. If this information is available, it can be tested; age and gender are in fact variables in the data set. If such effects are of some importance for the estimated relationship, they could lead to dependence among the residuals; for example, most of the girls could have weights below the regression line, while the observed weights for most of the boys lie above the line. However, this effect is difficult to establish with such a small data set. For non-time series data, correlation among the observations is usually not a serious problem. Most often, it is simply a matter of a missing variable, like gender in this example, which is easily solved by adding the variable to the model, as sketched below.
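As a minimal sketch of this idea (it is not part of Program 1.1, and the data set name CLASS2 and the dummy variable FEMALE are invented for illustration), the gender information could be added to the regression by creating a dummy variable in a DATA step, because PROC REG has no CLASS statement:

/* Sketch only: add gender to the height-weight regression of Program 1.1. */
/* The data set CLASS2 and the dummy variable FEMALE are invented names.   */
DATA CLASS2;
   SET SASHELP.CLASS;
   FEMALE = (SEX = 'F');   /* 1 for girls, 0 for boys */
RUN;

PROC REG DATA=CLASS2;
   MODEL WEIGHT = HEIGHT FEMALE;
RUN;

A clearly nonzero coefficient for FEMALE would indicate that girls and boys lie on different regression lines, which is precisely the kind of missing-variable effect described above; with such a small data set, however, the test has little power.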
Regression Models in Time Series Analysis

The most important problem when regression models are applied in time series analysis is that the residuals are usually far from independent. The analysis in Program 1.2 gives an example. The code attempts to explain the log-transformed level of a price index (the variable LP) for more than 150 years in Denmark, using the log-transformed wage index, LW, as the right side variable.

Program 1.2: A Simple Application of PROC REG for Time Series Data

PROC REG DATA=SASMTS.WAGEPRICE;
   MODEL LP=LW;
RUN;

The regression plot (Figure 1.2) clearly shows that something is wrong, because the observations vary systematically around the line, not randomly as in the previous example, Figure 1.1. It seems that the observations move along a curve that "flutters" around the regression line. Wages and prices usually increase over time, so the observations are nearly ordered by year from left to right. The twisting around the regression line shows that the prices for many consecutive years can be high compared to the wages, while in other periods of many consecutive years, prices can be low compared to the wages. In statistical terms, this finding tells you that the residuals are highly autocorrelated, as the sketch following Figure 1.2 makes precise.

Figure 1.2: Regression Plot for Time Series Data Generated by Program 1.2
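The visual impression of autocorrelation can be made more precise with standard diagnostics. The following is only a sketch and is not part of Program 1.2: the DW option of the MODEL statement requests the Durbin-Watson statistic, and the residuals are saved (under the invented names RESIDS and RESID) so that their autocorrelation function can be inspected with PROC ARIMA.

/* Sketch of residual diagnostics for Program 1.2.                          */
PROC REG DATA=SASMTS.WAGEPRICE;
   MODEL LP=LW / DW;             /* Durbin-Watson test for autocorrelation */
   OUTPUT OUT=RESIDS R=RESID;    /* save the residuals                     */
RUN;

PROC ARIMA DATA=RESIDS;
   IDENTIFY VAR=RESID;           /* autocorrelation function of residuals  */
RUN;

A Durbin-Watson statistic far below 2 and a slowly decaying autocorrelation function would confirm the strong positive autocorrelation that is visible in Figure 1.2.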
Economically, the model makes sense, and the dependencies among error terms that are close in time are easily understood. However, the whole idea of taking the level of the wages as input to a model for the level of the prices is doubtful. Why not vice versa? Economically, one could argue that prices affect wages when workers want to be compensated for an increasing price level. On the other hand, higher wages will to some extent increase the price level because the production of goods becomes more expensive. This situation calls for a two-dimensional model in which both the wage and the price are used as left side variables. Moreover, the mutual dependence is not necessarily immediate but could include lags. Such models form the basis for this book.

Time Series Models

A formal definition of a time series is that it is a sequence of observations x_1, x_2, ..., x_T. The observations, the x's, can be one-dimensional, leading to a one-dimensional time series. The x's can also consist of observations of many variables, leading to a multidimensional series, which is the main subject of this book. Models for one-dimensional series are the subject of another SAS book (Brocklebank and Dickey, 2003). Therefore, univariate time series models are addressed in this book only as a part of the models for multidimensional time series. In formal mathematical terms, the x's in a multidimensional time series form a column vector. But precise mathematical notation is avoided, and the models are presented without mathematical details. For precise formulations, see a theoretical textbook (for instance, Lütkepohl, 1993) or the SAS Online Help.

The time index is always denoted t. It is a notation for, say, consecutive years, quarters, months, or even times of day. The series are assumed to be equidistant. This means that the time span between two observations is the same, and months are considered as having the same length. In the SAS procedures, the time index is often assumed to be a valid SAS date variable (or perhaps a datetime variable) with a suitable format. For more information about handling SAS datetime variables, formats, and other subjects specific to time variables, see Morgan (2006) or Milhøj (2013).

Typical examples of time series are used in this book, as listed in the "About This Book" section.

The number of observations, denoted T, is usually assumed to be rather large because the underlying statistical theory relies on asymptotics; that is, the results are valid only for a large number of observations. Moreover, some of the models are rather involved: they contain many parameters, and the estimation algorithms are based on iterative processes. For these reasons, the number of observations has to be large in order for the estimation to succeed and for the estimates to be reliable.

Which Time Series Features to Model

The dynamics of many time series can change quickly. The stock market is an extreme example, in which changes happen in milliseconds. But many other series, such as sales series, also develop rapidly, so the sampling frequency has to be months rather than years in order to capture the interesting features of the series. In order to increase the number of observations, you can choose a sampling frequency with a short span of time, like a month instead of a year. However, the complication with increasing the number of observations by using a shorter sampling interval is that many time series then include some type of seasonality.

Seasonality is often a nuisance because the model must account for it in some way by including extra parameters. But seasonality is mostly handled in an intuitive way, so the most interesting part of the model is formulated as a model for a seasonally adjusted series. One possibility is to take a seasonally adjusted series as the basis for the analysis. This adjustment could be performed by PROC X12 as described by Milhøj (2013). But from a statistical point of view, you will often prefer to model the original series, because the seasonal adjustment could influence the model structure that has to be estimated by the time series model. Preferably, the parameters for the seasonal part of the model and for the structural part of the model are estimated simultaneously, as sketched below.
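One way this simultaneous estimation could look in PROC VARMAX is sketched here; the data set MONTHLY and the variable SALES are invented, the lag order is an arbitrary choice, and the NSEASON= option is used on the assumption that it adds seasonal dummy regressors to the model.

/* Sketch only: seasonal dummies estimated together with the dynamics.    */
/* MONTHLY and SALES are invented names; P=1 is an arbitrary choice.      */
PROC VARMAX DATA=MONTHLY;
   MODEL SALES / P=1 NSEASON=12;   /* AR(1) dynamics plus monthly dummies */
RUN;

In contrast, first adjusting the series with PROC X12 and then modeling the adjusted series separates the two steps, which is exactly what the simultaneous estimation avoids.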
Time series methods can also take the form of numerical algorithms that try to describe the most important aspects of the development of a series. Exponential smoothing for forecasting a time series, seasonal adjustment methods, and models for unobserved components are examples of methods that do not rely on specific parameterized statistical models. These methods, and how to perform such analyses with SAS, are the subject of another book (Milhøj, 2013).

Parameterized Models for Time Series

This book focuses on parameterized models. Such models provide the basis for precise statistical inference, making them very useful for estimating important parameters and testing interesting hypotheses. But a specified parameterized model is based on many assumptions, and testing the model fit can be a rather complicated task. Moreover, the models, the estimation, and the testing rely on advanced statistical and probabilistic theory. In this presentation, the focus is on practical analysis using PROC VARMAX in SAS, and the underlying theory is referred to only loosely. The text is intended as a textbook for the application of PROC VARMAX; for a precise treatment of the theory underlying the statistical models, see a theoretical textbook, such as Lütkepohl (1993).

The models are formulated as generalizations of the simple regression model, in which the dependence among observations at different points in time and the dynamics are included. Regression models can be seen as a form of causality but also just as a correlation. In both situations, the model can be used for forecasting if the independent variables are assumed to be fixed. The regression model then states the conditional distribution of the left side variable, given fixed values of the right side variable.

Many time series include correlations between consecutive observations. This is natural because, for example, high sales of a product due to good economic conditions last for several quarters. This correlation inside a time series is called autocorrelation. In an elementary regression model, it is often presented as a problem. But in time series models, it is turned into an opportunity, for example, to forecast the time series under the assumption that the correlation is persistent. The idea is to predict a future observation by the expected value of the conditional distribution, given the already observed values.

In order to model time series, you often assume that the series have a stable structure; that is, the structure is the same for the whole observation period. Mathematically, this is called stationarity. Many versions of the concept of stationarity exist in probability theory, but for this book it suffices to note that expected values, variances, and autocovariances must be constant over time.

Simultaneous dependence between two different variables can also be used for forecasting in situations in which one of the variables is reported first and can then be seen as a leading indicator. An example would be when the actual sales quantity for a month is reported early, while the revenue for the same month is a bit harder to compute and so is known only later.

When you are using parameterized time series models, in some situations you can establish causality because the dependence can be directed in time, making it natural to believe that the cause comes before the effect. This is the case if lagged values of one variable are used as the right side variable for the current value of another variable that is used as the left side variable. See an example in Chapter 11.
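As a closing sketch (the lag order P=2 is an arbitrary illustration, not a recommendation from the book), the two-dimensional model discussed for the wage and price series could be specified in PROC VARMAX with both LP and LW as left side variables, so that lagged values of each series act as right side variables for both:

/* Sketch only: a two-dimensional model for the series of Program 1.2.    */
/* Both LP and LW are modeled jointly; the lag order P=2 is arbitrary.    */
PROC VARMAX DATA=SASMTS.WAGEPRICE;
   MODEL LP LW / P=2;
RUN;

In such a model, the question of whether wages drive prices or prices drive wages is no longer forced by the specification; both directions of dependence, including lagged effects, are estimated from the data.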