Advanced Data Analysis: Time Series, Cointegration, and Logit Modeling
Added on 2019/09/16
Homework Assignment
AI Summary
This assignment solution addresses three problems in data analysis. Problem 1 focuses on time series analysis of US industrial production data from 1990-2014, including stationarity testing, differencing, ACF/PACF plots, ARIMA model fitting (with and without seasonal lags) using AIC/BIC, the Ljung-Box test for residuals, and 12-month-ahead forecasting. Problem 2 analyzes monthly interest rates using Phillips-Ouliaris cointegration tests and the Johansen procedure to determine the number of cointegration relationships. Problem 3 uses a logit model to study how house features affect homeowner preferences, including model selection (full vs. stepwise), odds calculations, and probability calculations.

Problem 1.
The dataset “production.txt” contains the monthly industrial production (IP) of the US from
1990 to 2014. The data are retrieved from the Federal Reserve Bank of St. Louis.
(a) Plot the time series of IP, and the ACF and PACF of IP. Test the stationarity of IP.
The time series of the IP is shown below.
ACF of IP is shown below.
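The quantity plotted in an ACF figure can be sketched in a few lines. The following is a minimal pure-Python illustration, not the code used in the solution; the function name `sample_acf` and the trending example series are assumptions made here for demonstration.

```python
def sample_acf(series, max_lag):
    """Return the sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)  # lag-0 sum of squares
    acf = []
    for lag in range(max_lag + 1):
        num = sum(centered[t] * centered[t - lag] for t in range(lag, n))
        acf.append(num / denom)
    return acf


# Hypothetical trending series (not the IP data): its autocorrelations
# decay slowly, which is typical of a non-stationary level series.
trend = [float(t) for t in range(100)]
acf_values = sample_acf(trend, 5)
```

A slowly decaying ACF of this kind is usually read as a sign that the series should be differenced before an ARMA-type model is fitted.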
PACF of the IP series is shown below.
To test the stationarity of the time series, the Augmented Dickey-Fuller (ADF) test is used. The
null hypothesis is that the series has a unit root; the alternative hypothesis is that the time
series is stationary. As can be seen from the figure below, the p-value is greater than 0.05, so
the null hypothesis cannot be rejected and the time series is treated as non-stationary.
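For reference, the general form of the regression behind the ADF test is given here. The solution does not state which deterministic terms (constant, trend) or lag order p were used, so this is the textbook form rather than the exact specification that was run.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0:\ \gamma = 0 \quad \text{vs.} \quad H_1:\ \gamma < 0
```

A p-value above 0.05 means that the hypothesis γ = 0 (a unit root) is not rejected at the 5% level.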
(b) Generate the difference of IP. Plot the time series of the differenced IP, and the ACF and
PACF of the differenced IP. Test the stationarity of the differenced IP.
The time series of the differenced IP is shown below.
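First differencing replaces each observation with its change from the previous one. A minimal sketch follows, assuming a helper named `difference` and made-up values (not the IP data):

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for t = lag .. len(series) - 1."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Made-up level series with a linear trend; differencing removes the trend.
levels = [100.0, 101.5, 103.0, 104.5, 106.0]
diffs = difference(levels)
```

The differenced series is one observation shorter than the original, and here it is constant because the trend in the made-up data is exactly linear.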
The ACF of the differenced time series is shown below.
The PACF of the differenced time series is shown below.
The test for stationarity is shown below. The differenced time series is found to be
stationary.
(c) Is there a seasonal pattern in the series of IP?
To detect a seasonal pattern in the series, the Fourier transform technique is used. The
spikes in the graphs indicate the presence of periodic components at different frequencies.
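However, the frequencies of these components are very low. The top two frequencies are 0.0033 and
0.0066. If they are measured in cycles per month, they correspond to periods of roughly 300 and
150 months, far longer than the 12-month period (a frequency of about 0.083) that annual
seasonality would produce. This seasonality can therefore be ignored.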
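The Fourier-based check can be illustrated with a small periodogram. This is a pure-Python sketch under stated assumptions, not the code used in the solution: the function name `periodogram` and the simulated 12-month cycle are hypothetical. With 300 monthly observations (an assumption based on the 1990 to 2014 span), the two lowest Fourier frequencies would be 1/300 and 2/300.

```python
import cmath
import math


def periodogram(series):
    """Return (frequency, power) pairs at Fourier frequencies k/n, k = 1..n//2."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    result = []
    for k in range(1, n // 2 + 1):
        coef = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                   for t, c in enumerate(centered))
        result.append((k / n, abs(coef) ** 2 / n))
    return result


# Hypothetical monthly series with a pure 12-month cycle (not the IP data).
n_obs = 120
seasonal = [math.sin(2 * math.pi * t / 12) for t in range(n_obs)]
peak_freq, peak_power = max(periodogram(seasonal), key=lambda pair: pair[1])
```

For a series with genuine annual seasonality, the largest peak sits at a frequency of 1/12 cycles per month, which is the location to inspect in the IP periodogram.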
(d) Use AIC or BIC to fit an appropriate ARIMA model for the time series. Fit an ARIMA
model with seasonal lag for the time series. Compare the two models.
The order of the ARIMA model is obtained with the function auto.arima() in R. The ARIMA
model without a seasonal lag is shown below.
The ARIMA model with the seasonal lag is shown below.
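The comparison of the two models rests on information criteria. The sketch below shows how AIC and BIC are computed from a log-likelihood and a parameter count; the model names, log-likelihoods, parameter counts and sample size are made up for illustration and are not the values from the fitted models.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical candidates: a small model and a larger one with seasonal terms.
candidates = {
    "non_seasonal": {"loglik": -250.0, "k": 3},
    "seasonal": {"loglik": -248.5, "k": 6},
}
n_obs = 299  # assumed number of observations after differencing

scores = {
    name: (aic(m["loglik"], m["k"]), bic(m["loglik"], m["k"], n_obs))
    for name, m in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
```

The model with the lower criterion value is preferred. BIC penalizes extra parameters more heavily than AIC once the sample has more than about eight observations.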
(e) Use the Ljung-Box test to evaluate the serial correlation of the residuals.
The Ljung-Box test is shown in the figure below.
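The Ljung-Box statistic is Q = n(n+2) Σ ρ_k² / (n-k), summed over lags k = 1..h, where ρ_k is the lag-k sample autocorrelation of the residuals. A minimal sketch of the statistic follows; the function name `ljung_box_q` and the alternating example residuals are assumptions for illustration, and the p-value step is left out.

```python
def ljung_box_q(residuals, max_lag):
    """Return the Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    total = 0.0
    for lag in range(1, max_lag + 1):
        rho = sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denom
        total += rho * rho / (n - lag)
    return n * (n + 2) * total


# Hypothetical residuals that alternate in sign, i.e. strongly autocorrelated.
residuals = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
q_stat = ljung_box_q(residuals, 2)
```

Under the null hypothesis of no serial correlation, Q is compared with a chi-square distribution. For a raw series with 2 lags the 5% critical value is 5.991, so the alternating example is rejected by a wide margin. For residuals of a fitted ARMA model the degrees of freedom are usually reduced by the number of estimated AR and MA parameters.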
(f) Compute 12-month-ahead forecasts based on the fitted model of your choice.
The 12-month-ahead forecast is shown in the figure below.
Problem 2.
The file “rates.txt” contains the monthly interest rates for eight different terms, including 1-year
rates, 2-year rates, 3-year rates, 4-year rates, 5-year rates, 7-year rates, 10-year rates, 30-year
rates. Use the Phillips-Ouliaris cointegration test and the Johansen procedure to analyze the
co-integration among the eight time series.
(a) Is each time series stationary?
The stationarity is tested using the Augmented Dickey Fuller test. The result for each time series
is shown below.
(b) Are the one-year interest rates co-integrated with the other seven interest rates?
To test for co-integration between the one-year interest rate and the other seven interest
rates, the one-year rate is regressed against the other rates. The solution treats a coefficient
that is significantly different from zero as evidence of co-integration. Strictly, a residual-based
test such as Phillips-Ouliaris judges co-integration by whether the residuals of this regression
are stationary. The result summary is shown below.
From the result, the one-year rate is co-integrated with the 2-year, 3-year, 10-year
and 30-year rates.
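The residual-based idea behind tests such as Phillips-Ouliaris can be sketched on simulated data. This is a rough illustration only, not the test itself: it computes no test statistic or critical values, it uses a single regressor rather than seven, and the names `ols_simple`, `lag1_autocorr`, `long_rate` and `short_rate` are hypothetical.

```python
import random


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lag1_autocorr(values):
    """Return the lag-1 sample autocorrelation."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    num = sum(centered[t] * centered[t - 1] for t in range(1, n))
    return num / sum(c * c for c in centered)


random.seed(0)
# Simulated data (not the rates file): a random walk and a series tied to it.
long_rate = [0.0]
for _ in range(499):
    long_rate.append(long_rate[-1] + random.gauss(0.0, 1.0))
short_rate = [1.0 + 2.0 * v + random.gauss(0.0, 1.0) for v in long_rate]

intercept, slope = ols_simple(long_rate, short_rate)
residuals = [y - intercept - slope * x for x, y in zip(long_rate, short_rate)]
rho_level = lag1_autocorr(long_rate)     # close to 1 for a random walk
rho_residual = lag1_autocorr(residuals)  # far below 1 when co-integrated
```

Both simulated series wander like random walks, yet the regression residuals show little persistence. That contrast between levels and residuals is what a residual-based co-integration test formalizes.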
(c) Use the Johansen procedure to find the number of co-integration relationships among the eight
time series. Write down the error correction components for the co-integrated time series.
The Johansen procedure results are shown in the figure below.
As evident from the figure, the number of co-integration relationships is 3 at the 95% confidence level.
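The estimated error correction components are not reproduced in the text, so only their general form is given here. One common way to write the vector error correction model for the eight rates with three co-integration relationships is shown below; the lag order k and the constant term μ are assumptions, since the solution does not state them.

```latex
\Delta y_t = \alpha \beta^{\top} y_{t-1}
           + \sum_{i=1}^{k-1} \Gamma_i\, \Delta y_{t-i} + \mu + \varepsilon_t,
\qquad y_t \in \mathbb{R}^{8}, \quad \alpha,\ \beta \in \mathbb{R}^{8 \times 3}
```

The three components of β'y_{t-1} are the error correction terms (stationary linear combinations of the rates), and α holds the speeds at which each rate adjusts to them.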
Problem 3.
Using the dataset “HousePrices.csv”, we study how the preference of a home owner is affected by
the features of a house such as price, lot size, number of bathrooms, and so on. The variable
“prefer” is binary, taking the value yes if the house is preferred and no otherwise. Fit a logit
model to the data and study the probability that a house is preferred by a home owner. The dependent
variable is “prefer” and all the other variables are independent variables.
(a) Fit the data with a full model and use stepwise selection to select an optimal model. Display
a summary of the model estimates. Compare the full model and the selected model.
Justify the selected model by the AIC criterion.
First, the full model is fitted. The summary of the full model is shown below.
The AIC value of the full model is 513.64.
Then, using backward selection, the optimal set of variables is selected. The
summary of the optimal model is shown below.
As can be seen from the above figure, step_model (the model after backward elimination)
gives a better result, as its AIC value is lower than that of the full model.
(b) In the selected model, calculate the odds of preference to non-preference if there is a
driveway, given that all the other variables are unchanged. Calculate the probability of
preference by the home owner if all continuous variables take their mean values and dummy
variables take the value of one.
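In the selected model, the odds of preference to non-preference if there is a driveway, with all
the other variables unchanged, are obtained from the coefficient of the variable Driveway, which
the solution reports as 1.956. In a logit model a coefficient measures the change in log-odds, so
the odds ratio is the exponential of the coefficient. If 1.956 is the raw coefficient, the odds
ratio is exp(1.956), about 7.07.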
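The step from a logit coefficient to an odds ratio is a single exponential. The sketch below assumes that the reported 1.956 is the raw Driveway coefficient on the log-odds scale; if it is already an exponentiated value, no further transformation applies. The function name `odds_ratio` is hypothetical.

```python
import math


def odds_ratio(coefficient):
    """Convert a logit coefficient (change in log-odds) to an odds ratio."""
    return math.exp(coefficient)


# Value reported in the text, ASSUMED here to be the raw logit coefficient.
driveway_coef = 1.956
driveway_odds_ratio = odds_ratio(driveway_coef)  # roughly 7.07
```

Under that assumption, having a driveway multiplies the odds of preference by about seven while the other variables are held fixed.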
The probability of preference by the home owner if all continuous variables take their mean values
and dummy variables take the value of one is 0.9896.
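The probability comes from passing the linear predictor (intercept plus each coefficient times its variable value) through the logistic function. Since the fitted coefficients are not listed in the text, the sketch below works backwards from the reported probability; the function names `logistic` and `logit` are chosen here for illustration.

```python
import math


def logistic(eta):
    """Map a linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Map a probability to log-odds (the inverse of logistic)."""
    return math.log(p / (1.0 - p))


reported_probability = 0.9896                # value stated in the text
implied_eta = logit(reported_probability)    # linear predictor it implies
recovered = logistic(implied_eta)            # round trip back to 0.9896
```

A probability of 0.9896 corresponds to a linear predictor of roughly 4.56, which is the value the fitted coefficients would have to produce at the mean of the continuous variables with all dummy variables set to one.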