BUSM112 Econometrics Assignment: Statistical Analysis and Methods

Mock Exams
Name
Institutional Affiliation
Date

Question 1
Part a
Capital and labour are both statistically significant at the 1% significance level
(and hence at all conventional levels). The estimated unit effects on output (the
dependent variable) are 0.5212795 for capital and 0.4683318 for labour.
Part b
The coefficient 0.4683318 implies that, holding capital constant, a one-unit
increase in worker hours increases production output by 0.4683318 units, and vice versa.
Part c
The R-squared statistic shows the percentage of the variance in the dependent
variable that the predictors can collectively explain.
From the data given, the R-squared of 0.9642 implies that the model accounts
for 96.42% of the variance of output (the dependent variable).
The R-squared therefore summarises, on a convenient 0-100% scale, the strength
of the relationship between output and the factors of production, labour and
capital.
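As a sketch of how these quantities arise, the following fits a log-linear (Cobb-Douglas-style) production function by OLS with numpy. The data are synthetic and the true elasticities of 0.52 and 0.47 are chosen only to echo the question's estimates, not taken from the assignment's dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic log-linear production data (hypothetical, for illustration only)
log_K = rng.normal(5.0, 1.0, n)
log_L = rng.normal(4.0, 1.0, n)
log_Y = 1.0 + 0.52 * log_K + 0.47 * log_L + rng.normal(0.0, 0.1, n)

# OLS on [1, log K, log L] via least squares
X = np.column_stack([np.ones(n), log_K, log_L])
beta, *_ = np.linalg.lstsq(X, log_Y, rcond=None)

# R-squared: share of the variance in log output explained by the model
resid = log_Y - X @ beta
r2 = 1.0 - resid.var() / log_Y.var()

print("capital elasticity:", round(beta[1], 3))
print("labour elasticity:", round(beta[2], 3))
print("R-squared:", round(r2, 3))
```

In a log-log specification like this one, the coefficients are elasticities; with the level specification the question uses, they are unit effects, but the fitting mechanics are identical.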
Part d
I would test for multicollinearity through the bivariate correlation between the
two predictor variables. A high correlation between the two predictors indicates
multicollinearity; a common rule of thumb is that a bivariate correlation above
0.7 signals a problem. In that case the predictor variables overlap in what they
measure, making it difficult to distinguish their separate effects.
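This check can be sketched in Python with numpy. The data below are hypothetical and deliberately collinear; the variance inflation factor (VIF) is included as a complementary diagnostic, since for two predictors it is a direct function of their correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Two hypothetical predictors with deliberately strong overlap
log_K = rng.normal(5.0, 1.0, n)
log_L = 0.9 * log_K + rng.normal(0.0, 0.2, n)  # highly collinear with log_K

# Bivariate correlation between the predictors
r = np.corrcoef(log_K, log_L)[0, 1]

# Variance inflation factor: VIF = 1 / (1 - R^2) from regressing one
# predictor on the other; VIF > 10 is a common warning threshold
vif = 1.0 / (1.0 - r ** 2)

print("correlation:", round(r, 3))
print("VIF:", round(vif, 2))
```

A correlation above 0.7, or equivalently a large VIF, would flag the overlap described above.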
Question 2
Part a
Propensity score (PS) matching uses a logistic regression to estimate the
propensity scores. Each treated patient is then allocated a single untreated patient (1:1 matching),

or more than one untreated patient (1:n matching), with a similar PS or one that
differs only slightly within some defined limits. The treatment effect is then
estimated in the matched population.
PS matching would be the preferred procedure because it gives an explicit
display of the recorded characteristics of both the treated and untreated patients.
Part b
Covariate matching
It involves systematically assigning participants to balanced groups on selected
variables. The number of confounding variables this matching can handle is
relatively small: every additional confounder adds a group to the study design,
so the required sample size grows exponentially with the number of confounders.
A major practical limitation of this type of matching is that a suitable
participant must be available for both groups (Stuart 2018). As the required
sample size increases, it becomes difficult to find enough study participants,
making the whole process impractical (Stuart 2018).
Propensity matching
It is easier to match on multiple confounding variables through propensity
scores. This method avoids the limitations of covariate matching because it
requires only a single propensity score to select a comparison group (Stuart
2018): the whole set of confounding variables is summarised in one number.
The propensity score is a number between zero and one representing the
predicted probability that, given the confounders, a person belongs to a
particular group. The score is estimated with a logistic regression in which the
outcome variable represents group membership and the predictor variables are the
confounders.
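The estimate-then-match procedure can be sketched in Python with numpy. Everything here is hypothetical (one confounder, illustrative coefficients and sample size), and the logistic regression is hand-rolled by gradient ascent only to keep the sketch self-contained; in practice a library such as statsmodels or scikit-learn would estimate it:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# One hypothetical confounder driving treatment assignment
x = rng.normal(0.0, 1.0, n)
p_true = 1.0 / (1.0 + np.exp(-(0.8 * x - 0.2)))
treated = (rng.random(n) < p_true).astype(float)

# Logistic regression of group membership on the confounder:
# gradient ascent on the average log-likelihood
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    beta += 1.0 * X.T @ (treated - p) / n
ps = 1.0 / (1.0 + np.exp(-X @ beta))  # estimated propensity scores

# 1:1 nearest-neighbour matching on the propensity score (with replacement)
treated_idx = np.where(treated == 1)[0]
control_idx = np.where(treated == 0)[0]
matches = {i: control_idx[np.argmin(np.abs(ps[control_idx] - ps[i]))]
           for i in treated_idx}

print("fitted logit coefficients:", np.round(beta, 2))
print("number of matched pairs:", len(matches))
```

Each treated unit is paired with the untreated unit whose score is closest, which is exactly the 1:1 scheme described in Part a.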
Question 3
Part a
                   Pre-intervention   Post-intervention   Difference (post - pre)
Non-participants          100                  90                  -10
Participants              120                 150                   30
Difference                 20                  60                   40
(The bottom row is participants minus non-participants; the bottom-right cell is
the difference-in-difference estimate.)
From the calculations in the table:
The income of non-participants fell by 10 thousand USD between the two periods,
while the income of participants rose by 30 thousand USD. The net impact of the
micro-finance program is the difference between these two changes:
30 - (-10) = 40 thousand USD per participant.
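The table's arithmetic can be checked directly in Python, using the mean incomes (in thousand USD) given in the question:

```python
# 2x2 difference-in-difference table of mean incomes (thousand USD)
pre_control, post_control = 100, 90    # non-participants
pre_treated, post_treated = 120, 150   # participants

change_control = post_control - pre_control   # change for non-participants
change_treated = post_treated - pre_treated   # change for participants
did = change_treated - change_control         # difference-in-difference

print("change for non-participants:", change_control)
print("change for participants:", change_treated)
print("difference-in-difference estimate:", did)
```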
Part b
I would use the propensity score matching method. This technique measures the
effect of participation using a logistic regression with a categorical dependent
variable Y, where Y = 1 if the household participates and Y = 0 if it does not.
I would fit the logistic function F(t) = 1/(1 + exp(-t)), where t is a linear
function of the covariates, t = a + bx. I would then apply the
difference-in-difference estimator to the resulting 2x2 table.
Part c
The regression equation would be: Y(ist) = gamma(s) + lambda(t) + delta*D(st) + error.
In this equation, Y(ist) is the dependent variable for individual i in state s
at time t; gamma(s) is the fixed effect (intercept) for state s; lambda(t) is
the fixed effect for period t; delta is the treatment effect; and D(st) is a
dummy variable equal to one for treated states in the post-treatment period.
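This regression can be sketched by OLS on simulated panel data. The data below are hypothetical; the true effect is set to 40 thousand USD only to mirror the 2x2 table above, and the state and time effects are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# Synthetic panel: state s in {0,1}, period t in {0,1};
# treatment D applies to state 1 in period 1 only
s = rng.integers(0, 2, n)
t = rng.integers(0, 2, n)
d = s * t
y = 100 + 20 * s - 10 * t + 40 * d + rng.normal(0, 5, n)  # true delta = 40

# OLS on [1, state dummy, time dummy, treatment dummy]
X = np.column_stack([np.ones(n), s, t, d])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated treatment effect delta:", round(beta[3], 1))
```

The coefficient on the interaction dummy D(st) recovers delta, which is exactly the difference-in-difference estimate from the 2x2 table.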
Question 4

Part a
Instrumental variables focus on the variation in the independent variable that
is uncorrelated with the error term, while discarding the variation that biases
the ordinary least squares coefficient. Consider a structural equation with a
single endogenous variable, where Y is the ith observation on the dependent
variable, X is the ith observation on the endogenous independent variable, w is
the ith observation on the r exogenous regressors (also known as the
covariates), and u is the error term. The instrumental variable corrects
endogeneity by isolating, in the first phase, the variation in X that does not
correlate with the error term. The fitted value of X from the first phase is
then used in the second phase instead of the original value of the independent
variable.
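The two phases described above can be sketched as two-stage least squares (2SLS) on simulated data; all numbers here are hypothetical. The endogenous X is built to contain the structural error u, so naive OLS is biased, while the second stage regresses Y on the first-stage fitted values:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

# Instrument z, structural error u; x is endogenous because it contains u
z = rng.normal(0, 1, n)
u = rng.normal(0, 1, n)
x = 1.0 + z + u + rng.normal(0, 0.5, n)
y = 2.0 + 1.5 * x + u          # true coefficient on x is 1.5

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive OLS: biased because cov(x, u) != 0
b_ols = ols(np.column_stack([np.ones(n), x]), y)[1]

# First stage: regress x on the instrument, keep the fitted values
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ ols(Z, x)

# Second stage: regress y on the fitted values instead of x
b_2sls = ols(np.column_stack([np.ones(n), x_hat]), y)[1]

print("OLS estimate:", round(b_ols, 2))    # biased away from 1.5
print("2SLS estimate:", round(b_2sls, 2))  # close to 1.5
```

Because z affects y only through x, the fitted values carry none of the problematic correlation with u, which is why the second-stage coefficient is consistent.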
Part b
1. A good instrument should be uncorrelated with the error term (u). This is the
exogeneity condition, which states that the instrument used in correcting
endogeneity can only correlate with the dependent variable indirectly, through
the endogenous variable (Rutkowski & Zhou 2015).
2. A good instrument should be correlated with the endogenous variable. This is
the instrument relevance condition, which states that the instrument must
generate sufficient variation in the endogenous variable for the first stage to
have predictive power (Rutkowski & Zhou 2015).
Part c
The Heckman two-step procedure can be used to assess the validity of the
instruments used in correcting endogeneity. The first step is referred to as the
selection equation: the probability of selection (for example, of getting
diversified) is analysed with a probit model whose dependent variable is a dummy
taking the value one for selected observations and zero otherwise. The purpose
of the selection equation is to compute a correction factor known as the inverse
Mills ratio. The second step is the outcome equation: the outcome of interest is
estimated by ordinary least squares with the correction factor included as an
additional regressor. The outcome equation tests for evidence of self-selection
bias, thereby testing the validity of the instruments used and whether the
identifying assumptions are met.
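The correction factor itself is easy to illustrate with Python's standard library. The function below is the generic inverse Mills ratio, lambda(index) = phi(index)/Phi(index), evaluated at a few hypothetical probit index values rather than at anything estimated from the assignment's data:

```python
import math

def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal CDF Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inverse_mills(index):
    """Correction factor lambda = phi(index) / Phi(index) from the
    first-step probit; appended as a regressor in the second-step OLS."""
    return norm_pdf(index) / norm_cdf(index)

# The ratio shrinks as selection becomes more likely (index grows)
for idx in (-1.0, 0.0, 1.0):
    print(f"index {idx:+.1f} -> inverse Mills ratio {inverse_mills(idx):.4f}")
```

In the second step, a significant coefficient on this regressor is the evidence of self-selection bias that the procedure tests for.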

Question 5
Part a
A time series is stationary if its mean, variance and autocovariance are
constant over time. The autocorrelation function presented shows a constant
(mean-reverting) mean, a constant variance, and a covariance that depends only
on the time lag; the series also shows constant skewness and kurtosis and no
unit root, meaning it is integrated of order zero, I(0). The series has no
trend, that is, no sustained upward or downward movement over time. Finally, its
innovations are transitory rather than permanent (a shock to a stationary series
dies away quickly). On this reading of the autocorrelation function, the
exchange rate series appears stationary.
Part b
The Dickey-Fuller test is a test for stationarity that determines whether a time
series contains a unit root; a series containing a unit root is non-stationary.
The null hypothesis is that the series has a unit root, and we reject it only
when the test statistic is larger in absolute terms than the (negative) critical
value.
In our case, the absolute value of the statistic, 1.569, is smaller than the
critical values at all significance levels, so we cannot reject the null
hypothesis of a unit root. On this evidence the Dickey-Fuller test treats the
exchange rate series as non-stationary, in contrast to the visual impression
from the autocorrelation function in Part a.
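A minimal sketch of the Dickey-Fuller regression (constant term, no trend, no augmentation lags) with numpy, applied to two simulated series rather than the exchange-rate data itself, shows how the statistic separates a unit-root process from a stationary one:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500

def df_stat(y):
    """t-statistic on rho in the Dickey-Fuller regression
    diff(y_t) = alpha + rho * y_{t-1} + e_t."""
    dy, ylag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones(len(ylag)), ylag])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)          # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)           # coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

e = rng.normal(0, 1, n)
random_walk = np.cumsum(e)        # unit root: I(1), non-stationary
ar1 = np.empty(n)                 # stationary AR(1), phi = 0.5
ar1[0] = 0.0
for t in range(1, n):
    ar1[t] = 0.5 * ar1[t - 1] + e[t]

print("DF statistic, random walk:", round(df_stat(random_walk), 2))
print("DF statistic, stationary AR(1):", round(df_stat(ar1), 2))
```

For the stationary series the statistic is far more negative than the 5% critical value (about -2.86), so the unit-root null is rejected; for the random walk it is not, which is the pattern the 1.569 result in the question fails to produce.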
References
Rutkowski, L., & Zhou, Y. (2015). Correcting measurement error in latent regression
covariates via the MC-SIMEX method. Journal of Educational Measurement,
52(4), 359-375.
Stuart, E. A. (2018). Propensity scores and matching methods. The Reviewer’s Guide
to Quantitative Methods in the Social Sciences, 388-396.