ECO-6003A - Econometrics and Data Analysis: Stata Panel Data

Added on  2023/05/28

Homework Assignment
AI Summary
This assignment delves into econometrics data analysis techniques using Stata, focusing on panel data and treatment effects. It includes various regression analyses, interpretations of statistical significance, and discussions on model fit. The assignment also covers the use of the xtdes command for describing panel participation patterns, addresses potential issues like heteroskedasticity and serial correlation, and examines the impact of variables like commuting time and mode of transport on wellbeing. The analysis extends to exploring the relationship between age and wellbeing, and considers the influence of factors like Brexit on individual satisfaction. The document provides detailed Stata outputs and interpretations to support its findings.
Data Analysis Techniques 1
ECONOMETRICS: DATA ANALYSIS TECHNIQUES
By (Name)
The Name of the Class (Course)
Professor (Tutor)
The Name of the School (University)
The City and State where it is located
The Date
Econometrics: Data Analysis Techniques
Question 1
Part a
[Figure: Bar Graph for ghq. Horizontal axis: ghq score (0 to 36); vertical axis: Number of People]
[Figure: Bar Graph for Wellbeing. Horizontal axis: wellbeing score (1 to 7); vertical axis: Number of People]
The bar graph for wellbeing is strongly skewed to the left (negatively skewed), with most responses
at the upper end of the scale, indicating that a majority of individuals are considerably satisfied
with their lives overall. The bar graph for ghq is skewed to the right (positively skewed), with
most responses at low scores, indicating that a significant proportion of individuals are only
slightly distressed. Together, the two graphs suggest that most of the people in the sample are
happy and content with their lives. From the histogram below it is clear that the data for
Min_Travel is positively skewed, indicating that a majority of people take a relatively short time
to travel to their place of employment. Based on the results in Table 1 (Appendix), the most
popular mode of transportation is the car. Cars, together with the "other" mode, account for most
of the instances where individuals take a considerably longer time to reach their workplace. Taxis
are used mainly for short journeys, as indicated by the time it takes individuals to reach their
places of employment. Lastly, walking and cars are the most used modes of transportation for
short-duration trips.
[Figure: Histogram For Min_Travel. Horizontal axis: minutes spent travelling to work (0 to 1000); vertical axis: Percent]
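The direction of skew discussed in Part a can be checked numerically. The following is a minimal sketch using the moment coefficient of skewness; the two lists are hypothetical illustrations and are not the survey data. A positive value means a long right tail (most values low), a negative value means a long left tail (most values high).

```python
from statistics import mean

def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    m = mean(values)
    m2 = mean((v - m) ** 2 for v in values)
    m3 = mean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

# Hypothetical values for illustration only (not taken from the survey).
travel_minutes = [5, 10, 10, 15, 15, 20, 20, 25, 30, 45, 120, 300]
satisfaction = [1, 3, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7]

travel_skew = moment_skewness(travel_minutes)      # long right tail
satisfaction_skew = moment_skewness(satisfaction)  # long left tail

print(f"travel time skewness:  {travel_skew:.2f}")
print(f"satisfaction skewness: {satisfaction_skew:.2f}")
```

With data shaped like the travel-time histogram (many short commutes, a few very long ones) the coefficient is positive; with data concentrated at the top of a 1 to 7 scale it is negative.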
Part b
Regression Output for Wave 2
Regression Output for Wave 4
Part c
The Wave 2 model fits its data considerably better than the Wave 7 model. Nevertheless, neither
model fits the data well, and less than 50% of the variation in the dependent variable "Wellbeing"
is explained by the independent variables. Both models are statistically significant overall, given
that their p-values are below both the 0.05 and 0.01 significance levels. With regard to the
significance of the individual variables, the Wave 2 model has only two statistically insignificant
variables while the Wave 7 model has four. It is interesting to note that the signs of the
coefficients do not change from one model to the other, i.e. they are the same for all coefficients.
Part d
Regression for Wave 2
Regression for Wave 7
The signs of the coefficients changed between the models in Part c and those in Part d. This is
because the independent variables affect the two dependent variables, "Wellbeing" and "ghq",
differently: ghq measures distress rather than satisfaction, so a variable that raises wellbeing
would be expected to lower ghq.
Part e
The Stata command xtdes was used to describe the participation pattern associated with the
panel. According to the table below we have 33,809 participants in the survey. The maximum
number of waves in which any participant is observed is 6. The most common pattern of
participation is a record in the first wave, wave 2 (shown as a 1 in the Stata pattern), with
15.17% or 5189 individuals. The final line of the pattern column gives the total for the
participation patterns that are not listed individually. From the distribution of T_i row, 25% of
all participants surveyed belong to wave 2. As such, the number of participants evaluated in each
wave is not even, with some waves having considerably higher numbers of participants than
others.
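The kind of pattern table that xtdes reports can be illustrated with a short sketch. This is not Stata's implementation; it only mimics the idea of building one pattern string per person (a 1 where the person is observed in a wave, a dot where not) and counting how often each pattern occurs. The records and wave list are hypothetical.

```python
from collections import Counter

# Hypothetical long-format panel: (person id, wave) pairs, not the survey data.
records = [(1, 2), (1, 3), (1, 4), (2, 2), (3, 2), (3, 3), (3, 4), (4, 4), (5, 2)]
waves = [2, 3, 4]

def participation_patterns(records, waves):
    """Count participation patterns, e.g. '1..' means observed in the first wave only."""
    seen = {}
    for pid, wave in records:
        seen.setdefault(pid, set()).add(wave)
    return Counter(
        "".join("1" if w in observed else "." for w in waves)
        for observed in seen.values()
    )

patterns = participation_patterns(records, waves)
total = sum(patterns.values())

# Print each pattern with its frequency and share of all individuals.
for pattern, freq in patterns.most_common():
    print(pattern, freq, f"{100 * freq / total:.2f}%")
```

Dividing each frequency by the number of individuals gives the percentage column, which is a quick way to cross-check a reported share against the reported count.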
Part f
The variable male was omitted from this model due to collinearity, whereas it was included in
both models in part b. This pooled model has two major advantages over the cross-sectional
models generated in part b: it is sensitive to the effects of collinearity, and it allows
goodness of fit to be assessed both between and within groups. Moreover, the pooled model allows
for the estimation of sigma_u and sigma_e.
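One likely reason for the omission is worth illustrating. Assuming the Part f model is a fixed-effects (within) estimator, which is consistent with the between/within fit and the sigma values it reports but is an assumption here, each variable is measured as a deviation from the person's own mean. A variable that never changes for a person then becomes a column of zeros and cannot be estimated. The rows below are hypothetical.

```python
from collections import defaultdict

# Hypothetical panel rows for illustration; pid identifies the person.
rows = [
    {"pid": 1, "male": 1, "min_travel": 20},
    {"pid": 1, "male": 1, "min_travel": 35},
    {"pid": 2, "male": 0, "min_travel": 10},
    {"pid": 2, "male": 0, "min_travel": 10},
    {"pid": 2, "male": 0, "min_travel": 40},
]

def demean_within(rows, var, group="pid"):
    """Within transformation: subtract each person's own mean of `var`."""
    values = defaultdict(list)
    for row in rows:
        values[row[group]].append(row[var])
    means = {g: sum(v) / len(v) for g, v in values.items()}
    return [row[var] - means[row[group]] for row in rows]

male_within = demean_within(rows, "male")          # all zeros: no variation left
travel_within = demean_within(rows, "min_travel")  # keeps within-person variation

print(male_within)
print(travel_within)
```

The time-varying commute variable keeps its within-person variation, while the time-invariant one is wiped out entirely.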
Part g
There isn't much difference, given that the two models look alike. Nevertheless, it is important
to note that the significance of the various independent variables differs across the two models.
For example, in the unbalanced model (part f) the variable Job_hours is insignificant; however,
the same variable in the balanced model above is statistically significant.
Part h
(i) SE
(ii) CSE
Part i
I would not expect any issue associated with heteroskedasticity for μi; however, I would express
some concerns about serial correlation and heteroskedasticity for εit.
Part j
No, there were no noticeable differences in the coefficients (the differences were largely in
the F-statistic and p-values). Since there are no notable differences, we can conclude that there
is no association between the explanatory variables and μi. As such, no factors can be identified
that would cause any association between the explanatory variables and μi.
Part k
The results are provided in Table 2 in the appendix. It is clear that the variable is
time-persistent.
Part l
(i) Dummy Car Travel omitted
note: male omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =    105,458
Group variable: pidp                            Number of groups  =     33,809

R-sq:                                           Obs per group:
     within  = 0.0012                                         min =          1
     between = 0.0271                                         avg =        3.1
     overall = 0.0186                                         max =          6

                                                F(18,33808)       =       4.42
corr(u_i, Xb)  = 0.0932                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 33,809 clusters in pidp)
------------------------------------------------------------------------------
             |               Robust
   wellbeing |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnlabinc |   .0406283   .0134211     3.03   0.002     .0143224    .0669342
         age |  -.0346948   .0103347    -3.36   0.001    -.0549512   -.0144385
        age2 |   .0003846    .000118     3.26   0.001     .0001533    .0006159
    children |   .0038997   .0128801     0.30   0.762    -.0213458    .0291453
        male |          0  (omitted)
   secondary |   .0299932    .038164     0.79   0.432    -.0448095    .1047959
    tertiary |   .0389995   .0445855     0.87   0.382    -.0483896    .1263886
   job_hours |  -.0012261    .000878    -1.40   0.163    -.0029471    .0004948
  min_travel |  -.0002236   .0002454    -0.91   0.362    -.0007047    .0002575
     marryco |   .1797733   .0266759     6.74   0.000     .1274876    .2320591
             |
        mode |
       lift  |   .0204079   .0283568     0.72   0.472    -.0351724    .0759882
 motorcycle  |  -.0373566   .0735797    -0.51   0.612    -.1815754    .1068622
       taxi  |   -.243475   .1086569    -2.24   0.025    -.4564463   -.0305038
        bus  |  -.0291045   .0316441    -0.92   0.358     -.091128     .032919
      train  |  -.0539365   .0345852    -1.56   0.119    -.1217247    .0138517
underground  |   .0497685   .0510452     0.97   0.330    -.0502819    .1498189
      cycle  |   .0548206   .0370596     1.48   0.139    -.0178175    .1274587
       walk  |   .0057496   .0248539     0.23   0.817    -.0429648     .054464
      other  |   .0285693   .0563402     0.51   0.612    -.0818594     .138998
             |
       _cons |   5.510602   .2185527    25.21   0.000     5.082231    5.938973
-------------+----------------------------------------------------------------
     sigma_u |  1.1552357
     sigma_e |  1.1080382
         rho |  .52084457   (fraction of variance due to u_i)
------------------------------------------------------------------------------
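Two quantities in this output can be reproduced by hand, which is a useful sanity check. The sketch below recomputes rho as the share of total variance due to u_i from the reported sigma_u and sigma_e, and finds the age at which the fitted age profile turns. The turning-point calculation assumes age2 is age squared, which the variable name suggests but the output does not state.

```python
# Values copied from the fixed-effects output for wellbeing.
sigma_u = 1.1552357
sigma_e = 1.1080382

# rho is the fraction of variance due to u_i.
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)

# Coefficients on age and age2 (assumed here to be age squared).
b_age = -0.0346948
b_age2 = 0.0003846

# For b_age * age + b_age2 * age**2, the slope is zero at -b_age / (2 * b_age2).
turning_point = -b_age / (2 * b_age2)

print(f"rho = {rho:.6f}")
print(f"age profile turns at about {turning_point:.1f} years")
```

Because the coefficient on age is negative and the coefficient on age2 is positive, the fitted profile in this table is U-shaped, with wellbeing at its lowest around the turning point.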
(ii) Interaction mode
This result failed to generate due to too many rows and columns associated with the variable
min_travel. This was caused by outliers in the data, which arise because a small proportion of
the surveyed group takes longer than 120 minutes to get to work.
(iii) Relationship
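Commuting time has a negative relationship with wellbeing (coefficient -0.0002236), so as the
commuting duration increases the individual realizes a reduction in wellbeing, although the
estimate is not statistically significant (p = 0.362). The relationship between mode of transport
and wellbeing is mixed relative to car travel: the coefficients for lift, underground, cycle, walk
and other are positive, while those for motorcycle, taxi, bus and train are negative.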
(iv) Statistically significant
The mode of transport variables are not statistically significant at alpha = 5%, with the
exception of taxi, which has a p-value of 0.025. Therefore, the coefficients on all the dummy
variables and their interactions cannot be distinguished from zero in the model, with the
exception of taxi.
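The significance screen described above can be sketched in a few lines. The coefficients, robust standard errors and p-values are copied from the fixed-effects output in part (i); the dictionary layout and the variable names are only illustrative.

```python
ALPHA = 0.05

# (coefficient, robust standard error, p-value) for each mode dummy,
# copied from the fixed-effects output; car travel is the omitted category.
mode_results = {
    "lift": (0.0204079, 0.0283568, 0.472),
    "motorcycle": (-0.0373566, 0.0735797, 0.612),
    "taxi": (-0.243475, 0.1086569, 0.025),
    "bus": (-0.0291045, 0.0316441, 0.358),
    "train": (-0.0539365, 0.0345852, 0.119),
    "underground": (0.0497685, 0.0510452, 0.330),
    "cycle": (0.0548206, 0.0370596, 0.139),
    "walk": (0.0057496, 0.0248539, 0.817),
    "other": (0.0285693, 0.0563402, 0.612),
}

# The t statistic is the coefficient divided by its standard error.
t_stats = {name: coef / se for name, (coef, se, _) in mode_results.items()}

# Keep only the modes whose p-value is below the chosen significance level.
significant = [name for name, (_, _, p) in mode_results.items() if p < ALPHA]

print("significant at 5%:", significant)
```

Only taxi passes the 5% threshold, and its t statistic recomputed from the coefficient and standard error agrees with the reported -2.24.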
Part m
The relationship is different due to the change in the data distribution. An inverse effect has
therefore been created, and the signs on the model coefficients have changed: where the effect
was positive in part (i), in this part it becomes negative. There is clearly an inverted-U
relationship created by the recoding of values.
Question 2
Part a