Regression Analysis of Job Income

Added on 2020/02/05

AI Summary

This assignment presents a regression analysis examining the factors influencing job income (jobinc). The output includes a table of residuals statistics, providing insights into the minimum, maximum, mean, standard deviation, and number of observations. It also details predicted values, adjusted predicted values, and various error measures like standard error, studentized residual, and deleted residual. Additionally, the assignment explores leverage and Cook's distance to assess the influence of individual data points on the model.

SPSS

TABLE OF CONTENTS
INTRODUCTION
PART 1A
1B
MEASUREMENT OF CENTRAL TENDENCY
1C
ASSUMPTIONS
ONE WAY ANOVA
TEST OF HOMOGENEITY OF VARIANCES
PART 2 REGRESSION
REFERENCE
APPENDIX

INTRODUCTION
This report investigates the wealth of KDS customers and whether or not the credit they
applied for was granted. A self-completion questionnaire survey was employed. The
questionnaire collected personal details such as age, years of employment, type of
occupation, level of education and more. The report comprises two parts. The first part
analyzes and interprets the SPSS output and graphs for job income. It also comments on the
summary table and reports the mean, median, variance, skewness and kurtosis. Thereafter, the
report runs a t-test, a one-way ANOVA and a post hoc analysis in order to test whether there is
a significant difference between two or more population means, such as those of the job-status
groups.
PART 1A
Graphs 1 and 2 and Appendix 1 clearly indicate that job income is positively skewed.
According to Field (2015), “there are two main ways in which a distribution can deviate
from normality”. The first is lack of symmetry, known as skew, and the second is pointiness,
known as kurtosis. Job income deviates from normality on both counts, which is consistent
with the positively skewed graphs (see graphs 1 and 2): the graphs are not symmetrical, so the
data are not normally distributed. Furthermore, table 2 shows that the skewness and kurtosis,
2.085 and 6.175 respectively, are above the norm [1], which is in line with the graphs that
indicate positive skew and lack of normality. The skewness of 2.085 indicates that the
distribution is highly skewed (see graph 1). Therefore, some outliers need to be deleted to
make the distribution at most moderately skewed. Furthermore, table 2 indicates that the
majority of the sample lies towards the lower end of the income range. Most people who
applied for credit are on the low-income scale and just a few are from the higher-income band.
The graphs indicate a skewed distribution, which suggests that the mean is farther out in the
long tail than the median (Moore and McCabe 2003, p. 43).
GRAPH 1:
[1] Norm: skewness between -1 and 1, kurtosis between -4 and 4.

GRAPH 2:
1B
MEASUREMENT OF CENTRAL TENDENCY
MEAN:
The mean of job income is 994.8868. This indicates that the average income of the
sample of 106 cases is about 995 (see tables 1 and 2). Furthermore, the 95% confidence
interval for the mean has a lower bound of 879 and an upper bound of 1109. This is not a
relatively lucrative income. Job income appears to be positively skewed, which implies that the
mean lies towards the direction of skew (the longer tail) relative to the median (Agresti and
Finlay 1997, p. 50).
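The figures quoted for the mean can be cross-checked against each other. The sketch below is only an illustration using the values printed in Table 2: it recomputes the standard error from the standard deviation and the sample size, and derives the critical value implied by the reported interval. The variable names are chosen here for illustration and do not come from the SPSS output.

```python
import math

# Values as reported in Table 2 (SPSS descriptives for jobinc)
n = 106
mean = 994.8868
std_dev = 597.24699
ci_lower, ci_upper = 879.8642, 1109.9094

# Standard error of the mean = standard deviation / sqrt(n)
std_error = std_dev / math.sqrt(n)

# The interval is symmetric around the mean, so its half-width divided by
# the standard error gives the critical t value that was used
half_width = (ci_upper - ci_lower) / 2
midpoint = (ci_upper + ci_lower) / 2
implied_t = half_width / std_error

print(f"standard error of the mean: {std_error:.3f}")
print(f"interval midpoint: {midpoint:.4f}")
print(f"implied critical t: {implied_t:.3f}")
```

The recomputed standard error is about 58.01, matching the Std. Error column of Table 2, and the implied critical value of roughly 1.98 is what one would expect for a 95% interval with 105 degrees of freedom.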
MEDIAN:
The median, the value separating the higher half of the job-income sample from the lower
half, is 800. This is well below the above-mentioned mean of close to 1000, which further
supports the idea that the distribution is not normal.
VARIANCE:
The variance of job income, which measures the spread of the data around the mean, is
356,703. This variance appears to be very high, and one explanation could be the high
number of outliers. This can be seen in Appendix 2. As the outliers are removed, the variance
comes down, which brings the data closer to a normal distribution. For instance, by the time
the last outlier, case 46, is removed the variance has gone down from 356,703 to 127,220. That
is a staggering 64.33% decrease. Still, this indicates high variance around the mean.
STANDARD DEVIATION:
The standard deviation of job income went from 597 before any outlier was deleted to 356
after all outliers were deleted. That is a 40% decrease. Like the variance, the standard
deviation is tied to the outliers: as the outliers are eliminated the standard deviation
decreases. This is again in line with a move towards normality, as Appendix 2 indicates.
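The percentage decreases quoted for the variance and the standard deviation follow from the before and after values in Table 2 and Appendix 2. A minimal sketch of that arithmetic (the helper name `percent_decrease` is introduced here only for illustration):

```python
def percent_decrease(before, after):
    """Relative drop from `before` to `after`, expressed in percent."""
    return (before - after) / before * 100

# Before: Table 2 (all 106 cases); after: Appendix 2 (outliers removed)
var_before, var_after = 356703.968, 127220.171
sd_before, sd_after = 597.24699, 356.67937

var_drop = percent_decrease(var_before, var_after)
sd_drop = percent_decrease(sd_before, sd_after)

print(f"variance decrease: {var_drop:.2f}%")
print(f"standard deviation decrease: {sd_drop:.2f}%")
```

This gives roughly 64.33% for the variance and about 40.3% for the standard deviation, in line with the figures in the text.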
Basic statistics are given below. They show considerable variation and a marked
difference between the mean and the median, together with high variance. The result also
demonstrates a significant deviation from normality, given the high level of positive skew.
This can be seen in the skewness of 2.085.
As the descriptive statistics (table 2) indicate, the skewness is 2.085, which is above the
norm of between -1 and 1 mentioned before. That justifies removing outliers such as case 26.
Similarly, the kurtosis is more than the norm of 4, which further supports deleting some
outliers to make the distribution normal or close to normal.
Using the outlier labelling rule with g = 1.5 (Tukey, 1977), there are some outliers.
With that in mind, and because the skewness and kurtosis in the first result are above the
norm (-1 to 1 and -4 to 4), some outliers should be removed to approach normality. The first
outliers are cases 26, 56 and 10, and they are deleted one by one. The result indicates that
the skewness and kurtosis go down, and after deleting these first three the kurtosis is within
the range but the skewness is still slightly outside it. Therefore, two further outliers, cases
60 and 72, are removed (see graph 2).
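The outlier labelling rule mentioned above flags any value that lies more than g times the interquartile range below the first quartile or above the third quartile. The sketch below shows the rule itself. The quartiles are hypothetical: Table 2 reports only the interquartile range (600), so `q1 = 600` and `q3 = 1200` are assumptions chosen to match that range, and the flagged list is therefore illustrative rather than a reproduction of the SPSS result.

```python
def tukey_fences(q1, q3, g=1.5):
    """Lower and upper fences of the outlier labelling rule."""
    iqr = q3 - q1
    return q1 - g * iqr, q3 + g * iqr

def label_outliers(values, q1, q3, g=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = tukey_fences(q1, q3, g)
    return [v for v in values if v < lower or v > upper]

# ASSUMPTION: hypothetical quartiles; only IQR = 600 is given in Table 2
q1, q3 = 600.0, 1200.0

# Five highest job incomes listed in Appendix 1
highest = [4000.0, 3000.0, 2500.0, 2225.0, 2200.0]

fences = tukey_fences(q1, q3)
flagged = label_outliers(highest, q1, q3)
print("fences:", fences)
print("flagged:", flagged)
```

Because removing a case changes the quartiles, the rule has to be re-applied after each deletion, which is why new outliers can appear as earlier ones are removed.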
However, after removing case 60 the values fell and it appeared that there was only one
more outlier, case 72 (see appendix). Deleting case 72 in turn exposed another five outliers,
so further deletions were made until the skewness went below one. The further outliers (cases
21, 4, 33, 38 and 46) were therefore deleted before both the skewness and the kurtosis were
within the norm, although the kurtosis was already within the norm by the time the third
outlier, case 10, was deleted. After case 46 was deleted the skewness and kurtosis became .956
and .567 respectively (see Appendix 2), which is within the norm. After case 46 there was no
need to delete any more outliers.
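The stopping rule used for the deletions can be written down as a simple check against the norm stated in the report (skewness between -1 and 1, kurtosis between -4 and 4). This is a sketch of that check only; the function name is made up for illustration.

```python
# Norm stated in the report's footnote
SKEW_NORM = (-1.0, 1.0)
KURT_NORM = (-4.0, 4.0)

def within_norm(skewness, kurtosis):
    """True when both statistics fall inside the report's norm."""
    skew_ok = SKEW_NORM[0] <= skewness <= SKEW_NORM[1]
    kurt_ok = KURT_NORM[0] <= kurtosis <= KURT_NORM[1]
    return skew_ok and kurt_ok

before = within_norm(2.085, 6.175)  # Table 2, all 106 cases
after = within_norm(0.956, 0.567)   # Appendix 2, outliers removed

print("before removal:", before)
print("after removal:", after)
```

The first call fails on both statistics, while the second passes, with the skewness of .956 only just inside the upper limit of 1.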
1C
A sample of 106 applicants was chosen to make inferences about the population. These
inferences are based on mutually exclusive statements (hypotheses). A t-test is used to
compare the group that was granted credit with the group that was not (see table 3). The 96
cases remaining after the outliers were removed (48 in each group) were used to test whether
there is a significant difference between the two groups.
The result indicates that those who got credit appear to have higher income than the
unsuccessful ones. Therefore, hypothesis testing is done to support or reject this claim.
• H0: there is no significant difference in mean job income between those who got credit
and those who did not
• H1: those who got credit have a higher mean income.
It is worth noting that if H1 stated only that there is a significant difference, the test
would be two-tailed (higher or lower income). However, for simplicity the job-income t-test
examines one tail, namely higher income. The monthly incomes of those who got credit and
those who were unsuccessful are different: the successful ones have the higher income (see
table 3). The mean for successful and unsuccessful applicants is 942 and 756 respectively.
As tables 3 and 4 in the Appendix show, there is a significant difference between the two
means. Therefore, we reject the null hypothesis. The mean of those not granted credit (756) is
below the mean of those granted credit (942), and there is significant evidence that those who
were granted credit have higher income than the other group.
α = 5%
Because the two groups are independent, the test of equality of variances is needed before
the means are compared. Levene's test gives F = .013 with a significance of .908. Since this
is higher than the 5% alpha, the null hypothesis of equal variances is retained, the two groups
can be treated as homogeneous, and the "equal variances assumed" line of table 4 is read. On
that line the two-tailed significance of the t-test is .010. We reject the null hypothesis of
equal means because this is lower than 5%, which is our level of acceptance. Table 4 therefore
indicates that those who got credit have a higher income than those who did not, with a mean
difference of 186.52.
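The t statistic in table 4 can be recovered from the group statistics in table 3 alone. The sketch below assumes the pooled-variance (equal variances assumed) form of the independent-samples t-test, which is the line Levene's test points to; the function name is illustrative.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t-test with pooled variance, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (mean1 - mean2) / se_diff
    return t_stat, df, se_diff

# Group statistics from Table 3: credit granted vs. credit not granted
t_stat, df, se_diff = pooled_t(942.5625, 343.46748, 48,
                               756.0417, 348.44243, 48)

print(f"t = {t_stat:.3f}, df = {df}, std. error of difference = {se_diff:.3f}")
```

The result is t of about 2.64 on 94 degrees of freedom with a standard error of about 70.62, which agrees with the "equal variances assumed" row of table 4.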
ASSUMPTIONS
• Normality of errors
• Homoscedasticity
• Independence of errors
• Linearity
According to Field (2015), the first head of the beast of bias is outliers. There appears
to be a division of opinion regarding the assumption of normality. Some believe that it
means that your data need to be normally distributed. However, Field (2015) disagrees and
believes that the assumption of normality might mean different things in different contexts.
For example, for the confidence intervals around a parameter estimate such as the mean to be
accurate, that estimate must come from a normal distribution. Likewise, for significance tests
of models and of the parameter estimates that define them to be accurate, the sampling
distribution of what is being tested must be normal. However, the central limit theorem means
that there are a variety of situations in which we can assume normality regardless of the
shape of our sample data (Lumley, Diehr, Emerson, & Chen, 2002). If the intention is only to
estimate the parameters of the model in the sample, as with job income, then homogeneity of
variance is in most cases not important, or at least not relevant: the method of least squares
will produce unbiased estimates (Hayes and Cai, 2007).
ONE WAY ANOVA
H0: There is no significant difference between any two means
H1: There is a significant difference between at least two means
The variation within groups should be small, since cases in the same group should be
alike; otherwise they do not form one group. The variation between groups should be large
because the cases come from different groups (see ANOVA table 10).
The significance is .000, which means we reject the null hypothesis. Because more than
two means are being compared, a post hoc test is needed to identify where the differences
lie. However, before the post hoc test is carried out we need to test homogeneity of
variance.
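The F ratio behind this decision is the between-groups mean square divided by the within-groups mean square. A short sketch using the sums of squares and degrees of freedom printed in table 10 (variable names are illustrative):

```python
# Sums of squares and degrees of freedom from Table 10
ss_between, df_between = 8905004.502, 2
ss_within, df_within = 3180911.738, 93

# Mean square = sum of squares / degrees of freedom
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F compares between-group variation with within-group variation
f_stat = ms_between / ms_within

print(f"MS between = {ms_between:.3f}")
print(f"MS within  = {ms_within:.3f}")
print(f"F = {f_stat:.3f}")
```

The mean squares and the F value of about 130.18 match table 10, which is why the between-groups variation is described as large relative to the within-groups variation.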
However, if ANOVA is simply a linear model, then all of the potential sources of bias
shown below apply (Field, 2015). Therefore, to keep the model parsimonious, the observations
should not affect each other, otherwise the model will not be linear.
• Additivity and linearity
• Normality of something or other
• Homoscedasticity or homogeneity of variance
• Independence
TEST OF HOMOGENEITY OF VARIANCES
The output from the Scheffe, Tukey and Gabriel tests shows that there are three different
groups. Group one comprises others, group two supervisory and group three management. The
result indicates that management has the highest income, supervisory the second highest and
others the lowest. Therefore, our assumption that management has higher income is confirmed.
Within each homogeneous subset the significance is 1.000, which is more than 10%, therefore
the report retained the grouping. Levene's statistic is 5.225 with a significance of .007, so
the variances of the three groups are not homogeneous, which is why Tamhane's test is also
reported in Appendix 3. Management and supervisors show higher income on the Gabriel, Scheffe
and Tukey tests. They appear to be well-defined groups since none of them can be split
further. However, regular checks are needed in the future since the group sizes are unequal.
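Two quantities in the post hoc output can be reproduced from the group sizes and the within-groups mean square of table 10: the harmonic mean sample size quoted under the homogeneous subsets, and the standard errors of the pairwise mean differences in the Scheffe and Gabriel rows of Appendix 3. The sketch below assumes the usual pooled formula for the standard error of a difference; it does not apply to the Tamhane rows, which do not pool the variances.

```python
import math

ms_within = 34203.352  # within-groups mean square, Table 10
group_sizes = {"other": 58, "supervisory": 31, "management": 7}

def se_difference(n_i, n_j, ms_error=ms_within):
    """Pooled standard error of the difference between two group means."""
    return math.sqrt(ms_error * (1 / n_i + 1 / n_j))

# Harmonic mean of the three group sizes
harmonic_n = len(group_sizes) / sum(1 / n for n in group_sizes.values())

se_man_sup = se_difference(group_sizes["management"], group_sizes["supervisory"])
se_man_other = se_difference(group_sizes["management"], group_sizes["other"])
se_sup_other = se_difference(group_sizes["supervisory"], group_sizes["other"])

print(f"harmonic mean sample size: {harmonic_n:.3f}")
print(f"SE management vs supervisory: {se_man_sup:.3f}")
print(f"SE management vs other: {se_man_other:.3f}")
print(f"SE supervisory vs other: {se_sup_other:.3f}")
```

The small management group (n = 7) is what makes its two standard errors much larger than the one for supervisory versus other.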
PART 2 REGRESSION
According to Field and Hole (2003), “more often than not the manipulation of the
independent variable involves having an experimental condition and control group”. With that
in mind, regression was used in this report to determine the statistical significance of class,
sex, age, job years, education, marital status, management and supervisors. Further, a test was
done to determine the strength of association between the dependent variable, job income, and
the above-mentioned eight independent variables. In addition, regression identified the
relative importance of each of the independent variables in predicting the single metric
dependent variable of job income. For instance, as table 11 indicates, job years, marital
status and class are relatively unimportant compared to the rest of the variables. That is why
they were eliminated in later parts of the regression test, which created a reduced model. In
this way the regression test made it possible to develop the best model to explain job income.
In addition to the above, this section carries out multiple linear regression with eight
independent variables, assuming a 5% level of significance. Table 12 indicates an R square
value of around 0.81, which is reasonable. In addition, the adjusted R square is very close to
R square, especially when the non-influential variables (class, marital status and job years)
are removed. There is therefore no concern regarding overfitting.
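The closeness of R square and adjusted R square can be checked with the standard adjustment formula, which penalises R square for the number of predictors. The sketch below uses the rounded R square values from table 12 and takes n = 96 from the total degrees of freedom (95) in table 13; the function name is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R square for n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 96  # total df of 95 in Table 13, plus one

full_model = adjusted_r2(0.806, n, k=8)     # all eight predictors
reduced_model = adjusted_r2(0.800, n, k=5)  # after removing three predictors

print(f"adjusted R square, full model: {full_model:.3f}")
print(f"adjusted R square, reduced model: {reduced_model:.3f}")
```

Dropping three predictors lowers R square slightly (.806 to .800) but leaves the adjusted value essentially unchanged (about .788 versus .789), which is the pattern the text relies on when it says the removed variables were non-influential.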
First model that fits:
\[ Y(\text{jobinc}) = b_0 + b_1(\text{class}) + b_2(\text{sex}) + b_3(\text{age}) + b_4(\text{job years}) + b_5(\text{educ}) + b_6(\text{mar status}) + b_7(\text{management}) + b_8(\text{supervisor}) \]
The two model summaries in table 12 indicate that, after deleting the three
non-influential independent variables, the reduced model shown below fits better than the
original one.
From the ANOVA output (table 13 in the Appendix) it can be seen that we reject the null
hypothesis and conclude that R square is greater than zero, i.e. not all regression
coefficients are zero.
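The overall F test, R square and the standard error of the estimate all come from the same sums of squares. As a sketch, using the values printed in table 13 (variable names are illustrative):

```python
import math

# Sums of squares and degrees of freedom from Table 13 (full model)
ss_regression, df_regression = 9737494.949, 8
ss_residual, df_residual = 2348421.290, 87
ss_total = ss_regression + ss_residual

# Share of the total variation explained by the predictors
r_squared = ss_regression / ss_total

# F = regression mean square / residual mean square
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# Standard error of the estimate = square root of the residual mean square
std_error_estimate = math.sqrt(ms_residual)

print(f"R square = {r_squared:.3f}")
print(f"F = {f_stat:.3f}")
print(f"std. error of the estimate = {std_error_estimate:.3f}")
```

These reproduce the R square of .806 and the standard error of 164.30 in table 12 and the F of 45.092 in table 13.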
As the coefficient table (table 7 in the Appendix) indicates, there are four independent
variables whose significance is higher than 5%, hence they are not significant predictors of
job income. Since these are higher than 5%, deleting them will not materially affect the
dependent variable, job income, as there is no significant relationship between the Y variable
(job income) and these independent variables (i.e. job years, marital status, class and age).
Therefore, the report deleted them one by one, starting with the one furthest above the 5%
level. For example, job years is 0.747, marital status is 0.338, class is 0.274 and age is
0.230. Sex, at 0.021, is below 5% and is therefore retained. This indicates that the four
variables above are not relevant to job income. Similarly, it shows that job status and
education are the most relevant to job income, or how much one can earn per month.
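The columns of the coefficient table are linked: t is the coefficient divided by its standard error, and VIF is the reciprocal of the tolerance. The sketch below recomputes both from the values in table 7 and lists the predictors whose significance exceeds 5%, ordered from the largest p-value down, which is the order of elimination described in the text. The dictionary layout is an illustration, not SPSS output.

```python
ALPHA = 0.05

# (B, Std. Error, Sig., Tolerance) as reported in Table 7
table7 = {
    "class": (38.810, 35.257, 0.274, 0.905),
    "sex": (121.655, 51.562, 0.021, 0.802),
    "age": (2.757, 2.280, 0.230, 0.444),
    "jobyrs": (1.353, 4.186, 0.747, 0.491),
    "educ": (39.380, 9.916, 0.000, 0.470),
    "mstatus": (35.580, 36.906, 0.338, 0.835),
    "Managment": (666.933, 91.934, 0.000, 0.492),
    "Supervisor": (306.479, 48.943, 0.000, 0.537),
}

# t = B / Std. Error and VIF = 1 / Tolerance
t_values = {name: b / se for name, (b, se, _, _) in table7.items()}
vif = {name: 1 / tol for name, (_, _, _, tol) in table7.items()}

# Predictors above the 5% level, largest p-value first
not_significant = sorted(
    (name for name, row in table7.items() if row[2] > ALPHA),
    key=lambda name: table7[name][2],
    reverse=True,
)

print("not significant:", not_significant)
print(f"t for sex: {t_values['sex']:.3f}, VIF for sex: {vif['sex']:.3f}")
```

Recomputed values can differ from the table in the third decimal place because the inputs are themselves rounded.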
As table 7 in the Appendix indicates, class, age, job years and marital status are not
relevant to job income since their significance is above the 5% norm. Therefore, job years is
eliminated first since it is the highest (74%). Then marital status is removed second, as it is
now the highest of the remaining variables (34%). Finally, class (24%) is also eliminated
since it is also higher than 5%. In line with the significance results, the standardized beta
coefficients of the removed variables are all smaller than those of the variables that were
retained. That is another reason for their elimination.
\[ Y(\text{jobinc}) = b_0 + b_1(\text{sex}) + b_2(\text{age}) + b_3(\text{educ}) + b_4(\text{management}) + b_5(\text{supervisor}) \]
Effect size helps to determine which of the significant predictors matters most. Job status
has a large effect on income: management and supervisor have the largest standardized betas,
and the significance of both is .000.
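To show how the reduced model is used, the sketch below plugs the unstandardized coefficients from table 14 into the equation. The coding of the predictors is not given in the report, so the example treats sex, management and supervisor as 0/1 indicators and education as a number of years; these codings, and the example applicant, are assumptions made purely for illustration.

```python
# Unstandardized coefficients (B) from Table 14
COEFFICIENTS = {
    "const": 92.321,
    "sex": 135.659,
    "age": 3.827,
    "educ": 37.686,
    "management": 692.527,
    "supervisor": 321.623,
}

def predict_job_income(sex, age, educ, management, supervisor):
    """Predicted job income from the reduced five-predictor model."""
    c = COEFFICIENTS
    return (c["const"]
            + c["sex"] * sex
            + c["age"] * age
            + c["educ"] * educ
            + c["management"] * management
            + c["supervisor"] * supervisor)

# ASSUMPTION: hypothetical applicant and hypothetical variable coding
example = predict_job_income(sex=1, age=40, educ=12, management=0, supervisor=1)
baseline = predict_job_income(sex=0, age=0, educ=0, management=0, supervisor=0)

print(f"predicted income for the example applicant: {example:.3f}")
```

Holding everything else fixed, switching the management indicator from 0 to 1 adds about 693 to the prediction, which illustrates why job status is described as having the largest effect.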
Based on the coefficients (table 14), sex, education, management and supervisor (job
status) are significant and can influence job income. In addition, after removing only job
years, age falls within the range: its significance falls from 23% to just 5%. This reflects
the overlap between age and job years (the tolerance of age rises from .444 in table 7 to .909
in table 14). If any of the assumptions is not true, usually referred to as a violation, then
the test statistic and p-values will be inaccurate and could lead us to the wrong conclusion if
we interpret them at face value.
REFERENCE
Beyer, H. "Tukey, John W.: Exploratory Data Analysis. Addison-Wesley Publishing Company,
Reading, Mass., Menlo Park, Cal., London, Amsterdam, Don Mills, Ontario, Sydney, 1977,
XVI, 688 S.". Biom. J. 23.4 (1981): 413-414. Web. 22 Feb. 2016.
DeCoster, Gallucci, & Iselin, 2011; DeCoster, Iselin, & Gallucci, 2009; MacCallum et al., 2002.
Von Hippel, P. T. (2005). "Journal of Statistics Education, V13n2: Paul T. Von Hippel".
Amstat.org. N.p., 2016. Web. 28 Feb. 2016.
Matthews, Patrick. "Median, Mode, Skewness, And Kurtosis In MS Access".
Experts-exchange.com. N.p., 2016. Web. 28 Feb. 2016.

APPENDIX
Table 1:
Case Processing Summary
Cases
Valid Missing Total
N Percent N Percent N Percent
Jobinc 106 93.8% 7 6.2% 113 100.0%
Table 2:
Descriptives (jobinc)
                                                Statistic     Std. Error
Mean                                            994.8868      58.00976
95% Confidence Interval for Mean, Lower Bound   879.8642
95% Confidence Interval for Mean, Upper Bound   1109.9094
5% Trimmed Mean                                 932.7778
Median                                          800.0000
Variance                                        356703.968
Std. Deviation                                  597.24699
Minimum                                         300.00
Maximum                                         4000.00
Range                                           3700.00
Interquartile Range                             600.00
Skewness                                        2.085         .235
Kurtosis                                        6.175         .465
Table 3:
Group Statistics
class N Mean Std. Deviation Std. Error Mean
jobinc credit granted 48 942.5625 343.46748 49.57526
credit not granted 48 756.0417 348.44243 50.29333
Table 4
Independent Samples Test (jobinc)
Levene's Test for Equality of Variances: F = .013, Sig. = .908
t-test for Equality of Means:
                              t      df      Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed       2.641  94      .010             186.52083        70.61958               46.30399      326.73767
Equal variances not assumed   2.641  93.981  .010             186.52083        70.61958               46.30361      326.73805
Table 7
Coefficients (a)
Model 1       B        Std. Error  Beta   t      Sig.   Tolerance  VIF
(Constant)    82.843   109.060            .760   .450
class         38.810   35.257      .055   1.101  .274   .905       1.105
sex           121.655  51.562      .124   2.359  .021   .802       1.247
age           2.757    2.280       .086   1.209  .230   .444       2.251
jobyrs        1.353    4.186       .022   .323   .747   .491       2.038
educ          39.380   9.916       .274   3.971  .000   .470       2.130
mstatus       35.580   36.906      .050   .964   .338   .835       1.198
Managment     666.933  91.934      .489   7.255  .000   .492       2.032
Supervisor    306.479  48.943      .404   6.262  .000   .537       1.863
B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient;
Tolerance and VIF are collinearity statistics.
a. Dependent Variable: jobinc
Table 10
ANOVA (jobinc)
                 Sum of Squares  df  Mean Square  F        Sig.
Between Groups   8905004.502     2   4452502.251  130.177  .000
Within Groups    3180911.738     93  34203.352
Total            12085916.240    95
Output Scheffe, Tukey and Gabriel:
Test of Homogeneity of Variances (jobinc)
Levene Statistic  df1  df2  Sig.
5.225             2    93   .007
Homogeneous subsets (jobinc), subset for alpha = 0.05
Test            jobstat      N   Subset 1  Subset 2   Subset 3
Tukey B (a,b)   other        58  623.6379
                supervisory  31            1101.1290
                management   7                        1603.8571
Scheffe (a,b)   other        58  623.6379
                supervisory  31            1101.1290
                management   7                        1603.8571
                Sig.             1.000     1.000      1.000
Gabriel (a,b)   other        58  623.6379
                supervisory  31            1101.1290
                management   7                        1603.8571
                Sig.             1.000     1.000      1.000
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 15.596.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error
levels are not guaranteed.
Table 11:
Variables Entered/Removed (a)
Model 1
Variables Entered: Class, Sex, Age, Job years, Education, M-status, Management, Supervisors (b)
Variables Removed: Job years, M-Status, Class
Method: Enter
b. All requested variables entered.
Table 12:
Model Summary (b), before removing any independent variables
Model  R         R Square  Adjusted R Square  Std. Error of the Estimate  Durbin-Watson
1      .898 (a)  .806      .788               164.29653                   2.297
a. Predictors: (Constant), Supervisor, mstatus, jobyrs, class, Managment, sex, educ, age
b. Dependent Variable: jobinc
Model Summary (b), after removing independent variables
Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .895 (a)  .800      .789               163.71964
a. Predictors: (Constant), Supervisor, age, Managment, sex, educ
b. Dependent Variable: jobinc
Table 13:
ANOVAa
Model Sum of Squares df Mean Square F Sig.
1 Regression 9737494.949 8 1217186.869 45.092 .000b
Residual 2348421.290 87 26993.348
Total 12085916.240 95
a. Dependent Variable: jobinc
b. Predictors: (Constant), Supervisor, mstatus, jobyrs, class, Management, sex, educ, age
H0 = all regression coefficients are zero
H1 = not all regression coefficients are zero.
Table 14:
Coefficients (a)
Model 1       B        Std. Error  Beta   t      Sig.   Tolerance  VIF
(Constant)    92.321   102.534            .900   .370
sex           135.659  49.735      .139   2.728  .008   .856       1.168
age           3.827    1.588       .119   2.410  .018   .909       1.100
educ          37.686   9.817       .262   3.839  .000   .476       2.102
Managment     692.527  87.565      .507   7.909  .000   .539       1.856
Supervisor    321.623  47.508      .424   6.770  .000   .566       1.767
B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient;
Tolerance and VIF are collinearity statistics.
a. Dependent Variable: jobinc
APPENDIX 1
Extreme Values
Case Number Value
jobinc Highest 1 26 4000.00
2 57 3000.00
3 10 2500.00
4 76 2225.00
5 63 2200.00
Lowest 1 93 300.00
2 73 350.00
3 100 376.00
4 90 400.00
5 86 400.00a
a. Only a partial list of cases with the value 400.00 are
shown in the table of lower extremes.
APPENDIX 2
Descriptives (jobinc)
                                                Statistic     Std. Error
Mean                                            849.3021      36.40344
95% Confidence Interval for Mean, Lower Bound   777.0321
95% Confidence Interval for Mean, Upper Bound   921.5720
5% Trimmed Mean                                 826.1574
Median                                          752.5000
Variance                                        127220.171
Std. Deviation                                  356.67937
Minimum                                         300.00
Maximum                                         1916.00
Range                                           1616.00
Interquartile Range                             400.00
Skewness                                        .956          .246
Kurtosis                                        .567          .488
Appendix 3
Multiple Comparisons
Dependent Variable: jobinc
Columns: Test, (I) jobstat, (J) jobstat, Mean Difference (I-J), Std. Error, Sig., 95% Confidence Interval Lower Bound, Upper Bound
Scheffe management supervisory 502.72811* 77.39203 .000 310.1996 695.2566
other 980.21921* 73.99937 .000 796.1306 1164.3078
supervisory management -502.72811* 77.39203 .000 -695.2566 -310.1996
other 477.49110* 41.14665 .000 375.1304 579.8518
other management -980.21921* 73.99937 .000 -1164.3078 -796.1306
supervisory -477.49110* 41.14665 .000 -579.8518 -375.1304
Gabriel management supervisory 502.72811* 77.39203 .000 325.5513 679.9049
other 980.21921* 73.99937 .000 818.3902 1142.0483
supervisory management -502.72811* 77.39203 .000 -679.9049 -325.5513
other 477.49110* 41.14665 .000 378.6939 576.2883
other management -980.21921* 73.99937 .000 -1142.0483 -818.3902
supervisory -477.49110* 41.14665 .000 -576.2883 -378.6939
Tamhane management supervisory 502.72811* 111.01925 .006 166.2723 839.1839
other 980.21921* 106.02064 .000 641.6028 1318.8357
supervisory management -502.72811* 111.01925 .006 -839.1839 -166.2723
other 477.49110* 43.53750 .000 369.6764 585.3058
other management -980.21921* 106.02064 .000 -1318.8357 -641.6028
supervisory -477.49110* 43.53750 .000 -585.3058 -369.6764
*. The mean difference is significant at the 0.05 level.
Appendix 4
Appendix 5
CasewiseDiagnosticsa
Case Number Std. Residual jobinc Predicted Value Residual
1 -2.088 1200.00 1543.1281 -343.12815
3 -2.093 700.00 1043.9316 -343.93160
6 3.057 1520.00 1017.6726 502.32745
a. Dependent Variable: jobinc
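The standardized residuals listed above are the raw residuals divided by the standard error of the estimate. The sketch below assumes that the diagnostics come from the full eight-predictor model and therefore uses its standard error of the estimate (164.29653, table 12); that assumption is made here because this value reproduces the printed figures.

```python
# ASSUMPTION: diagnostics are from the full model (Std. Error of the
# Estimate = 164.29653 in Table 12)
std_error_estimate = 164.29653

# case number: (observed jobinc, predicted value), from Appendix 5
cases = {
    1: (1200.00, 1543.1281),
    3: (700.00, 1043.9316),
    6: (1520.00, 1017.6726),
}

# Standardized residual = (observed - predicted) / std. error of the estimate
std_residuals = {
    case: (observed - predicted) / std_error_estimate
    for case, (observed, predicted) in cases.items()
}

for case, z in std_residuals.items():
    print(f"case {case}: standardized residual = {z:.3f}")
```

Case 6 stands out with a standardized residual above 3, meaning its observed income is far higher than the model predicts.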
Appendix 6
Residuals Statisticsa
Minimum Maximum Mean Std. Deviation N
Predicted Value 375.6169 1757.7147 849.3021 320.15613 96
Std. Predicted Value -1.480 2.837 .000 1.000 96
Standard Error of Predicted Value 34.236 92.103 48.888 11.918 96
Adjusted Predicted Value 375.5707 1710.2943 850.4070 322.18529 96
Residual -343.93161 502.32745 .00000 157.22666 96
Std. Residual -2.093 3.057 .000 .957 96
Stud. Residual -2.333 3.371 -.003 1.018 96
Deleted Residual -462.82129 610.75305 -1.10487 178.62678 96
Stud. Deleted Residual -2.395 3.595 -.002 1.034 96
Mahal. Distance 3.136 28.865 7.917 4.813 96
Cook's Distance .000 .277 .016 .042 96
Centered Leverage Value .033 .304 .083 .051 96
a. Dependent Variable: jobinc
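The influence measures in this table can be read against simple benchmarks. The sketch below checks that the mean centered leverage equals k/n for k predictors and n cases, and compares the largest Cook's distance with a cutoff of 1. The cutoff is a common rule of thumb and is an assumption here, not something stated in the report; k = 8 is inferred from the mean leverage of .083 with n = 96.

```python
n = 96  # number of cases in the residuals table
k = 8   # ASSUMPTION: number of predictors, inferred from mean leverage .083

# The mean of the centered leverage values equals k / n
mean_centered_leverage = k / n

# Maxima reported in Appendix 6
max_cooks_distance = 0.277
max_centered_leverage = 0.304

# ASSUMPTION: rule-of-thumb cutoff, not taken from the report
COOKS_CUTOFF = 1.0
cooks_flag = max_cooks_distance > COOKS_CUTOFF

# How far the most extreme case is from the average leverage
leverage_ratio = max_centered_leverage / mean_centered_leverage

print(f"mean centered leverage: {mean_centered_leverage:.3f}")
print(f"any Cook's distance above cutoff: {cooks_flag}")
print(f"max leverage / mean leverage: {leverage_ratio:.3f}")
```

Under these assumptions no case exceeds the Cook's distance cutoff, although the most extreme case has more than three times the average leverage and may deserve a closer look.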