
Regression and Correlation Analysis

   

Added on  2022-12-01

Name
Course Number
Date
Faculty Name
1. Scatterplot of the dependent variable Y and the independent variable X1
Figure 1: A scatterplot of starts and applications
The scatterplot in Figure 1 shows the relationship between the number of potential students who
applied at DeVry and the number of students who started classes. The red line is the line of best
fit, estimated by least squares. The plot shows a positive correlation between the number of
applications and the number of students who started classes.
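As an illustration of how a least-squares line such as the red line in Figure 1 can be estimated, the following minimal Python sketch applies the closed-form formulas for simple linear regression. The function name `fit_line` and the four sample points are hypothetical and chosen only so the result is easy to check by hand; they are not the DeVry data.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sxx: sum of squared deviations of x; Sxy: sum of cross-deviations of x and y
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical points lying exactly on y = 2x + 1 (not the DeVry data)
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

Because the sample points lie exactly on a line, the fitted slope and intercept recover that line; with real data the points scatter around the fitted line instead.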
2. The equation of the best-fit line explaining the relationship between the dependent variable
and the selected independent variable
Y = 0.4035 X1 + 12.2432
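This equation indicates that the dependent variable (starts) and the independent variable
(number of applications) have a positive relationship, as depicted in the scatterplot in Figure 1:
the slope of 0.4035 means that each additional application is associated with about 0.40
additional starts.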
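To make the meaning of the slope concrete, the fitted equation can be written as a small Python function. This is a minimal sketch: the name `predict_starts` and the application counts 50 and 60 are hypothetical illustrations, while the two coefficients are the ones reported in the equation above.

```python
SLOPE = 0.4035       # estimated change in starts per additional application
INTERCEPT = 12.2432  # estimated starts when the number of applications is 0

def predict_starts(applications):
    """Predicted number of starts for a given number of applications."""
    return SLOPE * applications + INTERCEPT

# Two hypothetical application counts, 10 apart: the predictions differ
# by 10 * SLOPE, regardless of where on the line they are taken.
extra = predict_starts(60) - predict_starts(50)
print(round(extra, 3))
```

The difference between the two predictions depends only on the slope, which is why the slope is read as the expected change in starts per additional application.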
3. Coefficient of correlation
The coefficient of correlation between the number of students who started and the number of
applications is approximately 0.87. This shows a very strong positive correlation between the two
variables: as the number of applications increases, the number of students who start classes
tends to increase as well.
4. Coefficient of determination
The coefficient of determination is obtained from the regression output as the ratio of the
regression sum of squares to the total sum of squares. For the linear regression of starts on
applications it is approximately 0.76 (1307.747 / 1722.44 in Table 1). This indicates that about
76% of the variation observed in the number of students who start at DeVry is explained by the
number of applications (Frost, 2013; Krzywinski and Altman, 2015).
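Both the coefficient of determination and the coefficient of correlation can be recomputed from the sums of squares reported in Table 1. The sketch below assumes a simple (one-predictor) regression, where the correlation is the square root of the coefficient of determination with the sign of the slope; the variable names are illustrative.

```python
import math

SS_REGRESSION = 1307.747  # regression sum of squares from Table 1
SS_TOTAL = 1722.44        # total sum of squares from Table 1
SLOPE = 0.4035            # fitted slope, used only for its sign

# Share of the total variation explained by the regression
r_squared = SS_REGRESSION / SS_TOTAL

# In simple linear regression, r is the square root of R^2
# carrying the sign of the slope
r = math.copysign(math.sqrt(r_squared), SLOPE)

print(round(r_squared, 3), round(r, 3))
```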
5. Utility of the regression model
In the ANOVA test, the p-value associated with the F-statistic is very small (F = 309.046,
p < 0.001), which means that the model fits the data significantly better than a null
(intercept-only) model. The model is therefore statistically significant. Further, the
coefficient of the independent variable is significantly different from zero (p-value < 0.001).
Table 1: Model output

ANOVA
             Degrees of freedom   SS         MS         F         Significance of F
Regression   1                    1307.747   1307.747   309.046   < 0.001
Residual     98                   414.693    4.232
Total        99                   1722.44

                    Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept           12.243         1.837            6.664    0.000     8.597       15.889
Applications (X1)   0.404          0.023            17.580   0.000     0.358       0.449
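The test statistics in Table 1 can be cross-checked from the other entries in the table. The sketch below recomputes F as the ratio of the mean squares and t as the slope divided by its standard error. Because the table entries are rounded to three decimals, the recomputed values are expected to agree with the reported ones only approximately; the variable names are illustrative.

```python
MS_REGRESSION = 1307.747  # mean square for regression (Table 1)
MS_RESIDUAL = 4.232       # mean square for residuals (Table 1)
SLOPE = 0.404             # coefficient of Applications (X1), as rounded in Table 1
SLOPE_SE = 0.023          # its standard error, as rounded in Table 1

# F-statistic: explained variance per degree of freedom over residual variance
f_stat = MS_REGRESSION / MS_RESIDUAL

# t-statistic for the slope: estimate divided by its standard error
t_stat = SLOPE / SLOPE_SE

# With a single predictor, t squared should be close to F
print(round(f_stat, 2), round(t_stat, 2), round(t_stat ** 2, 2))
```

With one independent variable, the F-test for the model and the t-test for the slope test the same hypothesis, so the squared t-statistic and the F-statistic should agree up to rounding.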
6. The ability of the independent variable to predict the dependent variable
The ability of an independent variable to predict a dependent variable is determined by the
relationship between them. The relationship can be linear or non-linear, and insight into its
form is obtained from visual plots of the variables. In this analysis, a scatterplot of the
number of starts against the number of applications showed a linear relationship. Further, the
correlation coefficient showed a very strong correlation between the two variables. The linear
regression model was found to be statistically significant based on the ANOVA test. In addition,
the coefficient of the independent variable in the model was found to be statistically
significant. It can therefore be concluded that the number of applications can be used to
predict the number of starts for admissions at DeVry (Frost, 2013).
7. The confidence interval for β1 (the population slope) – using 95% confidence interval
The 95% confidence interval for the β1 coefficient runs from 0.358 (lower limit) to 0.449 (upper
limit). This can be written as 0.358 ≤ β1 ≤ 0.449, with a point estimate of 0.404. The interval
indicates that the population slope coefficient is significantly different from zero, because
the value 0 is not included in the interval. Further, if samples of sufficient size were drawn
repeatedly from the same population, about 95% of the intervals constructed in this way would
contain the population slope (Cumming and Fidler, 2009).
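The limits of this interval can be reproduced from the slope estimate and its standard error. The sketch below is a minimal check rather than the original computation: the critical value 1.9845 is an assumption taken from standard t tables (two-sided 95%, 98 residual degrees of freedom), and the variable names are illustrative.

```python
SLOPE = 0.4035      # slope estimate from the fitted equation
SLOPE_SE = 0.023    # standard error of the slope from Table 1 (rounded)
T_CRIT_95 = 1.9845  # assumed t critical value: two-sided 95%, 98 df (from t tables)

# Interval: estimate plus or minus critical value times standard error
margin = T_CRIT_95 * SLOPE_SE
lower, upper = SLOPE - margin, SLOPE + margin

print(round(lower, 3), round(upper, 3))

# The slope is significant at the 5% level if the interval excludes zero
excludes_zero = lower > 0 or upper < 0
```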
8. 99% confidence interval for the value 30 of the independent variable (number of
applications)
Y = 0.4035(30) + 12.2432