Understanding Regression Analysis
Running head: Understanding Regression Analysis
Name of Student
Student Id
Course Title
Institution Affiliation
Submission Date
Table of contents
Introduction
    Formulation of assumptions (hypothesis)
The data set
Data Analysis
    R-squared
    Alpha value (α)
    Coefficients
    T statistics
Conclusion
Introduction
This paper conducts a regression analysis using one dependent variable and at least two
independent variables. A company or a market with a data set needs to be selected to carry out the
analysis. Regression analysis is a supervised machine learning technique that is used for
prediction. It is widely used by entrepreneurs and governments to forecast quantities such as
economic activity and population. Regression analysis is commonly categorized into seven types:
linear regression, logistic regression, polynomial regression, stepwise regression, ridge
regression, Lasso regression, and Elastic Net regression. This paper focuses on linear regression,
which has two forms: simple linear regression and multiple linear regression. Simple linear
regression applies when there is one independent variable, while multiple linear regression
applies when there is more than one independent variable (Montgomery, Peck & Vining, 2012). The
data analysis technique required in this paper is multiple regression analysis. The general
equation of a multiple linear regression model with two independent variables is:
Y = c + b1 * X1 + b2 * X2 + e
Here Y is the dependent variable, c is the intercept, X1 and X2 are the independent variables,
b1 and b2 are the coefficients of X1 and X2 respectively, and e is the error term.
When conducting the regression, some assumptions (hypotheses) need to be made. The regression
output will be used to test these assumptions. In addition, the output will be used to judge the
effectiveness and the strength of the model.
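The model equation above can be estimated by ordinary least squares. The following is a minimal sketch, assuming NumPy is available; the function name and the sample numbers are made up for illustration and are not taken from the paper's data set.

```python
import numpy as np

def fit_multiple_regression(x1, x2, y):
    """Estimate c, b1, b2 in Y = c + b1*X1 + b2*X2 + e by ordinary least squares."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then the two predictors.
    X = np.column_stack([np.ones_like(x1), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    c, b1, b2 = coef
    return c, b1, b2

# Hypothetical data generated exactly from Y = 2 + 3*X1 - 1*X2 (no error term),
# so the fit should recover those three numbers up to rounding.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [2 + 3 * a - 1 * b for a, b in zip(x1, x2)]

c, b1, b2 = fit_multiple_regression(x1, x2, y)
print(c, b1, b2)
```

With real data the error term e is not zero, so the estimated coefficients only approximate the underlying relationship.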
Formulation of assumptions (hypothesis)
Two hypotheses are formulated for the study: the null hypothesis and the alternate hypothesis
(Tomasello et al. 2012).
H0: There is no relationship between Sales (dependent variable) and the independent variables
(Profit and Discount).
H1: There is a relationship between Sales (dependent variable) and the independent variables
(Profit and Discount).
Using multiple regression analysis, we will predict the sales of a company using Profit and
Discount as the independent variables. The main aim of a company is to make sales, and this is
essential for its growth. Many companies analyse their data to make sure they remain relevant to
the market. They spend a lot of money on the production and delivery of their goods, and for
this reason they are concerned with how to reduce their costs while increasing their sales. In
this paper, the relationship between Sales, Profit and Discount will be determined.
The data set
The data set used for this research is company data that records the company’s Sales of four
different products, the profit on each of the four products, the quantity of each product sold,
the discount offered on the products sold, and the shipping cost of the four products. The data
contains 30 observations and seven columns (Order Id, Products, Sales, Quantity, Discount,
Profit and Shipping cost). The company from which the data was extracted wanted to forecast
the Sales of its goods using independent variables such as the profit, the discount, and the
shipping cost. The data set has been obtained from the GitHub website
(https://github.com/gowthamharsha/sasproject1).
Data Analysis
The dependent variable used for this multiple regression analysis is Sales, while the
independent variables are Profit and Discount. Before the regression is run, a correlation
analysis is conducted to determine whether the variables have any association. Correlation
analysis is one of the techniques used to measure the strength and direction of the association
between variables. Below is the result of the correlation analysis:
Table 1: Correlation Analysis
            Sales       Discount    Profit
Sales       1
Discount    -0.09791    1
Profit      0.895218    -0.13694    1
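A correlation matrix of this kind holds the Pearson correlation coefficient for each pair of columns. The following is a minimal sketch in plain Python; the function names and the three short columns are hypothetical and are not the 30 observations used in the paper.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict that maps column name to values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical values, for illustration only.
data = {
    "Sales": [120.0, 340.0, 560.0, 210.0, 430.0],
    "Discount": [0.20, 0.00, 0.10, 0.30, 0.10],
    "Profit": [30.0, 110.0, 190.0, 40.0, 150.0],
}
for name, row in correlation_matrix(data).items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Each coefficient lies between -1 and 1: values near 1 or -1 indicate a strong linear association and values near 0 a weak one.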
Table 1 shows that the correlation between Sales and Profit is 0.895218, which indicates a
strong positive relationship between these two variables. The correlation between Sales and
Discount is -0.09791, which indicates a weak negative relationship. These results show that
Profit is strongly associated with the dependent variable, while the association between Discount
and Sales is weak. The next step is to conduct a multiple regression analysis. Below is the
multiple regression analysis.
Table 2: Regression Analysis
Using these outputs, we will interpret the R-squared, the alpha level, the coefficients, and the
t statistics.
R-squared
R-squared measures how much of the variation in the dependent variable is explained by the
independent variables. It also indicates how well the model fits the data. When its value
approaches zero, the independent variables explain little of the variation in the dependent
variable, and when its value approaches one, the independent variables explain most of the
variation in the dependent variable. The value of R-squared obtained from the analysis is 0.8020,
which means that 80.2 % of the variation in the sales of the four products is explained by
discount and profit. Therefore, it can be concluded that the model is strong.
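R-squared can be computed from the observed values and the model's predictions as 1 minus the ratio of the residual sum of squares to the total sum of squares. The following is a minimal sketch in plain Python; the function name and the numbers are illustrative assumptions, not the paper's data.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that is explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: variation the model leaves unexplained.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and fitted values.
observed = [10.0, 20.0, 30.0, 40.0]
fitted = [12.0, 18.0, 33.0, 37.0]
print(r_squared(observed, fitted))
```

A perfect fit gives 1, and a model that always predicts the mean of the observations gives 0.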
The alpha level (α)
The alpha level is the threshold used to judge the significance of the independent variables. It
is also used to test the hypotheses. The p-values, compared with the alpha level, test the
significance of
the independent variables, while Significance F and the alpha level test the hypotheses
(Polanczyk et al. 2014). The default alpha level used for the multiple regression is α = 0.05.
From the regression table, one can conclude that Discount is not significant because its p-value
is greater than the alpha level, while Profit is significant because its p-value is less than the
alpha level. To test the hypotheses, we use the Significance F value: if it is less than the
alpha level, the null hypothesis is rejected in favour of the alternate hypothesis; otherwise the
null hypothesis is retained. In this case, its value is less than the alpha level, and therefore
the alternate hypothesis is accepted (Javanmard & Montanari, 2014). Hence, it can be concluded
that the independent variables, Profit and Discount, taken together have a relationship with the
Sales obtained from the goods.
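The overall F statistic behind Significance F can be recovered from R-squared, the number of observations and the number of independent variables. The following is a rough sketch, assuming that all 30 observations entered the regression and that the model has two independent variables; the function name is made up for illustration.

```python
def f_statistic(r_squared, n_obs, n_predictors):
    """Overall F statistic of a regression, computed from its R-squared."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)

# R-squared = 0.8020 is reported in the paper; n = 30 and k = 2 are assumptions
# based on the description of the data set.
f_value = f_statistic(0.8020, 30, 2)
print(round(f_value, 2))
```

Under these assumptions the statistic is about 54.7, far above the 5 % critical value of roughly 3.35 for 2 and 27 degrees of freedom in standard F tables, which agrees with the decision to reject the null hypothesis.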
Coefficients
The regression coefficients obtained from the output can be summarized as:
Sales = 58.54 + 102.74 * (Discount) + 1.27 * (Profit)
The intercept means that the predicted sales are $58.54 when both the discount and the profit
are zero. Holding profit constant, every 1 unit increase in discount is associated with an
increase in Sales of $102.74, and holding discount constant, every 1 unit increase in profit is
associated with an increase in Sales of $1.27. Both coefficients are positive, so in the fitted
model an increase in either the discount or the profit goes together with higher sales of the
goods (Nathans, Oswald & Nimon, 2012).
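The fitted equation can be applied directly to obtain a predicted value of Sales. The following is a minimal sketch; the function name and the input values are hypothetical, and it assumes that Discount is recorded as a fraction (0.10 for 10 %), which the paper does not state.

```python
def predict_sales(discount, profit):
    """Predicted Sales from the fitted equation reported in the paper."""
    intercept = 58.54
    b_discount = 102.74
    b_profit = 1.27
    return intercept + b_discount * discount + b_profit * profit

# Hypothetical order: 10 % discount and a profit of $100.
# 58.54 + 102.74 * 0.10 + 1.27 * 100 = 58.54 + 10.274 + 127 = 195.814
print(round(predict_sales(0.10, 100.0), 3))
```

Predictions of this kind are most trustworthy for discount and profit values within the range covered by the original 30 observations.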
T statistics
The t statistic is obtained by dividing a coefficient by its standard error. As a rule of thumb,
if the t statistic is greater than 2 or less than -2, the coefficient of the independent variable
is significant at approximately the 5 % level, that is, with about 95 % confidence. In our case
Profit has a t statistic of 10.39 and is therefore significant (Polit & Lake, 2010). Discount is
not significant because the absolute value of its t statistic is less than 2.
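The division of each coefficient by its standard error can be sketched as follows, assuming NumPy is available. The function name and the four data points are made up for illustration; the standard errors come from the usual ordinary least squares formula, the residual variance times the diagonal of the inverse of X'X.

```python
import numpy as np

def ols_t_statistics(X, y):
    """Return coefficients, standard errors and t statistics (intercept first)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    dof = n - design.shape[1]                  # observations minus parameters
    sigma2 = residuals @ residuals / dof       # estimated error variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    std_err = np.sqrt(np.diag(cov))
    return coef, std_err, coef / std_err

# Hypothetical data with a single predictor.
coef, std_err, t_stats = ols_t_statistics([[0.0], [1.0], [2.0], [3.0]], [0.0, 1.0, 1.0, 2.0])
for name, t in zip(["intercept", "x"], t_stats):
    verdict = "significant" if abs(t) > 2 else "not significant"
    print(name, round(float(t), 2), verdict)
```

The cut-off of 2 is only the rule of thumb described above; the exact critical value depends on the residual degrees of freedom.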
Conclusion
The regression output can be used to inform decisions in the company. According to the model,
for the company to make more sales it needs to make more profit, since Profit is the only
significant predictor of Sales. Even though the discount has a positive coefficient in the
regression analysis, it has a negative association with sales in the correlation analysis and is
not statistically significant. Therefore, the best way for the company to make more sales is to
look for better ways to increase its profits, which the model associates with higher sales.
References
Javanmard, A., & Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. The Journal of Machine Learning Research, 15(1), 2869-2909.
Montgomery, D. C., Peck, E. A., & Vining, G. G. (2012). Introduction to linear regression analysis (Vol. 821). John Wiley & Sons.
Nathans, L. L., Oswald, F. L., & Nimon, K. (2012). Interpreting multiple linear regression: A guidebook of variable importance. Practical Assessment, Research & Evaluation, 17(9).
Polanczyk, G. V., Willcutt, E. G., Salum, G. A., Kieling, C., & Rohde, L. A. (2014). ADHD prevalence estimates across three decades: an updated systematic review and meta-regression analysis. International Journal of Epidemiology, 43(2), 434-442.
Polit, D. F., & Lake, E. (2010). Statistics and data analysis for nursing research (Vol. 1). Upper Saddle River, NJ: Pearson.
Tomasello, M., Melis, A. P., Tennie, C., Wyman, E., Herrmann, E., Gilby, I. C., ... & Melis, A. (2012). Two key steps in the evolution of human cooperation: The interdependence hypothesis. Current Anthropology, 53(6), 000-000.