Business Statistical Modeling
Added on 2023/04/04
AI Summary
This assignment focuses on business analytics, specifically business intelligence and statistical analysis. It explores datasets related to fuel providers in Australia and analyzes variables to draw conclusions. The assignment discusses the descriptive statistics of the price variable, the distribution of prices, and hypothesis testing. It also examines the differences in fuel prices offered by different service stations. The assignment concludes with a discussion on the implications for competition and the need for standardizing prices and improving service provisions.
Business 1
Business Statistical Modeling
Name of Author
Name of Class
Name of Professor
Name of School
State and City of School
Date
Introduction
The aim of this assignment is to give participants skills relevant to data collection and to the
analysis of the resulting datasets. Data is used across several disciplines. Learning institutions need
it for research, and the social sciences have dedicated software, SPSS, for analyzing their data
(Ho, 2017). Institutions outside education need data as well. Governments carry out a census from time
to time to learn the size of the population and how much it has grown. This information is then used to
plan resource allocation for the years before the next census is conducted.
Business institutions likewise depend on data. They collect and analyze datasets from different sources
to make informed business decisions, for example by learning which products are purchased more than
others. The use of datasets by businesses to make informed decisions is called business analytics
(Mendoza, Gallego-Schmid & Azapagic, 2019).
Business analytics, abbreviated as BA, is the iterative and methodical study of organizational
data, usually with an emphasis on statistical analysis. Its main purpose is to support data-driven
decisions: decisions based on numbers drawn from data are better informed. This in turn allows greater
optimization and automation of business processes. Most corporations treat data as an asset and use it
chiefly for competitive advantage (Laudon & Traver, 2016).
This assignment is a typical business analytics assignment: it covers the two types of business
analytics, Business Intelligence and deeper statistical analysis, in drawing conclusions from data
collected on different fuel providers in Australia (Laursen & Thorlund, 2016). It therefore requires
applying the learned theory: computing numerical summaries, producing graphical visualizations and
constructing statistical hypotheses, which are then tested to help interpret the results.
Dataset 1 is a secondary dataset collected from the Australian open data website, which holds
fuel check information for different years; 2016 was chosen as the year for dataset 1. There are six
variables of two types, string and numerical. Four variables are strings: Address, Suburb, Brand and
FuelCode. The remaining two, Postcode and Price, are numerical. The file has 1001 rows, the first of
which holds the variable names. Each case has six features, one per variable, which together describe
a fuel station and the price it offers (Higashinaka, Funakoshi, Kobayashi & Inaba, 2016).
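As a rough illustration of the layout described above, the following sketch reads a file with a header row and the six named variables, converting Postcode and Price to numbers. The rows, addresses and FuelCode values are hypothetical placeholders, not records from dataset 1, and the use of CSV is an assumption about the file format.

```python
import csv
import io

# Hypothetical rows in the six-column layout described above; NOT real data.
RAW = """Address,Suburb,Brand,FuelCode,Postcode,Price
1 Example St,Sampletown,Caltex,U91,2000,121.9
2 Example Rd,Sampleville,7-Eleven,E10,2150,119.9
"""

def load_cases(text):
    """Parse CSV text with a header row; the header is not counted as a case."""
    cases = []
    for row in csv.DictReader(io.StringIO(text)):
        # Postcode and Price are the two numerical variables.
        row["Postcode"] = int(row["Postcode"])
        row["Price"] = float(row["Price"])
        cases.append(row)
    return cases

cases = load_cases(RAW)
print(len(cases), "cases; columns:", list(cases[0]))
```

Reading the header separately from the data is what distinguishes the number of rows in the file from the number of observations available for analysis.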
Dataset 2 is a secondary dataset that I collected from Australian public data; the source
website is https://data.gov.au/dataset/ds-nsw-a97a46fc-2bdd-4b90-ac7f-0cb1e8d7ac3b/details.
After editing, it has eight variables, three numeric and five string. There are thirty cases in total,
each with eight characteristics, one per variable. The dataset has clear limitations: as a small sample
of only thirty cases it covers just a few service stations and is therefore more prone to bias than
dataset 1.
Analysis of single variable in Dataset 1
This section covers the analysis of the price variable of dataset 1. To understand a dataset
numerically, descriptive statistics must be run on the dataset or on the variable to be analyzed. The
purpose of descriptive statistics is to determine the numerical summaries that support the quantitative
and qualitative analysis of the dataset (Larson-Hall, 2015). It is also necessary to know how many data
points there are and how often each value occurs in the dataset. The descriptive statistics are shown
in figure 1;
Figure 1
The mean and the median are close to each other, showing that the data points deviate only slightly
from the centre. This is supported by the small standard deviation, which indicates that the data
points are not widely spread around the centre. The sample variance stands at 13.48, which corresponds
to a standard deviation of about 3.67. The price that most petrol service stations charge is the modal
price, 129.9 (Otto, 2016).
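The summary measures discussed above can be reproduced with a short script. This is a minimal sketch using Python's standard `statistics` module; the `describe` helper and the sample prices are hypothetical and are not taken from dataset 1.

```python
import statistics

def describe(prices):
    """Return the summary measures discussed above for a list of prices."""
    return {
        "count": len(prices),
        "mean": statistics.mean(prices),
        "median": statistics.median(prices),
        "mode": statistics.mode(prices),
        # variance() and stdev() use the sample (n - 1) formulas.
        "sample_variance": statistics.variance(prices),
        "sample_std_dev": statistics.stdev(prices),
    }

# Hypothetical prices, NOT taken from dataset 1.
sample_prices = [119.9, 121.9, 121.9, 123.9, 125.9]
summary = describe(sample_prices)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The standard deviation is the square root of the variance, so the two measures always describe the same spread on different scales.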
The distribution of the price variable is shown further by the histogram in figure 2
below;
Figure 2
For the histogram, the prices were grouped into bin ranges using pivot tables. The results show that
the highest frequency of prices falls in the bin range 117.9-127.9. The frequencies in the bins to the
right and to the left of this range are lower and decrease gradually with distance from it. The
distribution is skewed to the right (Chaamwe & Shumba, 2016).
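The binning step behind the histogram can be sketched as follows. The report used Excel pivot tables; this Python version is only an illustration of the same idea, and the bin start of 107.9, the width of 10 and the sample prices are assumptions chosen so that one bin matches the 117.9-127.9 range mentioned above.

```python
from collections import Counter

def bin_prices(prices, start, width):
    """Count prices per bin; each bin covers [lower, lower + width)."""
    counts = Counter()
    for price in prices:
        index = int((price - start) // width)
        lower = round(start + index * width, 1)
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Hypothetical prices, NOT taken from dataset 1.
sample_prices = [109.9, 119.9, 121.9, 123.9, 125.9, 129.9, 131.9]
counts = bin_prices(sample_prices, start=107.9, width=10)
modal_bin = max(counts, key=counts.get)

for lower, count in counts.items():
    print(f"{lower:.1f}-{lower + 10:.1f}: {'#' * count}")
print("bin with the highest frequency starts at", modal_bin)
```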
From a quick inspection, the average fuel price is well above 115 and stands at about 122. Setting up
hypotheses, however, allows the research question to be answered properly. The hypotheses are;
Null hypothesis: Average price of petrol is equal to 115
Alternative hypothesis: Average price of petrol is more than 115
Conducting a z-test to test the hypotheses gives the values in table 3 below. From the table, the
two-tailed critical value of z is 1.96, so -1.96 and +1.96 form the lower and upper bounds on the
normal distribution curve that separate the rejection region from the non-rejection region. The
p-value is reported as 0, which is below the 0.05 significance level, so the result falls into the
rejection region. We therefore reject the null hypothesis and conclude that the mean price is more
than 115 (Kim, 2017).
Figure 3
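The z-test on the mean price can be sketched with Python's standard library. The function name `one_sample_z_test`, the 0.05 significance level and the sample prices are assumptions for illustration; the figures in the report came from Excel and are not reproduced here.

```python
import math
from statistics import NormalDist, mean, stdev

def one_sample_z_test(sample, mu0, alpha=0.05):
    """Return (z, two-tailed critical z, two-tailed p-value) for H0: mean == mu0."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    z = (mean(sample) - mu0) / standard_error
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
    return z, z_critical, p_two_tailed

# Hypothetical prices (30 values), NOT taken from dataset 1.
sample_prices = [119.9, 121.9, 121.9, 123.9, 125.9] * 6
z, z_critical, p_value = one_sample_z_test(sample_prices, mu0=115)

print(f"z = {z:.2f}, critical z = {z_critical:.2f}, p = {p_value:.4f}")
if abs(z) > z_critical:
    print("Reject the null hypothesis that the mean price equals 115.")
else:
    print("Fail to reject the null hypothesis.")
```

A p-value below the significance level and a test statistic beyond the critical value are two views of the same decision rule, so they always agree.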
Analysis of two variables in Dataset 1
This section starts with the descriptive statistics of the fuel prices of four brands. The four
brands, which NRMA uses in its reports, are Caltex, Caltex Woolworths, Coles Express and 7-Eleven.
The descriptive statistics are given in the table below:
Figure 4
Figure 4 shows how the mean prices of the brands differ from one another, together with their values;
the variances and modes are in the table as well. The focus here is on what the mean, median, variance
and standard deviation illustrate. A large difference between a brand's mean and median indicates that
some of its prices lie far from the centre of the data and pull the mean away from the median, which
typically also shows up as a larger standard deviation. The variance, on the other hand, measures how
far the data points spread out from the mean. A larger variance indicates larger differences among a
brand's prices (Khany & Tazik, 2019).
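The per-brand summary described above amounts to grouping prices by brand and computing the same measures for each group. A minimal sketch follows; the helper name and the (brand, price) records are hypothetical and only two of the four brands are shown.

```python
import statistics
from collections import defaultdict

def summarize_by_brand(records):
    """Group (brand, price) pairs and compute summary measures per brand."""
    prices_by_brand = defaultdict(list)
    for brand, price in records:
        prices_by_brand[brand].append(price)
    return {
        brand: {
            "mean": statistics.mean(prices),
            "median": statistics.median(prices),
            "variance": statistics.variance(prices),  # sample variance
            "std_dev": statistics.stdev(prices),       # sample standard deviation
        }
        for brand, prices in prices_by_brand.items()
    }

# Hypothetical records, NOT taken from dataset 1.
records = [
    ("Caltex", 120.0), ("Caltex", 122.0), ("Caltex", 124.0),
    ("7-Eleven", 118.0), ("7-Eleven", 119.0), ("7-Eleven", 120.0),
]
brand_summary = summarize_by_brand(records)
for brand, stats in brand_summary.items():
    print(brand, {name: round(value, 2) for name, value in stats.items()})
```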
The next step is to test whether there is a significant difference in the mean prices offered by the
four service station brands. The hypotheses are:
Null hypothesis: There is no difference between the fuel prices offered by different service
stations.
Alternative hypothesis: There is a difference between the fuel prices offered by different service
stations.
To test whether the means differ, we conduct a paired-sample t-test in Excel with a hypothesized mean
difference of 0. One of the results of the test is;
Table 4
From table 4, the p-value for the two-tailed test is greater than 0.05, the significance level. We
therefore fail to reject the null hypothesis, which means that the test does not show a statistically
significant difference between the prices offered by the service stations (Halsey, Curran-Everett,
Vowler & Drummond, 2015).
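The paired-sample t statistic with a hypothesized mean difference of 0 can be computed by hand as sketched below. The function name and the two price lists are hypothetical, and the critical value 2.776 is the standard two-tailed table value for 4 degrees of freedom at the 0.05 level, used here instead of a computed p-value because the standard library has no t distribution.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(prices_a, prices_b, hypothesized_diff=0.0):
    """Return (t, degrees of freedom) for a paired-sample t-test."""
    if len(prices_a) != len(prices_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(prices_a, prices_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    t = (mean(diffs) - hypothesized_diff) / standard_error
    return t, n - 1

# Hypothetical paired prices for two brands, NOT taken from dataset 1.
brand_a = [120.0, 122.0, 121.0, 123.0, 124.0]
brand_b = [119.0, 122.0, 122.0, 121.0, 124.0]
t_stat, df = paired_t_statistic(brand_a, brand_b)

T_CRITICAL = 2.776  # two-tailed, alpha = 0.05, df = 4 (from a t table)
print(f"t = {t_stat:.3f}, df = {df}")
if abs(t_stat) > T_CRITICAL:
    print("Reject the null hypothesis of no difference.")
else:
    print("Fail to reject the null hypothesis of no difference.")
```

A paired test compares two brands at a time, so covering all four brands this way means repeating it for each pair of brands.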
The descriptive statistics in this section show that different service providers offer different
prices for fuel, some higher than others. There is no single constant price, which suggests that
providers share no common standard for setting prices. Of the four brands of interest, Caltex
Woolworths offers the lowest price.
Collection and analysis of Dataset 2
The numerical values of the collected dataset are shown in the table below;
Table 5
The histogram is shown below;
Table 6
Most students in the sample prefer to purchase their fuel from Coles Express.
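A preference statement like the one above rests on a simple frequency count of the brand each respondent named. The sketch below shows that count; the helper name and the responses are hypothetical and are not the thirty cases of dataset 2.

```python
from collections import Counter

def brand_frequencies(responses):
    """Return (brand, count) pairs, most frequent first."""
    return Counter(responses).most_common()

# Hypothetical survey answers, NOT the thirty cases of dataset 2.
responses = [
    "Coles Express", "Caltex", "Coles Express",
    "7-Eleven", "Coles Express", "Caltex",
]
frequencies = brand_frequencies(responses)
for brand, count in frequencies:
    print(f"{brand}: {count}")
```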
Discussion and conclusion
It is evident that, of the four fuel service providers, Caltex is the most popular brand, with more
customers getting their fuel from it. Although Caltex Woolworths is the cheapest of the four brands,
it still falls in third place. To avoid this unhealthy competitive relationship, NRMA should examine
which qualities of service Caltex offers that Caltex Woolworths does not, and advise the other brands
to offer them so as to maintain healthy competition in the market. Fuel prices should also be
standardized to avoid overcharging at specific providers' stations.
Further research should therefore be conducted to help address the challenges of fuel prices and of
the actual service offered at different service stations. This would help standardize prices and
improve service provision so that customers are comfortable going to any service provider
(Gandal & Halaburda, 2016).
References
Chaamwe, N., & Shumba, L. (2016). ICT integrated learning: using spreadsheets as tools for e-learning, a
case of statistics in Microsoft excel. International Journal of Information and Education Technology, 6(6),
435-440.
Gandal, N., & Halaburda, H. (2016). Can we predict the winner in a market with network effects?
Competition in cryptocurrency market. Games, 7(3), 16.
Halsey, L. G., Curran-Everett, D., Vowler, S. L., & Drummond, G. B. (2015). The fickle P value generates
irreproducible results. Nature methods, 12(3), 179.
Higashinaka, R., Funakoshi, K., Kobayashi, Y., & Inaba, M. (2016). The dialogue breakdown detection
challenge: Task description, datasets, and evaluation metrics. In LREC.
Ho, R. (2017). Understanding statistics for the social sciences with IBM SPSS. Chapman and Hall/CRC.
Khany, R., & Tazik, K. (2019). Levels of Statistical Use in Applied Linguistics Research Articles: From 1986
to 2015. Journal of Quantitative Linguistics, 26(1), 48-65.
Kim, H. Y. (2017). Statistical notes for clinical researchers: chi-squared test and Fisher's exact test.
Restorative dentistry & endodontics, 42(2), 152-155.
Laudon, K. C., & Traver, C. G. (2016). E-commerce: business, technology, society.
Laursen, G. H., & Thorlund, J. (2016). Business analytics for managers: Taking business intelligence
beyond reporting. John Wiley & Sons.
Larson-Hall, J. (2015). A guide to doing statistics in second language research using SPSS and R.
Routledge.
Mendoza, J. M. F., Gallego-Schmid, A., & Azapagic, A. (2019). Building a business case for
implementation of a circular economy in higher education institutions. Journal of Cleaner Production,
220, 553-567.
Otto, M. (2016). Chemometrics: statistics and computer application in analytical chemistry. John Wiley &
Sons.