Data Analytics: A Business Case Study

Executive Summary

eCommerce has recently captured the attention of the whole world, and online shopping is one of its main parts. As eCommerce has grown rapidly, it has brought new challenges for service providers, chiefly business competition and customer satisfaction. Service providers use different tools, techniques and strategies to attract customers, because business depends on the attraction, quality and service that the provider offers.

We have data on 1,180 clothing products (jackets, jeans and suits). The attributes considered are Product Name, Product Price (in $), Sale Price (in $), Profit (in $), Number of customers who bought the product, Shipping Type (Free or Paid), Customer Type (New or Existing), Region (NSW, QLD, SA, TAS, VIC, WA), Product Material (Wool or Cotton) and Product Colour (Black, Blue, Pink, Red or White).

The company earns a profit of about 7.95% of sales overall, and the profit percentage differs little across the levels of each attribute. Among the regions, WA gives the highest profit percentage (8.23%) and QLD the lowest (7.75%). On average there are 11.81 customers per product, with a standard deviation of 3.82.

Only shipping type and material are significantly associated at the 5% level of significance, and customer type and material are associated at the 10% level; no other pair of attributes is associated. The mean number of customers differs significantly by customer type, and is higher for existing customers (12.08) than for new customers (11.58). The mean number of customers for products shipped free of charge is significantly higher than for products with paid shipping. Wool products are preferred to cotton products, as the mean number of customers for wool products is significantly higher than for cotton products. The mean number of customers differs significantly between regions but not between colours, and QLD has the highest mean number of customers of all regions.

From the correlation analysis, product price and profit are positively correlated, while the number of customers is weakly negatively correlated with both profit and product price. Regression analysis suggests a significant relation between total profit and number of customers. The R² of 0.74 suggests a good fit, and the slope suggests that each additional customer adds about $2.36 on average to the company's total profit. We also give recommendations based on the analysis and a plan for implementing them.
Table of Contents

Sr. No.   Topic
1         List of Abbreviations and assumptions made
2         Introduction – What is the problem?
3         Research Methodology
4         Analytical Findings
5         Recommendations to the company
6         An implementation plan based on the recommendations you have provided
7         Conclusion
8         List of References
9         Appendix
List of Abbreviations and assumptions made

Max: Maximum
Min: Minimum
NSW: New South Wales
QLD: Queensland
SA: South Australia
TAS: Tasmania
VIC: Victoria
WA: Western Australia
Introduction – What is the problem?

eCommerce has recently captured the attention of the whole world, and online shopping is one of its main parts. As eCommerce has grown rapidly, it has brought new challenges for service providers, chiefly business competition and customer satisfaction. Service providers use different tools, techniques and strategies to attract customers, because business depends on the attraction, quality and service that the provider offers.

About Data:

We have data on 1,180 clothing products (jackets, jeans and suits). We considered the following attributes / variables:
i) Product Name
ii) Product Price (in $)
iii) Sale Price (in $)
iv) Profit (in $)
v) Number of customers who bought the product
vi) Shipping Type (Free or Paid)
vii) Customer Type (New or Existing)
viii) Region (NSW, QLD, SA, TAS, VIC, WA)
ix) Product Material (Wool or Cotton)
x) Product Colour (Black, Blue, Pink, Red or White)
From the above variables we define the following variables for our analysis:

Total monthly sales amount (in $) = Sale Price (in $) × Number of customers
Total monthly profit (in $) = Profit (in $) × Number of customers

Project Problem:

We are interested in the following:
i) Profit analysis by shipping type, customer type, region, material and colour.
ii) Whether there is any association between shipping type, customer type, region, material and colour.
iii) Whether the number of customers differs significantly by shipping type, customer type, region, material and colour.
iv) Correlation analysis of the variables.
v) Regression analysis for total monthly profit.

Research Methodology

Data analysis is incomplete without statistical tools and techniques, and selecting the proper ones is an important aspect of the analysis. We carried out the profit analysis for shipping type, customer type, region, material and colour by summarising the total sales amount and total profit. We tested the association between the attributes shipping type, customer type, region, material and colour with the chi-squared test for association. We used the two-sample t-test and one-way ANOVA to compare the mean number of customers across shipping type, customer type, region, material and colour. We carried out the correlation analysis for the variables product price, profit and number of customers. We used regression analysis to model total monthly profit from the number of customers. We ran the Python code given in the appendix and report the formatted output.
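The two derived variables defined above can be computed directly in pandas. The following is a minimal sketch: the column names (`sale_price`, `profit`, `customers`) and the three rows are hypothetical placeholders, not the actual contents of the data file.

```python
import pandas as pd

# Hypothetical rows and column names, for illustration only
df = pd.DataFrame({
    "sale_price": [50.0, 80.0, 120.0],   # Sale Price (in $)
    "profit": [4.0, 6.5, 9.0],           # Profit per sale (in $)
    "customers": [10, 12, 8],            # Number of customers
})

# Total monthly sales amount = Sale Price x Number of customers
df["total_monthly_sales"] = df["sale_price"] * df["customers"]

# Total monthly profit = Profit x Number of customers
df["total_monthly_profit"] = df["profit"] * df["customers"]

print(df)
```

Keeping the derived values as columns of the same data frame lets the later group summaries and the regression reuse them without recomputation.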
Analytical Findings

Profit Analysis:

Table 1 presents the profit analysis for shipping type, customer type, region, material and colour. For each level of these attributes we report the total sales amount, the total profit and the profit percentage (total profit as a percentage of total sales).

Table 1: Profit analysis according to different attributes

Attribute       Level      Total Sales ($)   Total Profit ($)   Profit %
Shipping Type   Free       143591.9          11427.7            7.96%
                Paid       272070.3          21623.5            7.95%
Customer Type   Existing   194472.8          15667.8            8.06%
                New        221189.4          17383.5            7.86%
Region          NSW        106339.2          8582.3             8.07%
                QLD        45850.0           3552.9             7.75%
                SA         62913.1           4927.5             7.83%
                TAS        79070.6           6136.9             7.76%
                VIC        43351.3           3422.1             7.89%
                WA         78138.0           6429.6             8.23%
Material        Cotton     198617.0          15895.1            8.00%
                Wool       217045.2          17156.1            7.90%
Colour          Black      83913.8           6687.3             7.97%
                Blue       82727.4           6577.5             7.95%
                Pink       86665.8           6801.6             7.85%
                Red        80847.7           6450.1             7.98%
                White      81507.4           6534.7             8.02%
Total                      415662.1          33051.2            7.95%

The company earns a profit of about 7.95% of sales overall. The profit percentage differs little across the levels of each attribute. Among the regions, WA gives the highest profit percentage (8.23%) and QLD the lowest (7.75%).
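The profit percentages in Table 1 are total profit divided by total sales. As a small check, the sketch below recomputes them for the regions from the totals printed in Table 1; the data frame layout and variable names are our own choice, and the results can differ from the table in the last digit because the printed totals are rounded.

```python
import pandas as pd

# Region totals copied from Table 1 (in $)
regions = pd.DataFrame(
    {
        "total_sales": [106339.2, 45850.0, 62913.1, 79070.6, 43351.3, 78138.0],
        "total_profit": [8582.3, 3552.9, 4927.5, 6136.9, 3422.1, 6429.6],
    },
    index=["NSW", "QLD", "SA", "TAS", "VIC", "WA"],
)

# Profit percentage = 100 x total profit / total sales
regions["profit_pct"] = 100 * regions["total_profit"] / regions["total_sales"]

# Overall percentage uses the summed totals, not the mean of the percentages
overall_pct = 100 * regions["total_profit"].sum() / regions["total_sales"].sum()

best = regions["profit_pct"].idxmax()    # region with the highest profit %
worst = regions["profit_pct"].idxmin()   # region with the lowest profit %

print(regions.round(2))
print(f"overall: {overall_pct:.2f}%, highest: {best}, lowest: {worst}")
```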
Descriptive statistics:

Total sales and total profit depend mainly on the number of customers. Table 2 presents summary statistics for the number of customers by shipping type, customer type, region, material and colour. For each level we report the size (number of products), mean, standard deviation (SD), minimum and maximum.

Table 2: Summary statistics for number of customers

Attribute       Level      Size   Mean    SD     Min   Max
Shipping Type   Free       349    13.88   3.90   5     32
                Paid       831    10.94   3.44   3     23
Customer Type   Existing   544    12.08   3.97   3     32
                New        636    11.58   3.68   3     26
Region          NSW        308    11.64   3.73   3     23
                QLD        106    14.49   4.46   4     32
                SA         177    11.88   3.30   5     23
                TAS        236    11.18   3.88   3     29
                VIC        122    11.89   3.90   3     23
                WA         231    11.34   3.41   4     24
Material        Cotton     574    10.98   3.81   3     26
                Wool       606    12.59   3.68   3     32
Colour          Black      238    11.92   3.95   3     24
                Blue       226    12.19   3.87   5     26
                Pink       255    11.30   3.43   3     19
                Red        233    11.61   3.57   3     21
                White      228    12.07   4.25   3     32
Total                      1180   11.81   3.82   3     32

On average there are 11.81 customers per product, with a standard deviation of 3.82. We observed that the average number of customers
i) for free shipping (13.88) is higher than for paid shipping (10.94).
ii) for the QLD region (14.49) is higher than for any other region.
iii) for wool (12.59) is higher than for cotton (10.98).
iv) for blue (12.19) is higher than for any other colour.
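A summary of the kind shown in Table 2 can be produced with a pandas group-by. This is only a sketch: the column names and the seven rows below are hypothetical, and the real analysis would apply the same call to the full data set for each attribute in turn.

```python
import pandas as pd

# Hypothetical rows and column names, for illustration only
df = pd.DataFrame({
    "shipping_type": ["Free", "Free", "Free", "Paid", "Paid", "Paid", "Paid"],
    "customers": [12, 16, 14, 9, 11, 10, 10],
})

# Size, mean, sample standard deviation, minimum and maximum per level
summary = df.groupby("shipping_type")["customers"].agg(
    ["count", "mean", "std", "min", "max"]
)

print(summary)
```

Note that pandas computes `std` with the sample (n - 1) denominator by default, which is the usual convention for summary tables like this one.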
Chi-square test for association:

Table 3 shows the chi-square statistic and p-value of the chi-square test for association between shipping type, customer type, region, material and colour. The null hypothesis is that there is no association between the two attributes, and the alternative hypothesis is that there is an association between them. We tested the association for the following pairs of attributes:
i) shipping type and customer type.
ii) shipping type and region.
iii) shipping type and material.
iv) shipping type and colour.
v) customer type and region.
vi) customer type and material.
vii) customer type and colour.
viii) region and material.
ix) region and colour.
x) material and colour.

Table 3: Chi-squared test for association

Pair of attributes                 Chi-Square Statistic   P-Value
shipping type and customer type    0.248                  0.618
shipping type and region           7.598                  0.180
shipping type and material         5.737                  0.017
shipping type and colour           4.889                  0.299
customer type and region           4.364                  0.489
customer type and material         3.333                  0.068
customer type and colour           3.133                  0.536
region and material                6.745                  0.240
region and colour                  23.598                 0.260
material and colour                0.922                  0.911
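Each row of Table 3 corresponds to one chi-square test on a two-way table of counts. The sketch below shows how one such test could be run with `scipy.stats.chi2_contingency`. The cell counts and column names are made up for illustration and are not the counts from the study. For a 2 × 2 table SciPy applies Yates' continuity correction by default; whether the reported statistics used that correction is not stated, so the default is kept here as an assumption.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical cell counts, for illustration only (not the study's data)
counts = {
    ("Free", "Cotton"): 30,
    ("Free", "Wool"): 20,
    ("Paid", "Cotton"): 25,
    ("Paid", "Wool"): 35,
}

# Expand the counts into one row per product, as in a raw data file
rows = [pair for pair, n in counts.items() for _ in range(n)]
df = pd.DataFrame(rows, columns=["shipping_type", "material"])

# Cross-tabulate the two attributes and test for association
table = pd.crosstab(df["shipping_type"], df["material"])
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi-square = {chi2:.3f}, dof = {dof}, p-value = {p_value:.3f}")
```

The null hypothesis of no association is rejected at the 5% level when the p-value is below 0.05.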
We observed that only shipping type and material are significantly associated at the 5% level of significance, and that customer type and material are associated at the 10% level, whereas no other pair is associated.

Two Sample t-test:

In this section, we carried out the two-sample t-test for equality of the mean number of customers by shipping type (free and paid), customer type (new and existing) and material (wool and cotton). We tested the following null and alternative hypotheses:

i) Shipping Type:
Null Hypothesis: There is no significant difference between the mean number of customers for free shipping and paid shipping.
Alternative Hypothesis: There is a significant difference between the mean number of customers for free shipping and paid shipping.

ii) Customer Type:
Null Hypothesis: There is no significant difference between the mean number of customers for new and existing customers.
Alternative Hypothesis: There is a significant difference between the mean number of customers for new and existing customers.

iii) Material:
Null Hypothesis: There is no significant difference between the mean number of customers for wool and cotton products.
Alternative Hypothesis: There is a significant difference between the mean number of customers for wool and cotton products.
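The shipping type hypothesis can be checked approximately from the summary statistics in Table 2 alone, using `scipy.stats.ttest_ind_from_stats` with unequal variances (Welch's test, the same assumption as in Appendix 4). This is a sketch rather than the original computation: because the means and standard deviations in Table 2 are rounded to two decimals, the statistic is expected to be close to, but not exactly equal to, the value reported in Table 4.

```python
from scipy import stats

# (mean, SD, size) for number of customers, copied from Table 2
free = (13.88, 3.90, 349)
paid = (10.94, 3.44, 831)

# Welch's two-sample t-test (unequal variances) from summary statistics
result = stats.ttest_ind_from_stats(
    mean1=free[0], std1=free[1], nobs1=free[2],
    mean2=paid[0], std2=paid[1], nobs2=paid[2],
    equal_var=False,
)
t_stat, p_value = result.statistic, result.pvalue

print(f"t = {t_stat:.2f}, p-value = {p_value:.3g}")
```

The same call with the Customer Type or Material rows of Table 2 gives the other two comparisons.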
Table 4 presents the results of the two-sample t-test for shipping type, customer type and material. It includes the test statistic and the p-value.

Table 4: Two-sample independent t-test for shipping type, customer type and material

Attribute       Levels             Test Statistic   P value
Shipping Type   Free and Paid      12.25            0.000
Customer Type   New and Existing   2.22             0.026
Material        Wool and Cotton    -7.39            0.000

From Table 4, the p-values for shipping type, customer type and material are all less than 5%, which suggests a significant difference between the mean number of customers for the levels of each of these attributes. From Table 2, the mean number of customers is higher for existing customers (12.08) than for new customers (11.58). The mean number of customers for products shipped free of charge is significantly higher than for products with paid shipping. Wool products are preferred to cotton products, as the mean number of customers for wool products is significantly higher than for cotton products.

One way ANOVA:

We tested whether the mean number of customers differs significantly between the levels of
i) Region (NSW, QLD, SA, TAS, VIC and WA)
ii) Product Colour (Black, Blue, Pink, Red and White)
We tested the following null and alternative hypotheses:

i) Region (NSW, QLD, SA, TAS, VIC and WA)
Null Hypothesis: There is no significant difference between the mean number of customers for the different regions.
Alternative Hypothesis: At least one region has a different mean number of customers.

ii) Product Colour (Black, Blue, Pink, Red and White)
Null Hypothesis: There is no significant difference between the mean number of customers for the different colours.
Alternative Hypothesis: At least one colour has a different mean number of customers.

Table 5 shows the output of the one-way ANOVA for region and colour.

Table 5: Output of one way ANOVA for region and colour

Attribute   Levels                              F Statistic   P Value
Region      NSW, QLD, SA, TAS, VIC and WA       13.21         0.000
Colour      Black, Blue, Pink, Red and White    2.17          0.070

From Table 5, we conclude that the mean number of customers differs significantly between regions, and that there is no significant difference between the mean number of customers by colour at the 5% level of significance. QLD has the highest mean number of customers of all regions.
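The F statistic for region can be approximately reproduced from the group sizes, means and standard deviations in Table 2, since one-way ANOVA only needs the between-group and within-group sums of squares. The sketch below is a cross-check under that assumption, not the original computation (which would typically call `scipy.stats.f_oneway` on the raw data); small differences from Table 5 are expected because the Table 2 values are rounded.

```python
from scipy import stats

# (size, mean, SD) of number of customers per region, copied from Table 2
groups = {
    "NSW": (308, 11.64, 3.73),
    "QLD": (106, 14.49, 4.46),
    "SA": (177, 11.88, 3.30),
    "TAS": (236, 11.18, 3.88),
    "VIC": (122, 11.89, 3.90),
    "WA": (231, 11.34, 3.41),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-group and within-group sums of squares
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

df_between = len(groups) - 1        # number of groups minus one
df_within = n_total - len(groups)   # observations minus number of groups

f_stat = (ss_between / df_between) / (ss_within / df_within)
p_value = stats.f.sf(f_stat, df_between, df_within)  # upper-tail probability

print(f"F = {f_stat:.2f}, df = ({df_between}, {df_within}), p-value = {p_value:.3g}")
```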
Correlation Analysis:

In this section we calculate the correlation coefficients to study the association between Product Price, Profit and Number of customers. Table 6 presents Pearson's correlation matrix for these variables.

Table 6: Pearson's correlation coefficients for Product Price, Profit and Number of customers

                      Product Price   Profit   Number of customers
Product Price         1               0.161    -0.100
Profit                0.161           1        -0.045
Number of customers   -0.100          -0.045   1

From the correlation analysis, product price and profit are positively correlated (0.161). The number of customers is negatively, though weakly, correlated with both profit (-0.045) and product price (-0.100).

Regression Analysis:

In this section, we fit a regression model to the total profit, using the number of customers as the predictor variable. Table 7 shows the output of the regression analysis.

Table 7: Output of regression analysis

Regression Statistics
Multiple R          0.859341
R Square            0.738467
Adjusted R Square   0.738245
Standard Error      5.371418
Observations        1180

ANOVA
             df     SS         MS         F          Significance F
Regression   1      95968.32   95968.32   3326.212   0
Residual     1178   33987.81   28.85213
Total        1179   129956.1

                   Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept          0.154577       0.50766          0.304488   0.76081   -0.84144    1.150595
No. of Customers   2.359233       0.040907         57.67332   0         2.278974    2.439491

The P value (Significance F) is less than 0.05, which suggests a significant relation between total profit and number of customers. The R² of 0.74 means that about 74% of the variation in total profit is explained by the number of customers, which suggests a good fit. The slope for number of customers suggests that each additional customer adds about $2.36 on average to the company's total profit.

Recommendations to the company

i) The profit analysis shows little difference in profit percentage across attributes, but total sales and total profit do differ. This suggests that the company should attract more customers in order to increase total profit.
ii) As the mean number of customers differs significantly between new and existing customers, the company should attract existing customers.
iii) As the mean number of customers for products with free shipping is significantly higher than for products on which shipping charges are levied, the company should offer shipping free of charge so that the number of customers increases.
iv) Wool products are preferred to cotton products, so the company should make wool products available to every customer who wants them.
v) The mean number of customers is higher in the QLD region than in the other regions, so the company should attract customers from the other regions.

An implementation plan based on the recommendations you have provided

i) To attract more customers, the company should focus on quality and service.
ii) The company can offer free shipping on most products by appointing more staff to the shipping department.
iii) The company should keep wool products in stock so that every customer gets their desired products.
iv) The company should use advertising boards and online advertisements to attract customers.
v) The company can attract customers with offers such as free delivery and cash back.

Conclusions

The company earns a profit of about 7.95% of sales overall, and the profit percentage differs little across the levels of each attribute. Among the regions, WA gives the highest profit percentage (8.23%) and QLD the lowest (7.75%). On average there are 11.81 customers per product, with a standard deviation of 3.82.

Only shipping type and material are significantly associated at the 5% level of significance, and customer type and material are associated at the 10% level; no other pair of attributes is associated. The mean number of customers differs significantly by customer type, and is higher for existing customers (12.08) than for new customers (11.58). The mean number of customers for products shipped free of charge is significantly higher than for products with paid shipping. Wool products are preferred to cotton products, as the mean number of customers for wool products is significantly higher than for cotton products. The mean number of customers differs significantly between regions but not between colours, and QLD has the highest mean number of customers of all regions.

From the correlation analysis, product price and profit are positively correlated, while the number of customers is weakly negatively correlated with both profit and product price. Regression analysis suggests a significant relation between total profit and number of customers. The R² of 0.74 suggests a good fit, and the slope suggests that each additional customer adds about $2.36 on average to the company's total profit. We have also given recommendations based on the analysis and a plan for implementing them.

List of References

Berenson, M., Levine, D., Szabat, K.A. and Krehbiel, T.C. (2012). Basic business statistics: Concepts and applications. Pearson Higher Education AU.
Bickel, P.J. and Doksum, K.A. (2015). Mathematical statistics: Basic ideas and selected topics, volume I (Vol. 117). CRC Press.
Black, K. (2009). Business statistics: Contemporary decision making. John Wiley & Sons.
Casella, G. and Berger, R.L. (2002). Statistical inference (Vol. 2). Pacific Grove, CA: Duxbury.
DeGroot, M.H. and Schervish, M.J. (2012). Probability and statistics. Pearson Education.
Groebner, D.F., Shannon, P.W., Fry, P.C. and Smith, K.D. (2008). Business statistics. Pearson Education.
Grus, J. (2015). Data science from scratch: First principles with Python. O'Reilly Media, Inc.
Hodges Jr, J.L. and Lehmann, E.L. (2005). Basic concepts of probability and statistics. Society for Industrial and Applied Mathematics.
Karp, K. Python for Data Science. Master of Science in Big Data, p. 37.
Kvanli, A.H., Pavur, R.J. and Guynes, C.S. (2000). Introduction to business statistics. Cincinnati, OH: South-Western.
McKinney, W. (2012). Python for data analysis: Data wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc.
Mendenhall, W. and Sincich, T. (1993). A second course in business statistics: Regression analysis. San Francisco: Dellen.
Papoulis, A. (1990). Probability & statistics (Vol. 2). Englewood Cliffs: Prentice-Hall.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V. and Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), pp. 2825-2830.
Pillers Dobler, C. (2002). Mathematical statistics: Basic ideas and selected topics. pp. 332-332.
Ross, S.M. (2014). Introduction to probability and statistics for engineers and scientists. Academic Press.
Schutt, R. and O'Neil, C. (2013). Doing data science: Straight talk from the frontline. O'Reilly Media, Inc.
Appendix (Python code)

Some of the code used for this study is attached here.

Appendix 1: Python code for importing data file

    import csv  # standard-library module for reading CSV files

    file_name = "SALESDATA.csv"  # data file
    with open(file_name, "r", newline="") as csvfile:  # open the CSV file for reading
        csvreader = csv.reader(csvfile)  # create a CSV reader object
        rows = list(csvreader)  # read all rows while the file is still open

Appendix 2: Python code for creating data frame

    import pandas as pd

    # Load the CSV file into a data frame; column names come from the header row
    df = pd.read_csv(file_name)

Appendix 3: Python code for basic statistics

    # 'sample' stands for the name of the column being summarised
    df['sample'].mean()      # mean
    df['sample'].std()       # standard deviation
    df['sample'].min()       # minimum observation
    df['sample'].max()       # maximum observation
    df['sample'].describe()  # summary statistics

Appendix 4: Python code for independent two sample t-test assuming unequal variances

    import scipy.stats as stats

    # var1 and var2 are the two samples being compared
    stats.ttest_ind(var1, var2, equal_var=False)
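The appendix does not include code for the correlation and regression analyses. A minimal sketch of how a simple linear regression of total profit on number of customers could be run with `scipy.stats.linregress` is given below. The two arrays are hypothetical values chosen for illustration, not the study's data, and the original analysis may have used a different tool. The sketch also shows the identity F = t² that holds for a regression with a single predictor, which can be seen in Table 7 (57.67332² ≈ 3326.21).

```python
import numpy as np
from scipy import stats

# Hypothetical values, for illustration only (not the study's data)
customers = np.array([5, 8, 10, 12, 15, 18, 20, 25], dtype=float)
total_profit = np.array([12.1, 19.5, 22.8, 29.9, 34.2, 43.0, 46.5, 60.3])

# Simple linear regression: total_profit = intercept + slope * customers
fit = stats.linregress(customers, total_profit)

n = customers.size
r_squared = fit.rvalue ** 2                      # R Square
t_stat = fit.slope / fit.stderr                  # t Stat of the slope
f_stat = r_squared / (1 - r_squared) * (n - 2)   # F of the regression

# Pearson's correlation coefficient between the two variables
r, r_pvalue = stats.pearsonr(customers, total_profit)

print(f"slope = {fit.slope:.4f}, intercept = {fit.intercept:.4f}")
print(f"R^2 = {r_squared:.4f}, t = {t_stat:.2f}, F = {f_stat:.2f}")
print(f"Pearson r = {r:.4f}")
```

With one predictor, Multiple R in the regression output is the absolute value of Pearson's r between the predictor and the response, so the two analyses can be cross-checked against each other.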