Data Analytics: A Business Case Study from Online Shopping

Executive Summary

Much of the world's population now prefers online shopping, which has driven rapid growth in eCommerce. eCommerce is a growing and popular business in every corner of the world, and online shopping is its most popular form. Names such as Amazon and eBay are now part of day-to-day life. Service providers attract customers with offers. The growth of online shopping creates both opportunities and challenges for service providers; the main challenges are competition and overall customer satisfaction.

Our data set covers 2000 clothing products (shirts, trousers and track suits). In the profit analysis we summarised the total sale amount and total profit. We carried out chi-squared tests for association between attributes (shipping type, customer type, region, brand and material). We compared the mean number of customers across shipping type, customer type, region, brand and material using the two-sample t-test and one-way ANOVA. We also carried out a correlation analysis. Recommendations and an implementation plan based on the analysis are provided.
Table of Contents

1. List of Abbreviations and assumptions made
2. Introduction: What is the problem?
3. Research Methodology
4. Analytical Findings
5. Recommendations to the company
6. An implementation plan based on the recommendations you have provided
7. Conclusion
8. List of References
9. Appendix
List of Abbreviations and assumptions made

Max: Maximum
Min: Minimum
QLD: Queensland
TAS: Tasmania
VIC: Victoria
WA: Western Australia
Introduction: What is the problem?

Much of the world's population now prefers online shopping, which has driven rapid growth in eCommerce. eCommerce is a growing and popular business in every corner of the world, and online shopping is its most popular form. Names such as Amazon and eBay are now part of day-to-day life. Service providers attract customers with offers. The growth of online shopping creates both opportunities and challenges for service providers; the main challenges are competition and overall customer satisfaction.

About Data: Our data set covers 2000 clothing products (shirts, trousers and track suits). We considered the following attributes:
i) Product Name
ii) Product Price (in $)
iii) Sale Price (in $)
iv) Profit (in $)
v) Number of customers
vi) Shipping Type (Free or Paid)
vii) Customer Type (New or Existing)
viii) Region (QLD, WA, VIC, TAS)
ix) Product Brand
x) Product Material

From these attributes we define the following variables for our analysis:
Total monthly sale amount (in $) = Sale Price (in $) × Number of customers
Total monthly profit (in $) = Profit (in $) × Number of customers

Project Problem: We are interested in the following:
i) Profit analysis by shipping type, customer type, region, brand and material.
ii) Whether there is any association between shipping type, customer type, region, brand and material.
iii) Whether the number of customers differs significantly by shipping type, customer type, region, brand and material.
iv) Correlation analysis of the variables.

Research Methodology

Any data analysis is stronger when it is supported by statistical tools and techniques. In the profit analysis we summarised the total sale amount and total profit. We carried out chi-squared tests for association between attributes (shipping type, customer type, region, brand and material). We compared the mean number of customers across shipping type, customer type, region, brand and material using the two-sample t-test and one-way ANOVA. We also carried out a correlation analysis. We used Python and MS-Excel for the data analysis, drawing on Grus (2015), McKinney (2012), Pedregosa et al. (2011) and Schutt and O'Neil (2013).

Analytical Findings

Profit Analysis:
In the profit analysis we report the total sales amount, total profit and profit percentage by shipping type, customer type, region, brand and material. We referred to Berenson et al. (2012), Black (2009), Groebner et al. (2008), Kvanli et al. (2000) and Mendenhall and Sincich (1993). Table 1 summarises the profit analysis by shipping type, customer type, region, brand and material.

Table 1: Profit analysis according to different attributes

Attribute      Level     Sum of Total Sale ($)  Sum of Total Profit ($)  Profit %
Shipping Type  Free      661525                 88664                    13.40%
               Paid      1276094                171178                   13.41%
Customer Type  Existing  879349                 117241                   13.33%
               New       1058270                142601                   13.47%
Region         QLD       370742                 49741                    13.42%
               TAS       204055                 27225                    13.34%
               VIC       792802                 106086                   13.38%
               WA        570021                 76790                    13.47%
Brand          Adidas    387014                 45529                    11.76%
               Arrow     393305                 60874                    15.48%
               Nike      371811                 46476                    12.50%
               Polo      377941                 52755                    13.96%
               Woodland  407548                 54209                    13.30%
Material       Cotton    709179                 94216                    13.29%
               Other     601860                 81071                    13.47%
               Wool      626580                 84555                    13.49%
Total                    1937620                259842                   13.41%

The company earns 13.41% profit overall. The profit percentage varies little across the attributes (between 13.29% and 13.49%), except for brand: the company earns 15.48% profit on Arrow but only 11.76% on Adidas. The total sales amount is larger for paid shipping than for free shipping. Total sales revenue is comparatively low for existing customers and for customers from the TAS region.
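The derived variables and the profit summary can be sketched in Python as follows. This is a minimal illustration using only the standard library; the sample rows, the column names and the helper `profit_summary` are assumptions made for the example and are not taken from the study's data file. The profit percentage is taken as total profit divided by total sale, which is consistent with the figures reported in the profit table.

```python
from collections import defaultdict

# Hypothetical sample rows; the keys are assumed names, not the real file's headers.
rows = [
    {"brand": "Arrow", "sale_price": 30.0, "profit": 4.5, "customers": 40},
    {"brand": "Arrow", "sale_price": 20.0, "profit": 3.0, "customers": 30},
    {"brand": "Adidas", "sale_price": 25.0, "profit": 3.0, "customers": 36},
]


def profit_summary(rows, attribute):
    """Return {level: (total_sale, total_profit, profit_pct)} for one attribute."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in rows:
        # Derived variables: per-product monthly totals.
        total_sale = row["sale_price"] * row["customers"]
        total_profit = row["profit"] * row["customers"]
        totals[row[attribute]][0] += total_sale
        totals[row[attribute]][1] += total_profit
    return {
        level: (sale, profit, 100.0 * profit / sale)
        for level, (sale, profit) in totals.items()
    }


summary = profit_summary(rows, "brand")
for level, (sale, profit, pct) in sorted(summary.items()):
    print(f"{level:<8} sale={sale:>8.2f} profit={profit:>7.2f} profit%={pct:.2f}")
```

Calling `profit_summary(rows, "region")` on rows that carry a region key would give the corresponding breakdown for another attribute.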
Descriptive statistics:

For this section we drew on well-known texts: Bickel and Doksum (2015), Casella and Berger (2002), DeGroot and Schervish (2012), Hodges Jr and Lehmann (2005), Papoulis (1990), Pillers Dobler (2002) and Ross (2014).

Profit depends mainly on the number of customers who bought the product. In this section we report summary statistics for the number of customers by shipping type, customer type, region, brand and material. Table 2 gives the sample size, mean, standard deviation (SD), minimum and maximum of the number of customers for each level. On average there are 37.99 customers per product, with a standard deviation of 6.61. One interesting observation is that the average number of customers is higher for free shipping (41.09) than for paid shipping (36.54).

Table 2: Summary statistics for number of customers

Attribute      Level     Size  Mean   SD    Min  Max
Shipping Type  Free      636   41.09  6.69  24   59
               Paid      1364  36.54  6.06  15   56
Customer Type  Existing  908   37.97  6.47  15   57
               New       1092  38.00  6.73  19   59
Region         QLD       400   36.64  6.11  22   56
               TAS       216   37.32  6.63  19   55
               VIC       779   39.58  6.53  20   59
               WA        605   37.07  6.63  15   58
Brand          Adidas    402   37.96  6.50  19   55
               Arrow     401   38.01  6.88  20   59
               Nike      384   38.28  6.83  15   56
               Polo      391   37.81  6.45  21   56
               Woodland  422   37.90  6.43  22   56
Material       Cotton    665   38.08  6.72  15   59
               Other     662   37.91  6.33  21   56
               Wool      673   37.98  6.78  22   58
Total                    2000  37.99  6.61  15   59

Chi-square test for association:

In this section we test whether there is any association between pairs of attributes. The null hypothesis is that there is no association between the two attributes; the alternative hypothesis is that there is an association. We test all ten pairs formed from shipping type, customer type, region, brand and material. Table 3 shows the chi-square statistic and p-value for each pair.

Table 3: Chi-squared test for association

Pair of attributes               Chi-Square Statistic  P-Value
Shipping type and customer type  0.005                 0.943
Shipping type and region         0.022                 0.999
Shipping type and brand          2.417                 0.660
Shipping type and material       8.489                 0.014
Customer type and region         1.586                 0.663
Customer type and brand          3.550                 0.470
Customer type and material       2.125                 0.346
Region and brand                 5.803                 0.926
Region and material              10.034                0.123
Brand and material               6.133                 0.632

Only shipping type and material show a significant association at the 5% level of significance (p = 0.014). For all other pairs we found no significant association.

Two Sample t-test:

We used the independent two-sample t-test to test the difference in the mean number of customers by shipping type (free and paid) and by customer type (new and existing). For shipping type, the null hypothesis is that the mean number of customers is the same for products with free shipping and products with paid shipping. For customer type, the null hypothesis is that the mean number of customers is the same for new and existing customers. Table 4 shows the test statistic and p-value for each test.

Table 4: Two-sample independent t-test for shipping type and customer type

Attribute      Levels            Test Statistic  P value
Shipping Type  Free and Paid     14.58           0.000
Customer Type  New and Existing  -0.13           0.899

From Table 4, the p-value for shipping type is below 5%, so there is a significant difference between the mean number of customers for free shipping and for paid shipping. This suggests that customers are more interested in buying items with free shipping than items with paid shipping. There is no significant difference between the mean number of customers for new and existing customers.

One way ANOVA:

In this section we test whether the mean number of customers differs significantly between the levels of
i) Region (QLD, TAS, VIC and WA)
ii) Brand (Adidas, Arrow, Nike, Polo and Woodland)
iii) Material (Cotton, Wool and Other)

We used one-way ANOVA for this purpose. The null hypothesis is that all levels have the same mean; the alternative hypothesis is that at least one level has a different mean. Table 5 shows the output of the one-way ANOVA for region, brand and material.

Table 5: Output of one way ANOVA for region, brand and material

Attribute  Levels                                  F Statistic  P Value
Region     QLD, TAS, VIC and WA                    26.13        0.000
Brand      Adidas, Arrow, Nike, Polo and Woodland  0.28         0.889
Material   Cotton, Wool and Other                  0.12         0.891

From Table 5, we conclude that the mean number of customers differs significantly between regions. VIC has the highest mean number of customers (39.58) compared with the other regions. There is no significant difference in the mean number of customers between brands or between materials.

Correlation Analysis:

In this section we calculate correlation coefficients to study the association between Product Price, Profit and Number of customers. Table 6 gives the Pearson correlation matrix for these three variables.

Table 6: Pearson's correlation coefficients for Product Price, Profit and Number of customers

                     Product Price  Profit  Number of customers
Product Price        1              0.43    0.001
Profit               0.43           1       -0.003
Number of customers  0.001          -0.003  1

From the correlation analysis, product price and profit have a moderate positive correlation (0.43), whereas the correlations between product price and number of customers and between profit and number of customers are negligible (close to zero).

Regression Analysis:

We used a simple linear regression model to predict the total monthly sale, using the number of customers as the predictor variable. Table 7 gives the F statistic, p-value, R² and regression coefficients of the simple linear regression.

Table 7: Output of Regression Analysis

F Statistic  7462.109
P Value      0.000
R²           0.789
Intercept    -3.535
Slope        25.596
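As a rough illustration of where the quantities in the regression output come from, the following minimal sketch fits a straight line by ordinary least squares using only the Python standard library. The data values and the names `fit_line`, `customers` and `total_sale` are hypothetical and are not taken from the study's data file.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared, f_statistic).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_resid / ss_total
    # F statistic with one predictor: explained sum of squares over
    # the residual mean square on n - 2 degrees of freedom.
    f_statistic = (ss_total - ss_resid) / (ss_resid / (n - 2))
    return intercept, slope, r_squared, f_statistic


# Hypothetical data: number of customers and total monthly sale (in $).
customers = [20, 30, 40, 50, 60]
total_sale = [510.0, 760.0, 1030.0, 1270.0, 1530.0]

intercept, slope, r_squared, f_stat = fit_line(customers, total_sale)
print(f"intercept={intercept:.3f} slope={slope:.3f} "
      f"R2={r_squared:.3f} F={f_stat:.3f}")
```

The p-value is omitted here because it requires the F distribution, which the standard library does not provide.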
The p-value of 0.000 indicates a significant relationship between the total monthly sale and the number of customers who bought the products. The R² of 0.789 means that the number of customers explains about 79% of the variation in total monthly sale, so the fit is reasonably good. The fitted straight line is

Total sale (in $) = -3.535 + 25.596 × Number of Customers

Recommendations to the company

i) Among the brands, the company earns 15.48% profit on Arrow but only 11.76% on Adidas. The company should therefore focus its marketing strategies on the Arrow brand, which yields the higher profit percentage.
ii) There is a significant difference between the mean number of customers for free delivery and for paid delivery, and the mean is higher for free delivery. The company should therefore offer free shipping to attract customers.
iii) The mean number of customers differs significantly between regions, and VIC has the highest mean compared with the other regions. The company should therefore concentrate on attracting customers in the other regions.

An implementation plan based on the recommendations you have provided

i) The company should appoint more employees in the shipping department so that more products can be offered with free shipping.
ii) The company can adopt the marketing strategies currently used in the VIC region in the other regions, so that the number of customers there increases.
iii) The company can attract customers with offers such as free delivery and cash back.

Conclusions
The company earns 13.41% profit overall. The profit percentage varies little across the attributes, except for brand: the company earns 15.48% profit on Arrow but only 11.76% on Adidas. The total sales amount is larger for paid shipping. Total sales revenue is comparatively low for existing customers and for customers from the TAS region.

On average there are 37.99 customers per product, with a standard deviation of 6.61. The average number of customers is higher for free shipping than for paid shipping. Only shipping type and material show a significant association at the 5% level of significance; for all other pairs we found no significant association.

There is a significant difference between the mean number of customers for free shipping and for paid shipping, which suggests that customers are more interested in buying items with free shipping. There is no significant difference between the mean number of customers for new and existing customers. The mean number of customers differs significantly between regions, with VIC having the highest mean, but not between brands or materials. There is a significant relationship between the number of customers and the total sale. We have also provided recommendations and an implementation plan to the company.
List of References

Berenson, M., Levine, D., Szabat, K.A. and Krehbiel, T.C., (2012). Basic business statistics: Concepts and applications. Pearson Higher Education AU.
Bickel, P.J. and Doksum, K.A., (2015). Mathematical statistics: Basic ideas and selected topics, volume I (Vol. 117). CRC Press.
Black, K., (2009). Business statistics: Contemporary decision making. John Wiley & Sons.
Casella, G. and Berger, R.L., (2002). Statistical inference (Vol. 2). Pacific Grove, CA: Duxbury.
DeGroot, M.H. and Schervish, M.J., (2012). Probability and statistics. Pearson Education.
Groebner, D.F., Shannon, P.W., Fry, P.C. and Smith, K.D., (2008). Business statistics. Pearson Education.
Grus, J., (2015). Data science from scratch: First principles with Python. O'Reilly Media, Inc.
Hodges Jr, J.L. and Lehmann, E.L., (2005). Basic concepts of probability and statistics. Society for Industrial and Applied Mathematics.
Kvanli, A.H., Pavur, R.J. and Guynes, C.S., (2000). Introduction to business statistics. Cincinnati, OH: South-Western.
McKinney, W., (2012). Python for data analysis: Data wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc.
Mendenhall, W. and Sincich, T., (1993). A second course in business statistics: Regression analysis. San Francisco: Dellen.
Papoulis, A., (1990). Probability & statistics (Vol. 2). Englewood Cliffs: Prentice-Hall.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V. and Vanderplas, J., (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), pp. 2825-2830.
Pillers Dobler, C., (2002). Mathematical statistics: Basic ideas and selected topics. pp. 332-332.
Ross, S.M., (2014). Introduction to probability and statistics for engineers and scientists. Academic Press.
Schutt, R. and O'Neil, C., (2013). Doing data science: Straight talk from the frontline. O'Reilly Media, Inc.
Appendix (Python code)

Some of the code used for this study is attached here.

Appendix 1: Python code for importing data file

    import csv  # standard-library CSV module

    file_name = "SALESDATA1.csv"  # data file
    with open(file_name, "r", newline="") as csvfile:  # open the CSV file for reading
        csvreader = csv.reader(csvfile)  # create a CSV reader object
        rows = list(csvreader)  # read all rows while the file is still open

Appendix 2: Python code for creating data frame

    import pandas as pd

    df = pd.read_csv(file_name)  # load the data file into a DataFrame

Appendix 3: Python code for basic statistics

    # 'sample' is a placeholder for the column of interest
    df['sample'].mean()      # mean
    df['sample'].std()       # standard deviation
    df['sample'].min()       # minimum observation
    df['sample'].max()       # maximum observation
    df['sample'].describe()  # summary statistics

Appendix 4: Python code for independent two sample t-test assuming unequal variances

    import scipy.stats as stats

    # var1, var2: the two samples to compare
    stats.ttest_ind(var1, var2, equal_var=False)

Appendix 5: Python code for one way ANOVA

    import scipy.stats as stats

    # var1, ..., var5: one sample per level of the attribute
    stats.f_oneway(var1, var2, var3, var4, var5)

Appendix 6: Python code for regression

    import statsmodels.api as sm

    y = df['Total_Sale']  # monthly sale
    X = sm.add_constant(df[['Number_of_Customers']])  # number of customers plus an intercept term
    results = sm.OLS(y, X).fit()  # fit the simple linear regression
    print(results.summary())
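The appendix does not include code for the chi-squared test of association. The following is a minimal sketch of how such a test can be computed, using only the Python standard library. The contingency table and the function names `chi_square_statistic` and `chi_square_p_value` are hypothetical and are not the code used in the study; in practice a library routine such as `scipy.stats.chi2_contingency` would normally be used instead.

```python
import math


def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def chi_square_p_value(statistic, dof):
    """Upper-tail probability of the chi-squared distribution (integer dof >= 1)."""
    half = statistic / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum over k < dof/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, dof // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) plus
    # sqrt(2x/pi) * exp(-x/2) * sum over k < (dof-1)/2 of x^k / (1*3*...*(2k+1))
    term, total = 1.0, 0.0
    for k in range((dof - 1) // 2):
        if k > 0:
            term *= statistic / (2 * k + 1)
        total += term
    tail = math.sqrt(2.0 * statistic / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


# Hypothetical counts: rows are shipping types, columns are materials.
table = [[30, 20, 10],
         [20, 30, 40]]
statistic, dof = chi_square_statistic(table)
p_value = chi_square_p_value(statistic, dof)
print(f"chi-square={statistic:.3f} dof={dof} p-value={p_value:.6f}")

# Reported statistic for shipping type and material (2 x 3 levels, so dof = 2).
reported_p = chi_square_p_value(8.489, 2)
print(f"p-value for the reported statistic: {reported_p:.3f}")
```

Applied to the statistic reported for shipping type and material (8.489, with 2 degrees of freedom), the p-value function gives a value that rounds to 0.014, in line with the reported result.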