ISYS3374 - Business Analytics Assignment

Added on - 02 Jun 2021

Showing pages 1 to 4 of 15 pages
RMIT University

Course/Unit code: ISYS3374
Assignment number: Assignment 3
Assignment due date: 31 May 2020
Group/Session name (if applicable):
Course/Unit name: Business Analytics (2010)
Program title: Master Business of Information Technology
Lecturer/Teacher’s name: Dr Babak Abbasi
Tutor / Marker’s name (if applicable): Dr Joerin Motavallian

This statement should be completed and signed by the student(s) participating in preparation of the assignment.

Declaration and statement of authorship:
1. I/we hold a copy of this assignment, which can be produced if the original is lost/damaged.
2. This assignment is my/our original work and no part of it has been copied from any other student’s work or from any other source except where due acknowledgment is made.
3. No part of this assignment has been written for me/us by any other person except where such collaboration has been authorised by the lecturer/teacher concerned and is clearly acknowledged in the assignment.
4. I/we have not previously submitted or currently submitting this work for any other course/unit.
5. This work may be reproduced and/or communicated for the purpose of detecting plagiarism.
6. I/we give permission for a copy of my/our marked work to be retained by the School for review by external examiners.
7. I/we understand that plagiarism is the presentation of the work, idea or creation of another person as though it is your own. It is a form of cheating and is a very serious academic offence that may lead to expulsion from the University. Plagiarised material can be drawn from, and presented in, written, graphic and visual form, including electronic data, and oral presentations. Plagiarism occurs when the origin of the material used is not appropriately cited.
8. Enabling plagiarism is the act of assisting or allowing another person to plagiarise or to copy your work.

Family name: LY
Given name: TRONG TIEN
Student number: S3790425
Student signature:
Date: 31 May 2020

Further information relating to the penalties for plagiarism, which range from a notation on your student file to expulsion from the University, is contained in Regulation 6.1.1 ‘Student Discipline’ (ID=11jgnnjgg70y) and Academic Policy: ‘Plagiarism’ (ID=sg4yfqzod48g).

Assessor’s comments:
Grade:
School date stamp (Office use only):
SECTION A:

Question 1:
Overfitting occurs when a model corresponds too closely or exactly to a particular sample dataset, so that it fails to match other sets of data and therefore gives inaccurate predictions. It can be reduced with the following techniques:
- Using only independent variables that have a close and meaningful relationship with the dependent variable, to lessen the risk of overfitting.
- Evaluating candidate models, such as linear regression models or quadratic models, on a different set of data from the one used to fit them. The performance on that held-out data approximates how the model will behave on unseen data and shows whether the model is overfitted.
- Adding more data to the sample dataset to increase the accuracy of testing.

Question 2:
Predictive analytics has been deployed in many industries, especially in the retailing industry, where it is considered one of the most useful tools for forecasting stock and improving the customer experience. Retail businesses can plan their stocking better and avoid over-stock or out-of-stock problems. For instance, in a peak season such as Christmas or Black Friday, when shopping demand increases, predictive analytics can use historical data to predict the amount of goods likely to be sold, so that stores do not run out of stock. Predictive analytics also provides better insight into customer behavior by analyzing shopping preferences and buying history in order to predict new opportunities to engage with customers. When it comes to a new marketing or sales campaign, predictive analytics can help to build a more personalized shopping experience.

Question 3:
A value is missing not at random (MNAR) when the chance that it is missing is related to the value of that attribute itself. There are many methods for dealing with MNAR; a common one is to use multiple-regression analysis to estimate a missing value. This technique can be used to estimate the missing SUS scores. Regression substitution predicts the missing value from the other values of the same category. An example of values missing not at random is an income survey in which people with higher incomes tend to hide their true income or decline to answer, which causes values to be missing not at random in the final report.

Question 4:
To develop the logistic regression model, the variable X1 can be replaced by two dummy variables, each corresponding to one of the levels of X1 and taking the binary values one and zero. For instance, X1A and X1B can be used for X1. When X1A is one and X1B is zero, the category is low; when X1A is zero and X1B is one, the category is average; and when both are zero, the category is high. The same rule is applied to X2 with three dummy variables. Based on that, the logistic regression model can be developed with five coefficients (two for X1 and three for X2), in addition to the intercept.
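The dummy coding described in Question 4 can be sketched in a few lines of Python. This is only an illustration: the function name `encode_x1` and the level labels are assumptions made for the example, and "high" is taken as the reference level (both dummies zero), as in the answer above.

```python
# Hypothetical sketch of the dummy coding from Question 4.
# X1 has three levels; "high" is the reference level (both dummies are 0).
X1_LEVELS = ("low", "average", "high")


def encode_x1(level):
    """Return the pair (X1A, X1B) for one level of X1."""
    if level not in X1_LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    x1a = 1 if level == "low" else 0      # X1A = 1 marks the "low" category
    x1b = 1 if level == "average" else 0  # X1B = 1 marks the "average" category
    return x1a, x1b


if __name__ == "__main__":
    for level in X1_LEVELS:
        print(level, encode_x1(level))
```

Using one fewer dummy than the number of levels avoids perfect collinearity with the intercept, which is why X1 needs two dummies and X2 needs three.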
SECTION B:

Question 5:

Question 5 - Part a:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.060745
R Square             0.00369
Adjusted R Square    -0.00436
Standard Error       295.8267
Observations         500

ANOVA
              df     SS          MS         F          Significance F
Regression    4      160435.8    40108.94   0.458317   0.766334
Residual      495    43319164    87513.46
Total         499    43479600

                                               Coefficients  Standard Error  t Stat  P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                                      1164.20       48.642          23.93   1.12E-84  1068.6     1259.77    1068.63      1259.77
Age (blanks means we do not know their age)    0.84          0.7672          1.092   0.275     -0.6693    2.345      -0.669       2.345
Gender (Male is 1)                             -7.65         26.511          -0.288  0.772     -59.740    44.436     -59.74       44.436
Family size                                    -6.04         7.7080          -0.783  0.433     -21.186    9.102      -21.186      9.102
Membership (with membership is 1)              -1.37         26.510          -0.051  0.958     -53.458    50.715     -53.458      50.715

Spent amount = 1,164.2 + 0.84*Age - 7.65*Gender - 6.04*Family size - 1.37*Membership

1,164.2 is the constant amount spent, which does not depend on the variables. Each coefficient is the change in the predicted spending for a one-unit increase in that variable; for the binary variables (Gender and Membership), it is the change in spending when the variable is 1 rather than 0.
The accuracy of this model is low: R Square is only 0.00369 and Significance F is 0.766, far above 0.05.

Question 5 - Part b:
This model should use more quantitative variables instead of qualitative variables to increase accuracy. The variables that are not significant should be removed so that the remaining P-values fall within the accepted range.
The following charts indicate that there is no relationship between the spending amount and either product type or discount card type, so the model can drop these variables without changing its accuracy.
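The figures in the Question 5 Part a output can be cross-checked with a short Python sketch. It recomputes R Square and F from the ANOVA sums of squares and applies the fitted equation using the coefficients from the regression table (Age coefficient 0.84). The function names and the sample customer are assumptions made for illustration, not part of the assignment data.

```python
# Values copied from the SUMMARY OUTPUT in Question 5 Part a.
SS_REGRESSION = 160435.8
SS_RESIDUAL = 43319164.0
DF_REGRESSION = 4
DF_RESIDUAL = 495

# Coefficients from the regression table.
COEFFICIENTS = {
    "intercept": 1164.20,
    "age": 0.84,
    "gender": -7.65,        # Male is 1
    "family_size": -6.04,
    "membership": -1.37,    # with membership is 1
}


def r_squared(ss_reg, ss_res):
    """Share of the total sum of squares explained by the regression."""
    return ss_reg / (ss_reg + ss_res)


def f_statistic(ss_reg, df_reg, ss_res, df_res):
    """Ratio of the regression mean square to the residual mean square."""
    return (ss_reg / df_reg) / (ss_res / df_res)


def predict_spend(age, gender, family_size, membership):
    """Apply the fitted linear equation to one customer."""
    c = COEFFICIENTS
    return (c["intercept"] + c["age"] * age + c["gender"] * gender
            + c["family_size"] * family_size + c["membership"] * membership)


if __name__ == "__main__":
    print("R Square:", round(r_squared(SS_REGRESSION, SS_RESIDUAL), 5))
    print("F:", round(f_statistic(SS_REGRESSION, DF_REGRESSION,
                                  SS_RESIDUAL, DF_RESIDUAL), 6))
    # Hypothetical customer: 30-year-old male member with a family of four.
    print("Predicted spend:", round(predict_spend(30, 1, 4, 1), 2))
```

Because the coefficients are small relative to the intercept, the prediction for any plausible customer stays close to 1,164.2, which is consistent with the very low R Square.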
Question 6:

Question 6 - Part a:
The amount of time for each repair person is calculated as in Figure 1, which shows that Bob has the highest share of repair time at 56%, John is in second place with 35%, and James has the least with 9%.

Figure 1: Bob 56%, James 9%, John 35%

More services were carried out in the morning than in the afternoon, by 11%. For 8% of the services the time of day is unknown because the time of service was not recorded.