Deakin University MIS772 Predictive Analytics Assignment A2 Report

Verified

Added on  2022/12/18

|14
|3341
|1
Report
AI Summary
This report details a student's analysis of the Zomato restaurant dataset for MIS772 Predictive Analytics, focusing on Bangalore Food Assist (BFA). The assignment involves data exploration, clustering, and the development of an estimation model using RapidMiner. The student investigates restaurant reviews, identifies anomalies, and builds models using Random Forest and neural nets classifiers to predict restaurant ratings. The report includes data preprocessing, model evaluation using MAE, RMSE, and correlation, and provides insights into table booking and online ordering strategies. The objective is to create a data mining strategy to determine if restaurants should provide online ordering and table booking services. The student explores attributes like restaurant names, types, locations, menus, and customer reviews, to build a model for BFA to predict restaurant ratings. The report also highlights the use of various RapidMiner operators such as Read CSV, Normalize, Filter Examples, and the creation of correlation matrices and decision trees to gain insights from the data. The goal is to build a model with high precision, measured by MAE, RMSE, and correlation, to assist restaurant owners in understanding their expected ratings on the Zomato website.
tabler-icon-diamond-filled.svg

Contribute Materials

Your contribution can guide someone’s learning journey. Share your documents today.
Document Page
MIS772 Predictive Analytics (2019 T2) Individual Assignment A2 / All Workshops
Assignment A2: Text + Clustering + Estimation
Student
Name
(as per record) Student No Student number
My other group members A2
Group No
As per CloudDeakin group
number
Team
Names
(as per record) Student Nos Student number
(as per record) Student number
(as per record) Student number
Exceptional Meets expectations Issues noted Improve Unacceptable
Exec
Report
Use this area to self-assess your submission
Explore
Attributes
Be realistic as we will find problems in your work that you may not be aware of
Discover
Relationships
Create
Models
Evaluate &
Improve
Provide
Solution
Research &
Extend
Brief
Comments
Read these notes as we are really trying to help you out!
Remember: If it is not in this report, it does not exist and does not get marked!
Assume that markers could miss some important aspects of your submission unless presented clearly, or when
you deviate from the structure of this template (for which you will be penalised). So be clear, number tables,
charts and screen shots used as evidence, annotate all visuals, cross-reference your analysis with evidence.
Use the A2 Word template to prepare this report. Submit it in PDF format to avoid its accidental reformatting.
Submit all RM processes (.RMP files only – not the whole project directory or data) in a separate ZIP archive.
Only work submitted via CloudDeakin assignment box will be marked (not via email or any other way).
Ensure that the report is readable and the font is no smaller than Arial 10 points. Include only the most relevant
and significant results for your analysis and recommendations.
You will be able to submit your work as many times until deadline. We will mark the last complete submission,
i.e. the report in PDF and the ZIP-ped RapidMiner processes.
Go over this checklist: Is this your document? Does it report your work and your work only? Is this the correct
unit, assignment, year and trimester? Is your name entered above? Is the group number included and is it
correct? Are names of your group members entered as well? Are all pages included? Are all report sections
within the required page limit?
Then after the submission – check these: Was it lodged on time? Has the PDF report been submitted? Has the
Zip archive of RMP files been submitted? Can you retrieve and reopen both back from your submission folder?
We will be checking your work for plagiarism! If any parts of your work (report, screen shots or RM
processes) bear any resemblance to another students’ work, or by you for another unit, or anything
written by others without acknowledgement (e.g. on the web), it will be treated as plagiarism.
Total
1 of 14
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
MIS772 Predictive Analytics (2019 T2) Individual Assignment A2 / All Workshops
Executive summary (one page)
The fundamental point of this task to build up the Zomato nourishment food information
mining investigation systems they can utilized for the quick digger programming execution. The
examining on the Zomato food dataset they can contains the diverse activity information fields they can
handled. This present venture's point is spins around building up an information mining technique, which
guarantees to help during the time spent deciding if the cafés are required to give a few administrations
in particular the online super requesting just as booking the table to its clients for the Bangalore
nourishment help (BFA). BFA alludes to an Indian organization which is associated with the Zomato
café search and the disclosure site. By and large, BFA causes the organizations to give the example
audits of about 48000 Zomato food order booking system they can contained the data fields, which
involves the accompanying characteristics of the data attributes list,
Restaurant name
Restaurant sort
Contact Number
Location
Address
Neighbourhood
Rate of visit
Menu
Cuisine and kind of dinners
Average supper cost for the couple.
Number of votes cast
Liked dishes
Reviews
The point of BFA is to accumulate couple of initial bits of knowledge of the ordering
food and delivery service in Bangalore, for investigating and to tidy up the convey their audits so as to
inspect and make eatery table's classifier for their table booking, for requesting on the web and to
diminish the orders that aren't right. The apparatus named Rapid Miner is used for the advancement of
BFA's information mining technique, and during the time spent information mining this device is
profoundly basic. The eatery surveys are investigated and tidied up with the assistance of Set job,
Normalize, Selecting Attributes on a Rapid Mining information mining method analysis
implementation. .
2 of 14
Document Page
MIS772 Predictive Analytics (2019 T2) Individual Assignment A2 / All Workshops
For creating and investigating a classifier for the BFA eatery for Booking a table administration and
online dinner requesting administration, the two groupings in particular the Random timberland and neural nets
arrangements are utilized to diminish an inappropriate order. In this way, this report will quickly talk about and
examine the previously mentioned zones. The breaking down on the outcome to be confirmed on the diagram
and graph position.
3 of 14
Document Page
MIS772 Predictive Analytics (2019 T2) Individual Assignment A2 / All Workshops
Data exploration and relationships - Clustering in Rapid Miner (one page)
Here, a model is made on the Rapid Miner with the assistance of the given information of
the café. The model creation uses the Random Forest and the generalized linear regression classifiers. The
underneath chart portrays the creation model for these two classifiers.
For making a model, at first it is required to incorporate the read CSV administrator for perusing
the information of the eatery. Next, it is required to supplant the missing qualities with the assistance of
the standardize administrator, as it may affect the consequences of our model. At that point, with the
assistance of the Random Forest and generalized linear regression classifiers administrator characterize
the mark credits to be anticipated. Further, the required characteristics are chosen for including it to
foresee the properties. The accompanying figure portrays the choice tree.
During the formation of the model, the exactness parameters are chosen on the classifier,
promotion it is used for estimating the indicator's precision and their presentation. Different parameters
and profundity and set as default. For demonstrating a viable model for Zomato eatery information BFA,
the Random Forest is used. The underneath figure delineates the yield of the generalized linear regression
classifiers calculation
The following list represents the attributes for exploring the given data:
1) Restaurant name
2) Restaurant type
3) Location
4) Address
5) Menu
6) Cuisine and type of meals
7) Reviews etc.
The following figure represents the data exploration and preparation.
4 of 14
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
MIS772 Predictive Analytics (2019 T2) Individual Assignment A2 / All Workshops
Data pre-processing on the correlation matrix of the result is,
5 of 14
Document Page
MIS772 Predictive Analytics (2019 T2) Individual Assignment A2 / All Workshops
6 of 14
Document Page
MIS772 Predictive Analytics (2019 T2) Individual Assignment A2 / All Workshops
Data exploration and clean up - Anomalies in RapidMiner (one page)
Here, the connections and the information change will be found that is given in the eatery
information, with the assistance of Rapid Miner. A relationship grid must be made for distinguishing the
association of the information and its change. The administrators like Filter Examples, Read CSV, Set jobs,
connection framework and select credits are utilized to make the relationship network. The underneath figure
portrays the classification connections on the data mining method.
For making a model, at first it is required to incorporate the read CSV administrator for perusing the
information of the eatery. Next, it is required to analysing data the missing qualities with the assistance of the
standardize administrator, as it may affect the aftereffects of our model. At that point, with the assistance of the
Random Forest and generalized linear regression administrator characterize the mark ascribes to be anticipated.
Further, the required characteristics are chosen for including it to anticipate the properties. The accompanying
figure portrays the decision tree.
7 of 14
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
MIS772 Predictive Analytics (2019 T2) Individual Assignment A2 / All Workshops
During the production of the model, the exactness parameters are chosen on the classifier,
advertisement it is used for estimating the indicator's precision and their presentation.
Different parameters and profundity and set as default. For demonstrating a successful model
for Zomato eatery information BFA, the Random Forest is used. The beneath figure shows the
yield. The data analysing of data mining to be calculate on the attributes accuracy value of the
result is displayed on the below,
.
8 of 14
Document Page
MIS772 Predictive Analytics (2019 T2) Individual Assignment A2 / All Workshops
Create a Model(s) in RapidMiner (two pages / page 1)
Here, a model is created on the Rapid Miner with the help of the given data of the restaurant. The model
creation utilizes the Random Forest and the neural nets classifiers. The below diagram depicts the creation
model for these two classifiers.
The name credits are required to make the connection lattice that is determined for the situation
concentrate, for example, the book table. At that point, the suggestions are assessed. The accompanying figure
outlines the relationship lattice, during the creation of the model, the accuracy parameters are selected on the
classifier, ad it is utilized for measuring the predictor’s accuracy and their performance. The other parameters
and depth and set as default. For proving an effective model for Zomato restaurant data BFA, the Random
Forest is utilized. The below figure illustrates the output.
9 of 14
Document Page
MIS772 Predictive Analytics (2019 T2) Individual Assignment A2 / All Workshops
Create a Model(s) in Rapid Miner (two pages / page 2)
The model that is made uses two or three classifiers called the Random Forest classifier and
the generalized linear regression classifier, where the Random Forest utilizes four criteria; where the
precision criteria is chosen for giving the model's exact expectation, which is later used for deciding the
two administrations, thus it gives successful results for the clients of the BFA organization. This is then
connected on the Example sets with the numerical qualities, anyway just on the ostensible properties. It
conveys the yields like the Random Forest and the model set.
Both, standard deviation and fantastic execution estimation of the made model is conceivable
with the cross-approval execution, which detects the information. The previously mentioned exhibition
of the café information and its estimation is affected by over fitting. The Kaggle site is utilized for
breaking down the information arrangement. The accompanying figure delineates the made model's
arrangement
10 of 14
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
MIS772 Predictive Analytics (2019 T2) Individual Assignment A2 / All Workshops
Evaluate and Improve the Model(s) in Rapid Miner (two pages / page 1)
This administrator ought to be utilized for execution assessment of relapse undertakings as it were. Numerous
other exhibition assessment administrators are additionally accessible in Rapid Miner for example the
Performance administrator, Performance (polynomial Classification) administrator, and Performance
(Classification) administrator and so on. The Performance (generalized linear Regression) administrator is
utilized with relapse undertakings as it were. Then again, the Performance administrator consequently decides
the learning assignment type and computes the most widely recognized criteria for that type. You can utilize the
Performance (User-Based) administrator in the event that you need to compose your very own presentation
measure.
Relapse is a strategy utilized for numerical forecast and it is a factual measure that endeavor to
decide the quality of the connection between one ward variable ( for example the mark trait) and a progression
of other changing factors known as autonomous factors (customary qualities). Much the same as Classification
is utilized for clear cut marks, Regression is utilized for anticipating a consistent worth. For instance, we may
wish to foresee the pay of college graduates with 5 years of work understanding, or the potential offers of
another item given its cost. Relapse is regularly used to decide how much explicit factors, for example, the cost
of an item, book, order, food, specific enterprises or areas impact the value development of an advantage. For
assessing the factual exhibition of a relapse model the informational collection ought to be named for example it
ought to have a characteristic with name job and a property with forecast job. The mark property stores the
genuine watched values though the expectation trait stores the estimations of name anticipated by the relapse
model under exchange.
11 of 14
Document Page
MIS772 Predictive Analytics (2019 T2) Individual Assignment A2 / All Workshops
Evaluate and Improve the Model(s) in RapidMiner (two pages / page 2)
The 'Polynomial' informational index is stacked utilizing the Retrieve administrator. The Filter
Example Range administrator is connected on it. The main model parameter of the Filter Example Range
parameter is set to 1 and the last model parameter is set to 100. In this way the initial 100 instances of the
'Polynomial' informational collection are chosen. The Linear Regression administrator is connected on it with
default estimations everything being equal. The relapse model produced by the generalized Linear Regression
administrator is connected on the last 100 instances of the 'Polynomial' informational collection utilizing the Apply
Model administrator. Marked information from the Apply Model administrator is given to the Performance
(Regression) administrator. The total mistake and forecast normal parameters are set to genuine. In this way the
Performance Vector produced by the Performance (Regression) administrator has data with respect to the total
mistake and forecast normal in the named informational index. The supreme blunder is determined by including
the distinction of all the anticipated qualities from real estimations of the mark characteristic, and separating this
whole by the complete number of forecasts. The expectation normal is determined by including all the genuine
mark esteems and partitioning this entirety by the all-out number of models. You can confirm this from the
outcomes in the Results Workspace.
12 of 14
Document Page
MIS772 Predictive Analytics (2019 T2) Individual Assignment A2 / All Workshops
Provide an Integrated Solution in Rapid Miner (one page)
Here, the Rapid digger based made model will be sent with the assistance of a cross-approval
administrator. This administrator helps in deciding the factual presentation of the made model. Cross-
approval involves two or three sub-forms in particular the Training Processes and the Testing Processes,
and these procedures are alluded as the information port and the yield port for conveying the yield of the
expectation model prepared on the total model set.
The model that is made uses a few classifiers called the Random Forest classifier and the
generalized linear regression classifier, where the Random Forest utilizes four criteria; where the
exactness criteria is chosen for giving the model's precise expectation, which is later used for deciding the
two administrations, consequently it gives compelling results for the clients of the BFA organization. This
is then connected on the Example sets with the numerical properties, anyway just on the ostensible
qualities. It conveys the yields like the Random Forest and the model set.
Both, standard deviation and astounding execution estimation of the made model is
conceivable with the cross-approval execution, which detects the information. The previously mentioned
exhibition of the café information and its estimation is affected by over fitting. The Cagle site is utilized
for dissecting the information readiness. The accompanying figure portrays the made model's
organization.
13 of 14
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
MIS772 Predictive Analytics (2019 T2) Individual Assignment A2 / All Workshops
Further Research and Extensions in RM (one page)
Here, the given information will be inquired about and its augmentations will be resolved. The
given informational index of the eatery includes different missing qualities, alongside the deficient
properties, in this manner in Rapid Miner such traits are supplanted and moved to the information
investigation and readiness segment. Afterward, a few principle qualities (i.e., book _ table and Order
_ on the web) are utilized for deciding and finding the administrations, for example, Book a table and
Online dinner requesting, which are the normal outcomes for Zomato eatery and its clients relying
upon its café surveys.
The Random Forest and generalized linear regression classifiers are utilized for deciding the
normal consequences of the Zomato café relying upon the two administrations. Then again, these
classifiers effectively give powerful outcomes for the improvement of Zomato café's information
mining.
The utilization of Random Forest classifier demonstrates the given information's precision,
and it plainly demonstrates that BFA is keen on the two administrations and gives successful results
to the café. What's more, this is resolved with the assistance of the exhibition administrator. For
giving the information which demonstrates the BFA is keen on the result of the two administrations,
the generalized linear regression likewise used, where it demonstrates the profoundly successful
results for the Zomato café. Next, it has even utilized the presentation vector for deciding the
precision and kappa esteem, which is recognized as 99.03% and 83.50%separately.
Contingent upon the made arrangement models, it is used for giving viable outcomes to the
café information. It even gives powerful outcome i.e., book a table administration and the online feast
request administration as the advantageous administrations for its clients. Such administrations will in
general give powerful results for any eatery, and it is recognized by the Random Forest model. As
indicated by the consequences of the Random Forest model, 99.03% of precision is shown and with
an unmistakable Random Forest graph containing the two administrations which are perfect for
Zomato eatery, contingent upon its surveys. The made models are later assesses and approves the
booking table administration, Zomato survey, and the request online administration. The exhibition
and cross-approval models are seen for checking the general perception of the make model. Here, the
client has not directed an autonomous research related to informational collection examination for
deciding if our forecasts could affirm or broaden the outcomes that were distributed already on the
data mining analysing methods.
14 of 14
chevron_up_icon
1 out of 14
circle_padding
hide_on_mobile
zoom_out_icon
logo.png

Your All-in-One AI-Powered Toolkit for Academic Success.

Available 24*7 on WhatsApp / Email

[object Object]