Deakin MIS772: Predictive Analytics Assignment A2 Data Exploration


Added on 2022/12/16

MIS772 Predictive Analytics (2019 T1) Individual Assignment A2-LP3 / Workshops M1, M2T1
Assignment A2-LP3: Data Exploration and Preparation
Student Name (as per record) | Student No: Student number
My other group members A2
Group No: As per CloudDeakin group number
Team Names (as per record) | Student Nos: Student number
(as per record) | Student number
(as per record) | Student number
Criterion | Exceptional | Meets expectations | Issues noted | Improve | Unacceptable
Problem Statement | | | | |
Explore & Prepare Data | | | | |
Brief Comments:
Read these notes, as we are really trying to help you out!
Remember: If it is not in this report, it does not exist and does not get marked!
You can use the above form to estimate the expected mark against the rubric (see the assignment “info”
document). Be realistic and note that we will find many problems you may not be aware of.
Assume that markers may be tired when assessing your work and they may miss some important aspects of
your submission when not presented clearly, or when you deviate from the structure of this template, or if you
do not include them in your report. So be clear, number all tables, charts and screen shots used as evidence,
describe all visuals, cross-reference your analysis with evidence.
Submit this report in PDF format to avoid accidental reformatting of the content.
Submit all RapidMiner processes (.RMP files) in a separate ZIP archive, so that if there is any doubt we could
load your work and replicate your results (we will not do this to find missing report parts).
Ensure that the report is readable and the font is no smaller than Arial 10 points. In the report include only the
most significant results for your analysis and recommendations.
You will be able to submit your work once only so make sure you get it right – check these before posting on
CloudDeakin: Is this your document? Is this the correct unit, assignment, year and trimester? Is your name
entered above? Is the group number included and is it correct? Are names of your group members entered as
well? Are all pages included? Does it all fit into the required page limit? Have you zipped all RapidMiner files
(.RMP files)? Is the report contents yours alone?
Then after the submission – check these: Has the PDF report been submitted? Has the Zip archive of RMP
files been submitted? Can you retrieve and reopen both back from your submission folder?
Note that the late penalty will be calculated on the date and time of the last submitted file.
Finally, as all reports will be inspected for plagiarism, ensure that your analysis, your evidence, your way of
thinking, your report and its presentation are unique and demonstrate your ability to create it all independently.
So if you work in a team, compare your submission to those of your team members and make it quite distinct in
both content and form. Any part of this report that bears any resemblance to another student’s report or any
information source written by others or by you for another unit (e.g. on the web) will be treated as plagiarism.
Total
Executive summary
Personal and household incomes in Australia have grown faster over the past decade or so (Dhanalakshmi,
Bino and Saravanan, 2016). As a result, spending on meals eaten away from home has increased. Because
wine is generally seen as a complementary product, such meals are more likely to be accompanied by wine
than meals eaten at home. Another noticeable recent trend, likely driven by rising incomes, has been a shift
among Australian wine buyers toward premium wines and away from the cheaper, lower-quality products
that previously drove growth in wine consumption.
Under these circumstances, Australian Wine Importers (AWI) intends to import a new wine type into the
Australian market. Before doing so, AWI plans to perform some analysis based on the available information
so that it can predict market demand for specific wines. Understanding the demand trend will not only help
AWI decide import volumes but will also help it set price levels and reduce the costs associated with this
import.
Because AWI asked for a detailed analysis, the analyst has considered the wine tasting results with respect
to the following variables:
• Wine “title” (name + vintage);
• Country, Province and Region;
• Variety and Winery;
• Description and Designation;
• Price (US$).
Specifically, the analyst will apply text mining techniques using the RapidMiner tool. To do so, the analyst
has performed clustering and segmentation analysis based on wine variety and winery (Roiger, 2017).
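To make the text mining step concrete, the following minimal sketch turns wine descriptions into TF-IDF weighted term vectors, a common representation produced by text pre-processing before clustering. It is plain Python rather than RapidMiner; the descriptions, the simple tokenizer and the function names are hypothetical, and RapidMiner's text processing may tokenize and weight terms differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Term frequency (share of the document) times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical tasting notes, not taken from the assignment data.
descriptions = [
    "Bright cherry and raspberry with silky tannins",
    "Ripe cherry, earthy notes and firm tannins",
    "Crisp citrus and green apple with a mineral finish",
]
vectors = tfidf_vectors(descriptions)
print(sorted(vectors[0], key=vectors[0].get, reverse=True)[:3])  # -> ['bright', 'raspberry', 'silky']
```

Words shared by every description (such as "and") get a weight of zero, so the vectors emphasize the terms that distinguish one wine from another.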
Data exploration and preparation in RapidMiner
To perform the study, the first step was to downsize the sample data and replace missing values. To do so,
the RapidMiner Sample and Replace Missing Values operators were used. The Select Attributes operator
was then used to keep only the specific attributes needed.
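The effect of these three operators (Sample, Replace Missing Values and Select Attributes) can be mimicked in plain Python. This is a minimal sketch under assumptions: the rows, the sample size, the constant fill value "Unknown" and the function `prepare` are hypothetical, and the RapidMiner operators offer other options (for example, other replacement strategies).

```python
import random


def prepare(rows, sample_size, keep, fill_value="Unknown", seed=42):
    """Downsample rows, fill missing values and keep only the selected attributes."""
    # Sample: draw a reproducible random subset of rows without replacement.
    rng = random.Random(seed)
    sampled = rng.sample(rows, min(sample_size, len(rows)))
    prepared = []
    for row in sampled:
        # Select Attributes: keep only the columns needed for the analysis.
        # Replace Missing Values: substitute a constant for None or empty strings.
        prepared.append({
            col: fill_value if row.get(col) in (None, "") else row[col]
            for col in keep
        })
    return prepared


# Hypothetical rows in the shape of the wine tasting data.
rows = [
    {"title": "Wine A 2015", "country": "France", "variety": "Pinot Noir",
     "description": "Silky red fruit", "price": 40},
    {"title": "Wine B 2016", "country": None, "variety": "Shiraz",
     "description": "Dark plum and pepper", "price": 25},
    {"title": "Wine C 2014", "country": "Italy", "variety": "Sangiovese",
     "description": "", "price": 30},
]
sample = prepare(rows, sample_size=2, keep=["country", "variety", "description"])
print(sample)
```

Fixing the random seed keeps the downsized sample the same between runs, which makes the later clustering results easier to reproduce.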
check is to see where esteems are feeling the loss of; the data in this record is generally finished.
Fortunately there isn't any information missing in the depiction and assortment segments, which are the
principle segments I requirement for this examination (Ristoski, Bizer, and Paulheim, 2015). There are 5
missing qualities for nation that I initially filled in by taking a gander at which winery was recorded with
these missing qualities, at that point looked into different sections with that equivalent winery to decide
nation. I needed to do this since I was considering investigating what words are utilized to portray wines
from different nations yet scratched this thought and chose to concentrate just on the distinctive
assortments.
Figure: Data exploration
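The manual fix described above, looking up other rows from the same winery to fill in a missing country, can be expressed as a small lookup. This is a hypothetical sketch: the rows and the function name `fill_country_from_winery` are illustrative, and it assumes each winery is located in a single country.

```python
from collections import Counter, defaultdict


def fill_country_from_winery(rows):
    """Fill a missing 'country' with the most common country recorded for the same winery."""
    # Collect the known countries for every winery.
    known = defaultdict(Counter)
    for row in rows:
        if row.get("country"):
            known[row["winery"]][row["country"]] += 1
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not modified
        if not row.get("country") and known[row["winery"]]:
            # Use the country that appears most often for this winery.
            row["country"] = known[row["winery"]].most_common(1)[0][0]
        filled.append(row)
    return filled


# Hypothetical rows: the second and third rows are missing their country.
rows = [
    {"winery": "Estate X", "country": "Chile"},
    {"winery": "Estate X", "country": None},
    {"winery": "Estate Y", "country": None},  # no other rows for this winery
]
print(fill_country_from_winery(rows))
```

Rows whose winery never appears with a known country stay missing and would need another source.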
Now that the text pre-processing is done, I can finally apply the clustering technique to group our decision
variable. The clustering technique (k-means) needs to initialize k centroids before it can begin finding the
k clusters (Naik and Samant, 2016). There are several ways to initialize the centroids, but they usually
involve some form of randomization. Because of this randomness, the algorithm is not deterministic: it may
produce different clusters if we run it several times. It is therefore sensible to run the algorithm several
times and keep the solution with the lowest within-cluster variance, a procedure I will not describe in detail
here.
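The restart strategy described above can be sketched as follows: run k-means several times from different random initial centroids and keep the run with the lowest within-cluster sum of squared distances. This is an illustrative plain-Python version on hypothetical 2-D points, not RapidMiner's k-Means operator; the number of restarts, the iteration cap and the function names are assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run (Lloyd's algorithm) from k randomly chosen initial centroids."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Within-cluster sum of squared distances (lower is better).
    sse = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, clusters, sse


def kmeans_best_of(points, k, restarts=10, seed=0):
    """Repeat k-means with different random starts and keep the lowest-SSE solution."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[2])


# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, clusters, sse = kmeans_best_of(points, k=2)
print(sorted(len(c) for c in clusters), round(sse, 3))
```

Each restart consumes fresh random numbers from the same seeded generator, so the whole procedure is repeatable while still trying different starting centroids.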
The table above shows that Pinot Noir is the most similar wine category.
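One way to quantify how similar wine categories are, assumed here rather than taken from the analysis, is to represent each variety by a profile of term weights (for example, averaged from its descriptions) and compare profiles with cosine similarity. The variety names, weights and function names below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_pair(profiles):
    """Return the pair of categories with the highest cosine similarity."""
    names = sorted(profiles)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return max(pairs, key=lambda pair: cosine_similarity(profiles[pair[0]], profiles[pair[1]]))


# Hypothetical variety profiles (e.g. averaged term weights from descriptions).
profiles = {
    "Variety A": {"cherry": 0.6, "silky": 0.4, "earthy": 0.2},
    "Variety B": {"cherry": 0.5, "silky": 0.3, "spice": 0.1},
    "Variety C": {"citrus": 0.7, "mineral": 0.5},
}
print(most_similar_pair(profiles))  # -> ('Variety A', 'Variety B')
```

Cosine similarity ignores the overall length of a profile, so varieties with many reviews and varieties with few can still be compared on the mix of words used to describe them.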
References:
Dhanalakshmi, V., Bino, D. and Saravanan, A.M., 2016, March. Opinion mining from student feedback
data using supervised learning algorithms. In 2016 3rd MEC International Conference on Big Data and
Smart City (ICBDSC) (pp. 1-5). IEEE.
Naik, A. and Samant, L., 2016. Correlation review of classification algorithm using data mining tool:
WEKA, Rapidminer, Tanagra, Orange and Knime. Procedia Computer Science, 85, pp.662-668.
Ristoski, P., Bizer, C. and Paulheim, H., 2015. Mining the web of linked data with rapidminer. Web
Semantics: Science, Services and Agents on the World Wide Web, 35, pp.142-151.
Roiger, R.J., 2017. Data mining: a tutorial-based primer. Chapman and Hall/CRC.