NIT3171 ICT Business Analysis & Data Visualization: Case Study Report

Added on 2023/04/25

AI Summary

This assignment presents a business analysis case study using the Boston housing dataset and data mining techniques within the Weka tool. It involves understanding the dataset, discovering relationships among features through normalization, and identifying potential business analysis tasks such as ZeroR and Multi scheme classification. The analysis aims to provide business solutions for a real estate consulting firm, focusing on business benefits, process improvement, decision support, and strategy development. The classification algorithms are evaluated based on accuracy, error rates, and model building time, with a conclusion on the suitability of different algorithms for data classification.

NIT3171 ICT Business Analysis & Data
Visualization Group Assignment
Business Analysis Case Study
Student Name: ****
Student ID: ****

Table of Contents
1 Project Description
2 Task 1: Understand the dataset
3 Task 2: Relationships discovery among features
4 Task 3: List any potential business analysis tasks with your justification
4.1 ZeroR Classification
4.2 Multi scheme Classification
References

1 Project Description
The main objective of this project is to apply data mining techniques to the Boston
housing dataset in order to resolve a business problem. The provided dataset is analysed
with the Weka data mining tool to derive suitable business solutions. The analysis also
reviews current methodologies and algorithms for business analytics. These are discussed
and analysed in detail.
2 Task 1: Understand the dataset
To analyse the provided dataset, the user first needs to understand it. The provided
Boston housing dataset is described below (Ahmadi & E Shiri Ahmad Abadi, 2013).
The dataset has the following attributes:
• Id: Identifies each data instance
• MS Sub Class: Identifies the dwelling type
• MS Zoning: Identifies the zoning classification of the sale
• Lot Frontage: Linear feet of street connected to property
• Lot Area: Lot size in square feet
• Street: Type of road access to property
• Alley: Type of alley access to property
• Lot Shape: General shape of property
• Land Contour: Flatness of the property
• Utilities: Type of utilities available
• Lot Config: Lot configuration
• BsmtHalfBath
• FullBath
• HalfBath
• Bedroom
• Kitchen
• Kitchen Qual
• Land Slope: Slope of property
• Neighbourhood: Physical locations within Ames city limits
• Condition 1: Proximity to various conditions
• Year Built: Original construction date
• Year Remod Add: Remodel date
• Tot Rms Abv Grd
• Condition 2: Proximity to various conditions
• Bldg Type: Type of dwelling
• House Style: Dwelling style
• Sale Type: Type of sale
• Sale Condition: Condition of sale
• Overall Qual: Rates the overall material and finish of the house
• Overall Cond: Rates the overall condition of the house
• Sale Price: Sale amount
The dataset also contains further attributes.

Summary statistics for the provided dataset are shown below.
For the Id attribute,
For the Sale Condition attribute (Arabnia, Stahlbock, Abou-Nasr & Weiss, n.d.),

A visualization of the provided dataset is shown below.
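
The per-attribute statistics referred to above can be approximated outside Weka. The following is a minimal sketch, assuming that a numeric attribute is summarised by its minimum, maximum, mean and standard deviation, and a nominal attribute by a count per label. The function names and the sample values are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from statistics import mean, stdev


def numeric_summary(values):
    """Minimum, maximum, mean and sample standard deviation of a numeric attribute."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "stdev": stdev(values),
    }


def nominal_summary(values):
    """Number of instances per label of a nominal attribute."""
    return dict(Counter(values))


# Hypothetical values for illustration only; they are not taken from the dataset.
sale_prices = [120000, 150000, 180000, 210000]
sale_conditions = ["Normal", "Normal", "Partial", "Abnorml"]

print(numeric_summary(sale_prices))
print(nominal_summary(sale_conditions))
```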

3 Task 2: Relationships discovery among features
In this task, the user needs to discover the relationships that exist among the attributes.
Here, we apply normalization to the Boston housing data to discover these relationships.
The normalization technique is used to remove duplicates in the data (Azzalini & Scarpa,
2012).
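
One possible way to examine the relationship between two numeric attributes is sketched below. It assumes that normalization means min-max rescaling to the range [0, 1] and that the strength of a relationship is measured with the Pearson correlation coefficient; neither choice is stated in the report. The function names and sample values are hypothetical.

```python
from math import sqrt


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical values for illustration only; they are not taken from the dataset.
lot_area = [1000, 2000, 3000, 4000]
sale_price = [100, 200, 300, 400]

norm_area = min_max_normalize(lot_area)
norm_price = min_max_normalize(sale_price)
print(norm_area)
print(pearson(norm_area, norm_price))
```

Min-max rescaling is a linear transformation, so it does not change the correlation coefficient; it only puts the attributes on a common scale for comparison and plotting.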
4 Task 3: List any potential business analysis tasks with your justification
In this task, the user is required to list the potential business analysis tasks for the
provided dataset. Here, we use classification and prediction algorithms to resolve the
business problem and to provide effective solutions for it. Effective results provide the
following benefits for a real estate consulting firm:
1. Business benefits
2. Improve the business process
3. Support decision making
4. Support strategy development.

4.1 ZeroR Classification
ZeroR is the most straightforward classification method: it relies only on the target
and ignores all predictors. The ZeroR classifier simply predicts the majority class
(Witten, Frank & Hall, 2011). Although ZeroR has no predictive power, it is useful for
establishing a baseline performance as a benchmark for other classification methods.
Algorithm: construct a frequency table for the target and select its most frequent value.
Predictors contribution: nothing can be said about the contribution of the predictors to
the model, because ZeroR does not use any of them.
Model evaluation: ZeroR only predicts the majority class correctly. As mentioned above, it
is useful for establishing a baseline performance for other classification methods.
The ZeroR classification is demonstrated below (Han, Kamber & Pei, 2012).
=== Classifier model (full training set) ===
ZeroR predicts class value: 180921.19589041095
Time taken to build model: 0 seconds
=== Cross-validation ===
=== Summary ===

Correlation coefficient -0.0508
Mean absolute error 57444.7035
Root mean squared error 79439.3263
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 1460
For the numeric Sale Price target, the ZeroR algorithm predicts the mean value,
180921.19589041095, for every instance. Its RMSE of 79439.3263 serves as the baseline:
any other algorithm must achieve an RMSE lower than this value. For the nominal Sale
Condition target, the ZeroR algorithm predicts Normal for all instances, as it is the
majority class, and achieves an accuracy of 82 % (Kaluža, 2013).
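
The ZeroR behaviour described above can be sketched in a few lines. This is a minimal illustration, not Weka's implementation: it fits on the full set of labels rather than on cross-validation folds, and the function names are hypothetical. The class counts are the row totals of the confusion matrix reported in Section 4.2.

```python
from collections import Counter
from statistics import mean


def zero_r_nominal(targets):
    """Return the most frequent class label; predictors are never consulted."""
    return Counter(targets).most_common(1)[0][0]


def zero_r_numeric(targets):
    """Return the mean of a numeric target; predictors are never consulted."""
    return mean(targets)


# Class counts taken from the confusion matrix in Section 4.2.
class_counts = {
    "Normal": 1198,
    "Abnorml": 101,
    "Partial": 125,
    "AdjLand": 4,
    "Alloca": 12,
    "Family": 20,
}
labels = [label for label, count in class_counts.items() for _ in range(count)]

prediction = zero_r_nominal(labels)
accuracy = sum(1 for label in labels if label == prediction) / len(labels)
print(prediction, round(accuracy * 100, 4))
```

Because every instance receives the same prediction, the accuracy equals the share of the majority class in the data.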
4.2 Multi scheme Classification
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 1198 82.0548 %
Incorrectly Classified Instances 262 17.9452 %
Kappa statistic 0
Mean absolute error 0.1056

Root mean squared error 0.2289
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 1460
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC  ROC Area  PRC Area  Class
               1.000    1.000    0.821      1.000   0.901      ?    0.496     0.819     Normal
               0.000    0.000    ?          0.000   ?          ?    0.495     0.069     Abnorml
               0.000    0.000    ?          0.000   ?          ?    0.489     0.084     Partial
               0.000    0.000    ?          0.000   ?          ?    0.199     0.003     AdjLand
               0.000    0.000    ?          0.000   ?          ?    0.433     0.007     Alloca
               0.000    0.000    ?          0.000   ?          ?    0.500     0.014     Family
Weighted Avg.  0.821    0.821    ?          0.821   ?          ?    0.494     0.685
=== Confusion Matrix ===
    a    b    c    d    e    f   <-- classified as
 1198    0    0    0    0    0 |    a = Normal
  101    0    0    0    0    0 |    b = Abnorml
  125    0    0    0    0    0 |    c = Partial
    4    0    0    0    0    0 |    d = AdjLand
   12    0    0    0    0    0 |    e = Alloca
   20    0    0    0    0    0 |    f = Family
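
The accuracy and Kappa statistic in the summary can be recomputed from the confusion matrix. The sketch below assumes the standard definition of Cohen's kappa (observed agreement corrected for the agreement expected by chance); the function names are hypothetical, and the matrix values are copied from the output above.

```python
# Rows are actual classes, columns are predicted classes, in the order
# Normal, Abnorml, Partial, AdjLand, Alloca, Family.
confusion = [
    [1198, 0, 0, 0, 0, 0],
    [101, 0, 0, 0, 0, 0],
    [125, 0, 0, 0, 0, 0],
    [4, 0, 0, 0, 0, 0],
    [12, 0, 0, 0, 0, 0],
    [20, 0, 0, 0, 0, 0],
]


def accuracy(matrix):
    """Share of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Observed agreement corrected for the agreement expected by chance."""
    total = sum(sum(row) for row in matrix)
    observed = accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(row[j] for row in matrix) for j in range(len(matrix))]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


print(round(accuracy(confusion) * 100, 4))
print(round(cohen_kappa(confusion), 4))
```

Since every instance is assigned to Normal, the observed agreement equals the chance agreement and kappa is 0, which indicates that the 82 % accuracy reflects only the class distribution.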
In light of the above tables and figures, we can see that for the Boston housing data
82.05 % of the instances are classified correctly and 17.95 % are classified incorrectly.
The other algorithm yields an average accuracy of around 85 %. In fact, the highest
accuracy belongs to the Multi scheme classifier. The ZeroR classifier sits at the bottom of
the chart, with relative errors of 100 %. Of the 1460 instances in total, 1198 are
classified correctly and 262 are misclassified (Maimon & Rokach, 2010). The total time
required to build the model is also an important parameter when comparing classification
algorithms. It is also usual to assess the reliability and validity of the collected data.
This analysis uses two commonly used indicators, the mean absolute error and the root mean
squared error. The relative errors are used as well. It is found that the highest error
occurs with the ZeroR classifier, with a weighted average false positive rate of around
0.821. An algorithm with a lower error rate is preferred, as it has a more powerful
classification capability. After this investigation we can therefore say that the ZeroR
algorithm is not appropriate for this data, since it has the highest number of errors and
cannot classify the data effectively (Olson, 2017).
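
The error indicators named above can be written out as a short sketch. It assumes the usual definitions: the relative absolute error and the root relative squared error divide a model's error by the error of a baseline that always predicts the mean of the actual values. Weka computes its baseline from the training data, so its figures may differ slightly. The function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean


def error_metrics(actual, predicted):
    """MAE, RMSE and their relative counterparts against a mean-only baseline."""
    n = len(actual)
    baseline = mean(actual)
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    sq_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    base_abs = sum(abs(a - baseline) for a in actual)
    base_sq = sum((a - baseline) ** 2 for a in actual)
    return {
        "mae": sum(abs_errors) / n,
        "rmse": sqrt(sum(sq_errors) / n),
        "rae": sum(abs_errors) / base_abs,
        "rrse": sqrt(sum(sq_errors) / base_sq),
    }


# Hypothetical values for illustration only; they are not taken from the dataset.
actual = [1, 2, 3, 4]
mean_only = [mean(actual)] * len(actual)

print(error_metrics(actual, mean_only))
```

A model that always predicts the mean is its own baseline, so both relative errors come out as 1.0, which corresponds to the 100 % relative errors reported for ZeroR.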

References
Ahmadi, F., & E Shiri Ahmad Abadi, M. (2013). Data Mining in Teacher Evaluation System
using WEKA. International Journal Of Computer Applications, 63(10), 12-18. doi:
10.5120/10501-5268
Arabnia, H., Stahlbock, R., Abou-Nasr, M., & Weiss, G. (n.d.). DMIN 2017.
Azzalini, A., & Scarpa, B. (2012). Data Analysis and Data Mining. Oxford: Oxford
University Press, USA.
Han, J., Kamber, M., & Pei, J. (2012). Data mining. Waltham, MA: Morgan
Kaufmann/Elsevier.
Kaluža, B. (2013). Instant Weka how-to. Birmingham: Packt Pub.
Maimon, O., & Rokach, L. (2010). Data mining and knowledge discovery handbook. New
York: Springer.
Olson, D. (2017). Descriptive Data Mining. Singapore: Springer Singapore.
Witten, I., Frank, E., & Hall, M. (2011). Data mining. Burlington, Mass.: Morgan Kaufmann
Publishers.