MIT4204 Data Mining Assignment Solutions: Analysis and Algorithms


Added on  2023/04/24

Homework Assignment
AI Summary
This document presents comprehensive solutions to a data mining assignment. The solutions cover a range of topics, including the construction and simplification of decision trees for parity functions with Boolean attributes, Gini index calculations for customer data, and the identification of frequent itemsets. Additionally, the document includes solutions for cluster analysis, types of clusters, and cluster analysis algorithms. The assignment also involves the analysis of transaction data and the application of various data mining techniques. This resource is designed to aid students in understanding and solving data mining problems effectively.
Solution 1
The tree diagram cannot be simplified any further.
Solution 2
(a) Gini index for the overall collection = 1 − 2 × 0.5^2 = 0.5.
(b) Gini index for the Customer ID attribute
The Gini for each Customer ID value is 0. Therefore, the overall Gini for Customer ID is 0.
(c) Gini index for the Gender attribute
The Gini for Male is 1 − 2 × 0.5^2 = 0.5.
The Gini for Female is also 0.5.
The overall Gini for Gender = 0.5 × 0.5 + 0.5 × 0.5 = 0.5.
(d) Gini index for the Car Type attribute
The Gini for Family car is 0.375, for Sports car is 0, and for Luxury car is 0.2188. The overall Gini for Car Type is 0.1625.
(e) Gini index for the Shirt Size attribute
The Gini for Small shirt size is 0.48, for Medium is 0.4898, for Large is 0.5, and for Extra Large is 0.5. The overall Gini for Shirt Size = 0.4914.
(f) Car Type, because it has the lowest Gini index among the three attributes (Gender, Car Type, and Shirt Size).
(g) Customer ID should not be used as the attribute test condition because it has no predictive power: new customers are assigned new Customer IDs.
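The Gini values in Solution 2 can be checked with a short script. This is a minimal sketch, not the assignment's own method: the data table is not reproduced in this document, so the per-value class counts below are assumptions chosen to be consistent with the Gini values stated above, and the helper names `gini` and `weighted_gini` are hypothetical.

```python
def gini(counts):
    """Gini index of one node, given the number of records in each class."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(partitions):
    """Gini index of a split: each child's Gini weighted by its share of records."""
    partitions = list(partitions)
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * gini(p) for p in partitions)


# ASSUMED class counts (class 0, class 1); the original table is not in this
# document, so these are illustrative values consistent with the stated results.
overall = (10, 10)
car_type = {"Family": (1, 3), "Sports": (8, 0), "Luxury": (1, 7)}
shirt_size = {"Small": (3, 2), "Medium": (3, 4),
              "Large": (2, 2), "Extra Large": (2, 2)}

print("Overall:", gini(overall))
print("Car Type:", round(weighted_gini(car_type.values()), 4))
print("Shirt Size:", round(weighted_gini(shirt_size.values()), 4))
```

Under these assumed counts, the weighted Gini for Car Type is lower than the one for Shirt Size, which matches the choice made in part (f).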
Solution 3
P(A = 1|−) = 2/5 = 0.4, P(A = 0|−) = 3/5 = 0.6,
P(B = 1|−) = 2/5 = 0.4, P(B = 0|−) = 3/5 = 0.6,
P(C = 1|−) = 1, P(C = 0|−) = 0;
P(A = 1|+) = 3/5 = 0.6, P(A = 0|+) = 2/5 = 0.4,
P(B = 1|+) = 1/5 = 0.2, P(B = 0|+) = 4/5 = 0.8,
P(C = 1|+) = 2/5 = 0.4, P(C = 0|+) = 3/5 = 0.6.
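To show how these conditional probabilities would be used, here is a minimal naive Bayes sketch. The conditional probabilities are copied from the values above. The equal class priors and the test record are assumptions made for illustration (the denominators of 5 for both classes suggest equal class sizes, but the data table is not part of this document), and the function names `score` and `predict` are hypothetical.

```python
# Conditional probabilities P(attribute = value | class), copied from Solution 3.
cond = {
    "+": {"A": {1: 0.6, 0: 0.4}, "B": {1: 0.2, 0: 0.8}, "C": {1: 0.4, 0: 0.6}},
    "-": {"A": {1: 0.4, 0: 0.6}, "B": {1: 0.4, 0: 0.6}, "C": {1: 1.0, 0: 0.0}},
}

# ASSUMPTION: equal priors; the class distribution is not stated in this document.
prior = {"+": 0.5, "-": 0.5}


def score(record, label):
    """Unnormalized posterior: prior times the product of the conditionals."""
    p = prior[label]
    for attr, value in record.items():
        p *= cond[label][attr][value]
    return p


def predict(record):
    """Return the class label with the highest unnormalized posterior."""
    return max(prior, key=lambda label: score(record, label))


# HYPOTHETICAL test record, chosen only to exercise the functions.
record = {"A": 0, "B": 1, "C": 0}
print(score(record, "+"), score(record, "-"), predict(record))
```

Because P(C = 0|−) = 0, any record with C = 0 receives a score of zero for the − class, regardless of its other attribute values.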
Solution 4
a) Four. The longest transaction contains 4 items, so the maximum size of a frequent itemset is 4.
b) (Bread, Butter)
c) (Bread, Butter) or (Beer, Cookies)
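The bound in part (a) can be illustrated with a brute-force sketch. The transactions below are hypothetical and are not the assignment's data set, which is not reproduced in this document. The helper names are also hypothetical. The idea is that an itemset with nonzero support must be contained in at least one transaction, so its size cannot exceed the length of the longest transaction.

```python
from itertools import combinations

# HYPOTHETICAL transactions for illustration only (not the assignment's data).
transactions = [
    {"Bread", "Butter", "Milk", "Cookies"},
    {"Bread", "Butter", "Beer"},
    {"Beer", "Cookies", "Diapers"},
    {"Bread", "Butter"},
    {"Beer", "Cookies"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def max_itemset_size(transactions):
    """Upper bound on frequent itemset size: the longest transaction."""
    return max(len(t) for t in transactions)


def frequent_itemsets(transactions, minsup):
    """Enumerate all itemsets up to the bound and keep those meeting minsup."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_itemset_size(transactions) + 1):
        for candidate in combinations(items, k):
            s = support(candidate, transactions)
            if s > 0 and s >= minsup:
                result[candidate] = s
    return result


print(max_itemset_size(transactions))
print(frequent_itemsets(transactions, minsup=0.5))
```

Exhaustive enumeration is only practical for tiny examples like this one; it is used here for clarity rather than efficiency.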
Solution 5
(a)
{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6},
{1, 2, 4, 5}, {1, 2, 4, 6}, {1, 2, 5, 6},
{1, 3, 4, 5}, {1, 3, 4, 6}, {2, 3, 4, 5},
{2, 3, 4, 6}, {2, 3, 5, 6}.
(b) {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {2, 3, 4, 5}, {2, 3, 4, 6}.
(c) {1, 2, 3, 4}
Solution 6
a) This is not data mining; it is a simple database query.
b) This is not data mining; it is an accounting calculation.
c) This is not data mining; it is simple accounting.
d) This is not data mining; it is a simple database query.
e) This is not data mining. Since the die is fair, it is a probability calculation.
f) Yes. This is predictive modelling.
g) Yes. This is classification.
h) No. Signal processing is not data mining.
Solution 7
What is cluster analysis?
Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both.
Describe the types of cluster analysis
Hierarchical clustering
A set of nested clusters, in which a cluster can contain subclusters.
Partitional clustering
A division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.
Complete clustering
A complete clustering assigns every object to a cluster.
Partial clustering
A partial clustering does not assign every object to a cluster.
List different types of clusters
Well-separated clusters
Center-based clusters
Contiguity-based clusters
Density-based clusters
Conceptual clusters
List Cluster Analysis algorithms.
K-means clustering
Hierarchical clustering
Connectivity models
Centroid models
Distribution models
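As an illustration of a centroid-based algorithm from the list above, here is a minimal pure-Python sketch of K-means (Lloyd's iteration). The sample points, the initial centroids, and the function name `kmeans` are assumptions made for this example. The result is a partitional, complete clustering: every point is assigned to exactly one cluster.

```python
def kmeans(points, centroids, max_iter=100):
    """Alternate between assigning points to the nearest centroid and
    recomputing each centroid as the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Converged: assignments can no longer change.
        centroids = new_centroids
    return centroids, clusters


# HYPOTHETICAL two-dimensional points and starting centroids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

The starting centroids are fixed here so that the run is deterministic; in practice they are often chosen at random, and different starting points can lead to different clusterings.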