NIT3171 Business Analysis Case Study: Data Mining Techniques Report


Added on  2022/11/11

AI Summary
This report presents a business analysis case study based on the Boston housing dataset, focusing on data mining techniques and business analysis. The assignment aims to discover relationships among variables through normalization (1NF, 2NF and 3NF), which avoids data redundancy and update anomalies. The solution identifies candidate keys and non-prime attributes and examines the correlation matrix to assess the strength of the relationships between variables. The analysis finds a moderately strong positive correlation between SalePrice and TotalBsmtSF. The report also compares sale prices across different sale conditions. This analysis is part of a group assignment for the NIT3171 ICT Business Analysis & Data Visualization course, fulfilling the first stage of a larger project, and demonstrates the application of business analytical skills to the given housing dataset.
Solutions for task 2
We attempt to discover the relationships among the variables (attributes) using the following approach:
Normalization
Normalization is the process of organizing the data in a database to avoid data redundancy and insertion, update and deletion anomalies. This analysis applies the first three normal forms:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
A table is in first normal form (1NF) if every attribute holds atomic (single) values. Multiple values in a column are eliminated by moving the repeated values into separate rows or a separate table, and each set of related data is identified with a primary key. In the Excel dataset, no column contains multiple values in a single cell, so the table satisfies 1NF.
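The 1NF check described above can be sketched in Python. This is a minimal sketch under stated assumptions: the sample rows and their values are hypothetical (only the column names echo the dataset), and a cell is treated as multi-valued if it holds a list or contains a separator such as a comma, which is a heuristic rather than a formal test.

```python
# Hypothetical sample rows: column names echo the dataset, values are made up.
rows = [
    {"Id": 1, "RoofStyle": "Gable", "GarageType": "Attchd"},
    {"Id": 2, "RoofStyle": "Gable", "GarageType": "Attchd"},
    {"Id": 3, "RoofStyle": "Hip", "GarageType": "Detchd"},
]

# Assumption: a cell holding several values would use one of these separators.
SEPARATORS = (",", ";", "|")


def non_atomic_cells(rows):
    """Return (row index, column) pairs whose value looks multi-valued."""
    problems = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            if isinstance(value, (list, tuple, set)):
                problems.append((i, column))
            elif isinstance(value, str) and any(sep in value for sep in SEPARATORS):
                problems.append((i, column))
    return problems


def has_unique_key(rows, key):
    """Check that the key column identifies every row uniquely."""
    values = [row[key] for row in rows]
    return len(values) == len(set(values))


def is_1nf(rows, key):
    """Atomic values everywhere and a key that identifies each row."""
    return not non_atomic_cells(rows) and has_unique_key(rows, key)


print(is_1nf(rows, "Id"))  # True
bad = rows + [{"Id": 4, "RoofStyle": "Gable, Hip", "GarageType": "Attchd"}]
print(non_atomic_cells(bad))  # [(3, 'RoofStyle')]
```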
A table is in second normal form (2NF) if it is already in first normal form and no non-prime attribute depends on a proper subset of any candidate key. A non-prime attribute is an attribute that is not part of any candidate key, and a candidate key is a minimal set of attributes that uniquely identifies a row. In this dataset, Id is the only candidate key; because it is a single attribute, it has no proper subset on which another attribute could partially depend, so a partial dependency can only arise in a table with a composite key. For example, consider a table whose composite primary key is (YearBuilt, YearRemodAdd) and whose non-key attribute is RoofStyle. RoofStyle depends on YearRemodAdd, which is only part of the primary key, so this table violates 2NF.
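A partial dependency such as RoofStyle on YearRemodAdd can be tested against the rows of a table. The sketch below uses hypothetical sample rows, and the helper names `determines` and `partial_dependencies` are illustrative. Note that sample data can only refute a functional dependency, not prove it, since a dependency is a rule about all valid rows.

```python
from itertools import combinations


def determines(rows, lhs, rhs):
    """True if, in these rows, each combination of lhs values maps to a
    single rhs value (the dependency lhs -> rhs is not refuted)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False
        seen[key] = row[rhs]
    return True


def partial_dependencies(rows, key, non_prime):
    """(subset, attribute) pairs where a non-prime attribute is determined
    by a proper, non-empty subset of the composite key."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in non_prime:
                if determines(rows, subset, attr):
                    found.append((subset, attr))
    return found


# Hypothetical rows keyed on (YearBuilt, YearRemodAdd); values are made up.
rows = [
    {"YearBuilt": 1990, "YearRemodAdd": 2000, "RoofStyle": "Gable"},
    {"YearBuilt": 1995, "YearRemodAdd": 2000, "RoofStyle": "Gable"},
    {"YearBuilt": 1990, "YearRemodAdd": 2005, "RoofStyle": "Hip"},
    {"YearBuilt": 1995, "YearRemodAdd": 2005, "RoofStyle": "Hip"},
]

print(partial_dependencies(rows, ("YearBuilt", "YearRemodAdd"), ["RoofStyle"]))
# [(('YearRemodAdd',), 'RoofStyle')]
```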
A table is in third normal form (3NF) if it is in 2NF and, for each nontrivial functional dependency X → Y, at least one of the following conditions holds:
X is a superkey of the table.
Y is a prime attribute of the table.
In this table, Id determines FullBath and FullBath determines GarageType. Therefore Id determines GarageType via FullBath, giving the transitive functional dependency X → Z → Y, where X is Id, Z is FullBath and Y is GarageType. Because FullBath is not a superkey and GarageType is not a prime attribute, the dependency FullBath → GarageType satisfies neither condition, so the table is not in 3NF.
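The two 3NF conditions can be checked mechanically: X is a superkey exactly when the attribute closure of X contains every attribute of the table. The sketch below treats the dependencies stated in the text (Id → FullBath, FullBath → GarageType) as given; the function names `closure` and `violations_3nf` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute implied by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations_3nf(attributes, fds, candidate_keys):
    """FDs X -> Y where X is not a superkey and some attribute of Y
    (outside X) is not prime, i.e. neither 3NF condition holds."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attributes)
        extra = set(rhs) - set(lhs)  # ignore the trivial part of the FD
        if extra and not is_superkey and not extra <= prime:
            bad.append((lhs, rhs))
    return bad


attributes = {"Id", "FullBath", "GarageType"}
fds = [
    (("Id",), ("FullBath",)),
    (("FullBath",), ("GarageType",)),
]

print(sorted(closure(("Id",), fds)))  # ['FullBath', 'GarageType', 'Id']
print(violations_3nf(attributes, fds, [("Id",)]))
# [(('FullBath',), ('GarageType',))]
```

One common fix is to decompose the table into (Id, FullBath) and (FullBath, GarageType), so that each determinant is a key of its own table.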
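The correlation matrix obtained from Excel shows the pairwise linear relationships between the variables (attributes). The sign of a correlation coefficient indicates the direction of a relationship: a positive value means the two variables tend to rise together, and a negative value means one tends to fall as the other rises. The absolute value indicates the strength, with values near 0 meaning a weak linear relationship and values near ±1 a strong one. In this case, the correlation between SalePrice and TotalBsmtSF is 0.61, the highest in the matrix, indicating a moderately strong positive relationship between the two variables.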
The Excel output of sale prices by sale condition ranks the conditions from highest to lowest as Abnormal, AdjLand, Alloca, Family, Normal and, finally, Partial.
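The ranking by sale condition amounts to a group-by followed by a sort. The text does not say whether "highest" refers to the mean, the total or the count of sale prices; this sketch assumes the mean (what an Excel PivotTable summarizing by Average would show), assumes the column names SaleCondition and SalePrice, and uses made-up prices, so its output says nothing about the real dataset.

```python
from collections import defaultdict

# Made-up rows for illustration only; these are NOT values from the dataset.
# The column names SaleCondition and SalePrice are assumptions.
rows = [
    {"SaleCondition": "Normal", "SalePrice": 180000},
    {"SaleCondition": "Normal", "SalePrice": 160000},
    {"SaleCondition": "Partial", "SalePrice": 250000},
    {"SaleCondition": "Abnorml", "SalePrice": 120000},
]


def mean_price_by_condition(rows):
    """Average SalePrice per SaleCondition, sorted from highest to lowest."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        condition = row["SaleCondition"]
        totals[condition] += row["SalePrice"]
        counts[condition] += 1
    means = {c: totals[c] / counts[c] for c in totals}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


for condition, mean_price in mean_price_by_condition(rows):
    print(f"{condition}: {mean_price:,.0f}")
```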