Data Mining and Visualization for Business Intelligence Project

Added on  2019/11/26

Data Mining and Visualization for Business Intelligence
Assignment
Student Id
Question 1
Dimension Reduction
Given data
(a) Principal Component Analysis (PCA) was performed on the given data set in Excel using XLMiner; the final output is shown below:
Based on the variance analysis, the top six of the eight principal components (highlighted) capture nearly 95% of the total variance.
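The 95% rule can be illustrated with a small sketch. It assumes NumPy and uses randomly generated stand-in data, since the assignment's data set is not reproduced here; XLMiner's own computation may differ in details such as whether the data are standardized first. The helper name `components_for_threshold` is hypothetical.

```python
import numpy as np

def components_for_threshold(X, threshold=0.95):
    """Return (k, ratios): the smallest number of principal components whose
    cumulative explained-variance ratio reaches `threshold`, and the ratio per component."""
    Xc = X - X.mean(axis=0)                       # center each variable
    cov = np.cov(Xc, rowvar=False)                # p x p covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]       # eigenvalues, largest first
    ratios = eigvals / eigvals.sum()              # share of total variance per component
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold) + 1)  # first index reaching the threshold
    return k, ratios

# Hypothetical stand-in for an eight-variable data set with unequal spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8)) * np.array([5, 4, 3, 3, 2, 2, 1, 0.5])
k, ratios = components_for_threshold(X)
print(k, np.round(np.cumsum(ratios), 3))          # components to keep, cumulative shares
```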
Therefore, the component matrix above is reduced to contain only the first six principal components. The reduced matrix is shown below:
The highlighted table below shows each principal component together with its statistically significant feature, identified from the table above as the variable with the largest value in each column.
Further, based on the variance table, the most important features are X2, X6 and X7.
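Picking the most important feature of each component, as in the table above, amounts to finding the variable with the largest absolute loading in that component's column. A minimal NumPy sketch, assuming hypothetical data and a hypothetical helper `dominant_variables`; absolute values are used because the sign of an eigenvector is arbitrary.

```python
import numpy as np

def dominant_variables(X, names, n_components):
    """For each of the first `n_components` principal components, return the name
    of the variable with the largest absolute loading on that component."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # components, largest variance first
    loadings = eigvecs[:, order[:n_components]]    # one column per retained component
    top = np.argmax(np.abs(loadings), axis=0)      # row index of the largest |loading|
    return [names[i] for i in top]

# Hypothetical data: eight variables X1..X8, with X2 given a much larger spread.
rng = np.random.default_rng(0)
names = [f"X{i}" for i in range(1, 9)]
X = rng.normal(size=(500, 8)) * np.array([1, 10, 1, 1, 1, 1, 1, 1])
print(dominant_variables(X, names, 3))
```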
Data normalization is needed when one variable shows significantly higher variance than the other variables, because that variable would otherwise dominate the principal components.
The table above shows that the largest share of variance, about 27.16%, is captured by the first principal component. This share is not significantly higher than that of the other components, so it is fair to conclude that the total variance does not come from a single variable but is spread across several components. Hence, data normalization is not needed in this case (Grossmann & MA, 2015).
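The normalization check can be sketched by comparing the first component's share of variance before and after z-score standardization. The data and helper names below are hypothetical; the point is only that a single large-scale variable pushes the first component's share close to 100%, far above the 27.16% reported here.

```python
import numpy as np

def first_pc_share(X):
    """Fraction of total variance captured by the first principal component."""
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    return eigvals[-1] / eigvals.sum()            # eigvalsh sorts eigenvalues ascending

def standardize(X):
    """z-score each column: subtract its mean and divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

rng = np.random.default_rng(1)
# Hypothetical data: the second column is on a much larger scale than the rest.
X = rng.normal(size=(300, 4)) * np.array([1.0, 100.0, 1.0, 1.0])
print(round(first_pc_share(X), 3))                # close to 1: one variable dominates
print(round(first_pc_share(standardize(X)), 3))   # well below 0.5 once all columns have unit variance
```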
(b) The main advantages and disadvantages of principal component analysis are as follows:
Advantages of PCA
PCA simplifies a complex data set by reducing it to a smaller number of components.
Because PCA ranks components by the variance they explain, it gives the analyst the leeway to select the most important dimensions out of many, which eases analysis.
PCA represents the variables through mutually orthogonal components, which makes the results easier to simplify and interpret (Hofmann & Andrew, 2016).
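The orthogonality advantage can be checked numerically: projecting centered data onto the eigenvectors of its covariance matrix yields component scores that are uncorrelated, even when the original variables are strongly correlated. A short NumPy sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical correlated data: the second column is built largely from the first.
x1 = rng.normal(size=500)
X = np.column_stack([x1, 0.8 * x1 + 0.3 * rng.normal(size=500), rng.normal(size=500)])

Xc = X - X.mean(axis=0)
_, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # columns are the components
scores = Xc @ eigvecs                                  # project onto the components

print(np.round(np.corrcoef(X, rowvar=False), 2))       # original: large off-diagonal values
print(np.round(np.corrcoef(scores, rowvar=False), 2))  # scores: approximately the identity
print(np.allclose(eigvecs.T @ eigvecs, np.eye(3)))     # components are orthonormal
```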
Disadvantages of PCA
PCA is appropriate only when the variables have linear relationships, since it relies on orthogonal linear projections. If the variables show non-linear relationships, PCA should not be used, and other methods should be considered instead.
Identifying the variable with the highest variance relative to the others can be problematic, especially in blind source separation problems.
PCA is suited only to data that follow a Gaussian distribution, which is fully described by its mean and variance; it cannot be used reliably for other statistical distributions, where mean and variance do not adequately describe the data (Shmueli et al., 2016).
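The linearity limitation can be seen with a toy example (hypothetical data, not from the assignment): points on a circle are determined by a single angle, yet PCA, which looks only for linear directions, splits the variance evenly between two components and cannot reduce the data to one dimension.

```python
import numpy as np

# Hypothetical data: 360 points evenly spaced on a unit circle. Each point depends on
# one angle, so the data are one-dimensional, but they do not lie along a line.
theta = np.linspace(0, 2 * np.pi, 360, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])

eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
ratios = eigvals / eigvals.sum()
print(np.round(ratios, 3))  # [0.5 0.5]: no single linear direction dominates
```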
Question 2
Naïve Bayes Classifier
Total number of customers in the given data = 5,000
Partition of the data: Training : Validation = 60% : 40% (3,000 : 2,000 customers)
The data are provided in the accompanying Excel spreadsheet (Hofmann & Andrew, 2016).
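A 60% : 40% random partition can be sketched in plain Python. XLMiner performs the split internally; the function `partition` and the seed value below are illustrative assumptions, so the customers assigned to each set will differ from those in the spreadsheet.

```python
import random

def partition(records, train_frac=0.6, seed=1):
    """Randomly split records into a training set and a validation set."""
    rows = list(records)
    random.Random(seed).shuffle(rows)     # reproducible shuffle for a given seed
    cut = round(len(rows) * train_frac)   # number of training rows
    return rows[:cut], rows[cut:]

customer_ids = range(1, 5001)             # 5,000 customers, as in the given data
train, valid = partition(customer_ids)
print(len(train), len(valid))             # 3000 2000
```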
(a) Pivot table
Column variable - Online
Row variable - CC (Credit card)
Secondary row variable - Loan (Personal loan)
(b) Probability that a randomly selected customer who has both the bank's credit card and its online banking service will accept the personal loan:
Number of customers with CC and online = 542
Number of customers with CC, loan and online = 56
$$P(\text{Loan}=1 \mid \text{CC}=1, \text{Online}=1) = \frac{\text{Number of customers with CC, Loan and Online}}{\text{Number of customers with CC and Online}} = \frac{56}{542} \approx 0.103$$
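The probability in part (b) is simply a ratio of pivot-table counts. The sketch below assumes records stored as dictionaries with hypothetical keys `CC`, `Online` and `Loan`, and builds synthetic records that match the two counts reported above; `loan_rate_given` is an illustrative helper.

```python
from collections import Counter

def loan_rate_given(records, cc, online):
    """Share of customers with the given CC and Online values who accepted the loan."""
    counts = Counter((r["CC"], r["Online"], r["Loan"]) for r in records)  # 3-way pivot
    accepted = counts[(cc, online, 1)]
    total = accepted + counts[(cc, online, 0)]
    return accepted / total

# Hypothetical records reproducing the counts above:
# 542 customers with CC = 1 and Online = 1, of whom 56 accepted the loan.
records = ([{"CC": 1, "Online": 1, "Loan": 1}] * 56
           + [{"CC": 1, "Online": 1, "Loan": 0}] * (542 - 56))
print(round(loan_rate_given(records, cc=1, online=1), 3))  # 0.103
```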
(c) Pivot tables and probability
First pivot table:
Column variable - Online
Row variable – Loan (Personal loan)
Second pivot table:
Column variable – CC (Credit card)
Row variable – Loan (Personal loan)
The proportions (probabilities) below are computed from the two pivot tables above.
(i) $P(\text{CC}=1 \mid \text{Loan}=1) = \frac{97}{304} \approx 0.319$
(ii) $P(\text{Online}=1 \mid \text{Loan}=1) = \frac{179}{304} \approx 0.588$
(iii) $P(\text{Loan}=1) = \frac{304}{3000} \approx 0.101$
(iv) $P(\text{CC}=1 \mid \text{Loan}=0) = \frac{814}{2696} \approx 0.301$
(v) $P(\text{Online}=1 \mid \text{Loan}=0) = \frac{1597}{2696} \approx 0.592$
(vi) $P(\text{Loan}=0) = \frac{2696}{3000} \approx 0.898$
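The six quantities in part (c) are class-conditional proportions, which can be estimated directly from training records. This sketch uses hypothetical records that reproduce only the per-class counts in the pivot tables (how CC and Online co-occur within each class is arbitrary here, since these marginal proportions do not depend on it); the helper names are illustrative. The results agree with the list above to within 0.001.

```python
from collections import Counter

def naive_bayes_tables(records):
    """Estimate P(Loan=c), P(CC=1 | Loan=c) and P(Online=1 | Loan=c) for c in {0, 1}."""
    n = len(records)
    class_n = Counter(r["Loan"] for r in records)                       # records per class
    cc_n = Counter(r["Loan"] for r in records if r["CC"] == 1)          # CC holders per class
    online_n = Counter(r["Loan"] for r in records if r["Online"] == 1)  # online users per class
    prior = {c: class_n[c] / n for c in (0, 1)}
    p_cc = {c: cc_n[c] / class_n[c] for c in (0, 1)}
    p_online = {c: online_n[c] / class_n[c] for c in (0, 1)}
    return prior, p_cc, p_online

def hypothetical_class(loan, n, n_cc, n_online):
    """Build n synthetic records of one Loan class with the given CC and Online counts."""
    return [{"Loan": loan, "CC": int(i < n_cc), "Online": int(i < n_online)}
            for i in range(n)]

# Per-class counts from the pivot tables (3,000 training customers).
records = hypothetical_class(1, 304, 97, 179) + hypothetical_class(0, 2696, 814, 1597)
prior, p_cc, p_online = naive_bayes_tables(records)
for c in (1, 0):
    print(f"Loan={c}: P(CC=1|Loan)={p_cc[c]:.4f}  "
          f"P(Online=1|Loan)={p_online[c]:.4f}  P(Loan)={prior[c]:.4f}")
```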
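(d) Based on the quantities determined in part (c), the Naïve Bayes probability is:

$$P(\text{Loan}=1 \mid \text{CC}=1, \text{Online}=1) = \frac{P(\text{CC}=1 \mid \text{Loan}=1)\, P(\text{Online}=1 \mid \text{Loan}=1)\, P(\text{Loan}=1)}{P(\text{CC}=1 \mid \text{Loan}=1)\, P(\text{Online}=1 \mid \text{Loan}=1)\, P(\text{Loan}=1) + P(\text{CC}=1 \mid \text{Loan}=0)\, P(\text{Online}=1 \mid \text{Loan}=0)\, P(\text{Loan}=0)}$$

$$= \frac{0.319 \times 0.588 \times 0.101}{0.319 \times 0.588 \times 0.101 + 0.301 \times 0.592 \times 0.898} = \frac{0.01894}{0.01894 + 0.1600} \approx 0.106$$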
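The Naïve Bayes formula can be evaluated with a short sketch; `naive_bayes_posterior` is a hypothetical helper. It is run once with the rounded values from part (c) and once with the exact fractions from the pivot tables.

```python
def naive_bayes_posterior(p_cc, p_online, prior):
    """P(Loan=1 | CC=1, Online=1), assuming CC and Online are independent given Loan.

    Each argument maps a Loan class (0 or 1) to the corresponding probability."""
    score = {c: p_cc[c] * p_online[c] * prior[c] for c in (0, 1)}  # unnormalized posteriors
    return score[1] / (score[0] + score[1])

rounded = naive_bayes_posterior(
    p_cc={1: 0.319, 0: 0.301},
    p_online={1: 0.588, 0: 0.592},
    prior={1: 0.101, 0: 0.898},
)
exact = naive_bayes_posterior(
    p_cc={1: 97 / 304, 0: 814 / 2696},
    p_online={1: 179 / 304, 0: 1597 / 2696},
    prior={1: 304 / 3000, 0: 2696 / 3000},
)
print(round(rounded, 3), round(exact, 3))  # 0.106 0.106
```

Both versions give about 0.106, close to the exact pivot-table value of 0.103 from part (b); with exact fractions, the remaining gap comes only from the naive assumption that CC and Online are independent given Loan.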
(e) The best strategy for a bank customer seeking the personal loan is not to hold a credit card and not to use the bank's online services, because this combination tends to maximize the probability of obtaining a loan.
References
Grossmann, W., & MA, R.S. (2015). Fundamentals of Business Intelligence (5th ed.). New York: Springer.
Hofmann, M., & Chisholm, A. (2016). Text Mining and Visualization: Case Studies Using Open-Source Tools (3rd ed.). Florida: CRC Press.
Shmueli, G., Bruce, P. C., Stephens, M. L., & Patel, N. R. (2016). Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro (2nd ed.). Sydney: John Wiley & Sons.