Business Case Analysis on PCA for Utility Data, Assignment-II

Added on 2020/04/07

AI Summary
The homework assignment applies Principal Component Analysis (PCA) to utility data in a business-case context. It first interprets the results of a PCA run in the XLMiner Analytical Tool, identifying which components correspond to aspects of utility performance such as financial stage, operational performance, and production cost. It also addresses whether the data need to be normalized, justifying the answer by each variable's contribution to the total variance. Part B discusses the advantages and disadvantages of PCA, such as its effectiveness with linear relationships and its limitations with non-linear relationships or categorical data. Finally, the assignment uses pivot tables to compute the probability that a Universal Bank customer accepts a personal loan given credit card ownership and online banking use, and compares it with a Naïve Bayes estimate.
Data Mining
Business Case Analysis, Assignment-II
Student Id/Name
[Pick the date]
Question 1
PART A
The PCA of the utilities data was computed using the XLMiner Analytical Tool; the output is shown below:
Interpretation of PCA results
From the variance-percentage table shown above, the first four principal components together explain nearly 80% of the total variance. The interpretation of the PCA therefore focuses on these four principal components.
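The "first four components explain about 80%" reasoning can be sketched in Python as an illustration (not the XLMiner computation): the share of each component's eigenvalue in the total variance is accumulated until it reaches a threshold. The eigenvalues below are hypothetical placeholders for eight variables, not the values from the utilities output, and the 0.80 threshold is simply the cut-off discussed in the text.

```python
from itertools import accumulate


def cumulative_variance(eigenvalues):
    """Cumulative share of total variance explained by the first k components."""
    total = sum(eigenvalues)
    return [running / total for running in accumulate(eigenvalues)]


def components_needed(eigenvalues, threshold=0.80):
    """Smallest k whose first k components explain at least `threshold` of the variance."""
    for k, share in enumerate(cumulative_variance(eigenvalues), start=1):
        if share >= threshold:
            return k
    return len(eigenvalues)


if __name__ == "__main__":
    # Hypothetical eigenvalues, sorted in decreasing order (NOT the utilities output).
    eigs = [2.4, 1.8, 1.3, 1.0, 0.6, 0.4, 0.3, 0.2]
    for k, share in enumerate(cumulative_variance(eigs), start=1):
        print(f"PC{k}: {share:.1%} cumulative")
    print("Components needed for 80%:", components_needed(eigs))
```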
The 1st principal component is driven mainly by x1 and x2, which together reflect the "utility financial stage."
The 2nd principal component is driven mainly by x4 and x8, which together reflect "utility operational performance."
The 3rd principal component is driven mainly by x3 and x7, which together reflect the "utility production cost of electricity."
The 4th principal component is driven mainly by x1 and x3, which together reflect the "utility fixed cost of electricity."
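Reading a component as "x1 and x2" amounts to picking the variables with the largest absolute loadings on that component. A minimal Python sketch of that step follows; the loading values are invented for illustration and are not taken from the XLMiner output, and keeping the top two variables per component is an assumption that matches the pairs named above.

```python
def dominant_variables(loadings, top=2):
    """Return, for each component, the `top` variables with the largest |loading|.

    `loadings` maps a component name to a dict of {variable: loading}.
    The sign of a loading is ignored; only its magnitude matters here.
    """
    return {
        component: sorted(weights, key=lambda var: abs(weights[var]), reverse=True)[:top]
        for component, weights in loadings.items()
    }


if __name__ == "__main__":
    # Hypothetical loadings for illustration only.
    loadings = {
        "PC1": {"x1": 0.59, "x2": 0.57, "x3": -0.11, "x4": 0.20},
        "PC2": {"x1": 0.08, "x2": -0.14, "x3": 0.25, "x4": -0.61, "x8": 0.55},
    }
    print(dominant_variables(loadings))
```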
Requirement of Normalization
Normalization is an essential step before principal component analysis when the variances are magnified by the scale advantage that one or more variables enjoy over the others, that is, when a few variables account for most of the total variance simply because they are measured on a larger scale. In that case the data must be normalized so that the effect of scale is removed. For the given data set, each variable's contribution to the total variance is significant, and hence normalization of the data is not required.
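The check described above, whether a few variables dominate the total variance because of their scale, can be sketched with Python's standard library, together with the z-score normalization that would be applied if they did. The column names and values are hypothetical and only illustrate how a large-scale variable can swamp the total.

```python
from statistics import mean, stdev, variance


def variance_shares(columns):
    """Share of the total (raw-scale) variance contributed by each variable."""
    variances = {name: variance(values) for name, values in columns.items()}
    total = sum(variances.values())
    return {name: v / total for name, v in variances.items()}


def standardize(values):
    """Z-score normalization: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


if __name__ == "__main__":
    # Hypothetical data: "sales" is on a far larger scale than "ratio".
    data = {
        "ratio": [1.0, 1.2, 0.9, 1.4, 1.1],
        "sales": [9000.0, 5000.0, 9200.0, 6400.0, 3300.0],
    }
    print(variance_shares(data))                  # "sales" takes almost all of the variance
    print(variance(standardize(data["sales"])))   # approximately 1.0 after normalization
```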
PART B
Advantages of PCA: It is most appropriate when the variables are related to one another linearly. PCA provides a well-defined m-dimensional space (m ≤ p, where p is the number of original variables) for the cloud of data points, with a separate, mutually orthogonal axis for each principal component. Working with fewer components in this way helps reduce the risk of over-fitting, especially with large data sets.
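The idea that each principal component is a separate axis is easiest to see with two variables, where the eigen-decomposition of the 2×2 covariance matrix has a closed form. This Python sketch is an illustration of that special case, not how XLMiner computes PCA for eight variables; it returns the component variances and unit directions, and the two directions always come out orthogonal. The sample data are made up.

```python
import math
from statistics import mean


def pca_2d(xs, ys):
    """PCA of two variables via the closed-form eigen-decomposition of their
    2x2 sample covariance matrix A = [[a, b], [b, c]].

    Returns [(eigenvalue, unit_vector), ...] sorted by decreasing eigenvalue.
    """
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)                    # var(x)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)                    # var(y)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)  # cov(x, y)
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = [half_trace + radius, half_trace - radius]
    if b == 0:
        # Variables already uncorrelated: the principal axes are the original axes.
        vectors = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        vectors = []
        for lam in eigenvalues:
            vx, vy = lam - c, b  # (lam - c, b) solves (A - lam*I) v = 0 when b != 0
            norm = math.hypot(vx, vy)
            vectors.append((vx / norm, vy / norm))
    return list(zip(eigenvalues, vectors))


if __name__ == "__main__":
    (l1, v1), (l2, v2) = pca_2d([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
    print("PC1 variance:", l1, "direction:", v1)
    print("PC2 variance:", l2, "direction:", v2)
    print("dot product of the two axes:", v1[0] * v2[0] + v1[1] * v2[1])  # close to 0
```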
Disadvantages of PCA: The technique fails to capture the structure of a data set in which the variables are related to one another non-linearly. It also cannot be used directly when the data set contains categorical variables. Further, although PCA directly provides the magnitude (variance) of each principal component, it is difficult to interpret the direction of each component in the data cloud in terms of the original variables. Finally, because PCA retains the directions of maximum variance, the dimension reduction may discard low-variance directions that still play a significant role in the data set. This is considered the major drawback of the PCA technique.
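The limitation with non-linear relations follows from PCA working only with covariances (or correlations), which measure linear association. In the small illustrative Python example below, y is completely determined by x, yet their sample covariance is zero, so a covariance-based PCA would treat the two variables as unrelated. The data are made up for the illustration.

```python
from statistics import mean


def sample_covariance(xs, ys):
    """Sample covariance of xs and ys (a measure of linear association only)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


if __name__ == "__main__":
    xs = [-2, -1, 0, 1, 2]
    ys = [x ** 2 for x in xs]  # perfect, but non-linear, dependence on x
    print(sample_covariance(xs, ys))  # 0.0
```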
Question 2
Sample size (Universal Bank customers) = 5,000
Partition = 60% Training; 40% Validation (Through XLMiner Analytical Tool)
The data were partitioned using XLMiner's standard partition option; the output is highlighted below:
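A 60%/40% split like the one described above can be sketched as a seeded random shuffle in Python. XLMiner's own partitioning procedure may differ; the function name and the seed value here are arbitrary choices for illustration.

```python
import random


def partition(records, train_frac=0.6, seed=12345):
    """Randomly split records into (training, validation) lists.

    The seed is an arbitrary, hypothetical value chosen so the split is reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    customer_ids = list(range(1, 1001))  # hypothetical IDs
    train, valid = partition(customer_ids)
    print(len(train), len(valid))  # 600 400
```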
PART A
Pivot table: Online (column), CC (first row), Loan (second row)
Key: Yes = 1 (the customer uses the credit card/online service/loan)
No = 0 (the customer does not use the credit card/online service/loan)
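The pivot-table counts can be reproduced outside XLMiner by counting customers per (CC, Loan, Online) combination. This Python sketch assumes each customer is a record with 0/1 values under the keys "CC", "Loan" and "Online"; the key names follow the labels used here and are not necessarily the column names in the source file, and the sample records are made up.

```python
from collections import Counter


def pivot_counts(records):
    """Count customers for each (CC, Loan, Online) combination of 0/1 values."""
    return Counter((r["CC"], r["Loan"], r["Online"]) for r in records)


if __name__ == "__main__":
    # Made-up records for illustration.
    customers = [
        {"CC": 1, "Loan": 1, "Online": 1},
        {"CC": 1, "Loan": 0, "Online": 1},
        {"CC": 0, "Loan": 0, "Online": 0},
        {"CC": 1, "Loan": 1, "Online": 1},
    ]
    counts = pivot_counts(customers)
    # Print in pivot layout: rows are (CC, Loan), columns are Online = 0 / 1.
    for cc in (0, 1):
        for loan in (0, 1):
            row = [counts[(cc, loan, online)] for online in (0, 1)]
            print(f"CC={cc} Loan={loan}: Online=0 -> {row[0]}, Online=1 -> {row[1]}")
```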
PART B
Probability
The required values are highlighted in the pivot table above:
$n(\text{CC}=1, \text{Loan}=1, \text{Online}=1) = 51$
$n(\text{CC}=1, \text{Online}=1) = 522$
$$P(\text{Loan}=1 \mid \text{CC}=1, \text{Online}=1) = \frac{n(\text{CC}=1, \text{Loan}=1, \text{Online}=1)}{n(\text{CC}=1, \text{Online}=1)} = \frac{51}{522} \approx 0.0977$$
Thus, there is a 9.77% probability that a customer who already holds a Universal Bank credit card and actively uses its online services will also accept the bank's personal loan offer.
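The same calculation can be written as a small Python function that divides the joint count by the count of the conditioning group. The two counts are the ones read from the pivot table in this part; the function name is illustrative.

```python
from fractions import Fraction


def conditional_probability(joint_count, condition_count):
    """Empirical P(A | B) = n(A and B) / n(B), kept as an exact fraction."""
    if condition_count == 0:
        raise ValueError("the conditioning group is empty")
    return Fraction(joint_count, condition_count)


if __name__ == "__main__":
    # n(CC=1, Loan=1, Online=1) = 51 and n(CC=1, Online=1) = 522, from the pivot table
    p = conditional_probability(51, 522)
    print(p, f"{float(p):.4f}")  # 17/174 0.0977
```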
PART C
Pivot table
1. Online (column), Loan (row)
2. CC (column), Loan (row)
The highlighted values in the two pivot tables above are used to compute the required proportions/probabilities.
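The proportions needed for the Naïve Bayes step are row proportions of these two pivot tables. A minimal Python sketch, assuming each 2×2 table is stored as a dictionary keyed by (Loan, feature) with customer counts; the function names are hypothetical, and the counts in the example are invented and are not the values in the tables above.

```python
def class_conditional(counts):
    """P(feature=1 | Loan=l) for l in (0, 1) from a 2x2 pivot of counts.

    `counts[(loan, feature)]` is the number of customers with that combination.
    """
    return {
        loan: counts[(loan, 1)] / (counts[(loan, 0)] + counts[(loan, 1)])
        for loan in (0, 1)
    }


def loan_prior(counts):
    """P(Loan=1): share of all customers in the table who accepted the loan."""
    return (counts[(1, 0)] + counts[(1, 1)]) / sum(counts.values())


if __name__ == "__main__":
    # Invented counts, keyed by (Loan, Online).
    online_table = {(0, 0): 60, (0, 1): 30, (1, 0): 6, (1, 1): 4}
    print(class_conditional(online_table))  # P(Online=1 | Loan=0) = 1/3, P(Online=1 | Loan=1) = 0.4
    print(loan_prior(online_table))         # 0.1
```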
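PART D
Naïve Bayes probability $P(\text{Loan}=1 \mid \text{CC}=1, \text{Online}=1)$:
$$P(\text{Loan}=1 \mid \text{CC}=1, \text{Online}=1) \approx \frac{P(\text{CC}=1 \mid \text{Loan}=1)\,P(\text{Online}=1 \mid \text{Loan}=1)\,P(\text{Loan}=1)}{P(\text{CC}=1 \mid \text{Loan}=1)\,P(\text{Online}=1 \mid \text{Loan}=1)\,P(\text{Loan}=1) + P(\text{CC}=1 \mid \text{Loan}=0)\,P(\text{Online}=1 \mid \text{Loan}=0)\,P(\text{Loan}=0)}$$
$$= \frac{0.305 \times 0.601 \times 0.101}{0.305 \times 0.601 \times 0.101 + 0.296 \times 0.588 \times 0.898} \approx 0.106$$
Thus, based on the proportions computed above, the Naïve Bayes probability is approximately 10.6%.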
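The Naïve Bayes formula above can be evaluated with a short Python function: each class score is the prior times the product of the per-feature conditional probabilities, and the two scores are then normalized. The inputs are the rounded proportions used in the formula; assigning 0.305 and 0.296 to P(CC = 1 | Loan) and 0.601 and 0.588 to P(Online = 1 | Loan) is an assumption about which table each value came from, although the product does not depend on that assignment. With these rounded inputs the result is about 0.106.

```python
import math


def naive_bayes_posterior(likelihoods_pos, prior_pos, likelihoods_neg, prior_neg):
    """P(class=1 | features) under the Naive Bayes conditional-independence assumption."""
    score_pos = prior_pos * math.prod(likelihoods_pos)
    score_neg = prior_neg * math.prod(likelihoods_neg)
    return score_pos / (score_pos + score_neg)


if __name__ == "__main__":
    p = naive_bayes_posterior(
        likelihoods_pos=[0.305, 0.601],  # assumed: P(CC=1 | Loan=1), P(Online=1 | Loan=1)
        prior_pos=0.101,                 # P(Loan=1)
        likelihoods_neg=[0.296, 0.588],  # assumed: P(CC=1 | Loan=0), P(Online=1 | Loan=0)
        prior_neg=0.898,                 # P(Loan=0)
    )
    print(f"{p:.3f}")  # 0.106
```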
PART E
The optimal strategy for a customer seeking a loan is to already hold the bank's credit card and, at the same time, be an active user of the bank's online services.