Data Mining Report: Analysis of PCA, Naive Bayes, and Loan Acceptance
Added on 2020/03/28
This report provides a comprehensive analysis of data mining techniques, focusing on dimensionality reduction, Principal Component Analysis (PCA), and Naive Bayes classification. It begins with an overview of PCA, detailing variable selection, component selection (fixed and smallest components), and the interpretation of results, including Eigenvalues, variance, and cumulative variance. The report also discusses the advantages and disadvantages of PCA. Subsequently, it delves into Naive Bayes classification, demonstrating pivot table creation in Excel and calculating probabilities related to loan acceptance based on credit card usage and online banking activity. The report presents various probability calculations, including conditional probabilities, to determine the best strategy for loan acceptance. The best strategy involves leveraging credit card usage and online transaction activity to identify potential loan recipients. The report concludes by highlighting the importance of customer interaction and awareness to enhance loan acceptance rates. References to relevant sources are included.

DATA MINING

Table of Contents
1. Dimensionality Reduction
2. Principal Component Analysis
Advantages of Principal Component Analysis
Disadvantages of Principal Component Analysis
3. Naive Bayes Classification
Pivot Table Creation
Probability of Loan Acceptance
Probability of A given B
4. Naive Bayes Probability
5. Best Possible Strategy to get the Loan
References

1. Dimensionality Reduction
Excel Data for Analysis
The utilities data is used for the Principal Component Analysis.
2. Principal Component Analysis
Selection of the Variables
To perform Principal Component Analysis, choose the Transform option on the Data Analysis
tab, and then select Principal Components.
The first step in the Principal Component Analysis is the selection of the variables. The
variable selection window for this step is shown below.
The Worksheet field displays the name of the worksheet, and the Workbook field displays the
name of the Excel file. The data range specifies the first and last columns, with row
numbers, of the data to be analyzed.
The number of rows and the number of columns are also specified.

The descriptions of the variables X1 to X8 are given below.

Component Selection
The Principal Component Analysis offers two ways of choosing components: a fixed number of
components, or the smallest number of components that explains a specified percentage of the
variance. Here, the fixed number of components is selected. Of the two methods provided,
Use Correlation Matrix with Standardized variables is selected.
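As a rough illustration of what the correlation-matrix option does, the Python sketch below standardizes each variable to z-scores and builds the correlation matrix from them. It uses only the standard library; the function names and the three-record toy data are hypothetical, and this is not the tool's internal code.

```python
from statistics import mean, stdev


def standardize(column):
    """Return z-scores: subtract the mean and divide by the sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


def correlation_matrix(columns):
    """Correlation matrix of several variables, computed from their z-scores.

    `columns` is a list of equally long lists, one list per variable.
    """
    z = [standardize(col) for col in columns]
    n = len(columns[0])
    return [
        [sum(zi[r] * zj[r] for r in range(n)) / (n - 1) for zj in z]
        for zi in z
    ]


# Hypothetical toy data: two variables, three records (not the utilities data).
x1 = [1.0, 2.0, 3.0]
x2 = [2.0, 4.0, 7.0]
corr = correlation_matrix([x1, x2])
print(corr)
```

Because every variable is rescaled to unit variance, the diagonal entries are 1, so no variable carries more weight simply because it is measured in larger units.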
Principal Component Score
The last step in the Principal Component Analysis is the selection of the Data
Transformation and Data Fault Detection options.
In the Data Transformation tab, select the Show principal component score check box.
After selecting the required options, click Finish.
This completes the Principal Component Analysis steps.
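The principal component scores are the standardized values projected onto the eigenvectors of the correlation matrix. The sketch below covers only a two-variable case, chosen because the eigenvectors of [[1, r], [r, 1]] have a closed form ((1, 1)/sqrt(2) and (1, -1)/sqrt(2), with eigenvalues 1 + r and 1 - r). The function names and data are hypothetical; an eight-variable analysis such as X1 to X8 would need a numerical eigen-solver instead.

```python
import math
from statistics import mean, stdev


def z_scores(column):
    """Standardize a variable: subtract its mean, divide by its sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


def pc_scores_two_vars(x1, x2):
    """Eigenvalues and principal component scores for two variables (correlation-matrix PCA).

    The correlation matrix [[1, r], [r, 1]] has eigenvectors (1, 1)/sqrt(2) and
    (1, -1)/sqrt(2) with eigenvalues 1 + r and 1 - r, so no eigen-solver is needed.
    """
    z1, z2 = z_scores(x1), z_scores(x2)
    r = sum(a * b for a, b in zip(z1, z2)) / (len(z1) - 1)
    h = 1 / math.sqrt(2)
    components = [(1 + r, (h, h)), (1 - r, (h, -h))]
    components.sort(key=lambda comp: comp[0], reverse=True)  # largest eigenvalue first
    eigenvalues = [lam for lam, _ in components]
    # Score of a record on a component = dot product of its z-scores with the eigenvector.
    scores = [[a * v[0] + b * v[1] for _, v in components] for a, b in zip(z1, z2)]
    return eigenvalues, scores


# Hypothetical toy data (not the utilities data).
x1 = [1.0, 2.0, 3.0]
x2 = [2.0, 4.0, 7.0]
eigenvalues, scores = pc_scores_two_vars(x1, x2)
for record, row in enumerate(scores, start=1):
    print(record, [round(s, 3) for s in row])
```

Each score column has mean zero and a sample variance equal to its eigenvalue, which is why the eigenvalues summarize how much of the variance each component carries.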

Results of the Principal Component Analysis
After the three steps of the Principal Component Analysis are completed, two worksheets are
generated: the output sheet, named PCA_Output, and the scores sheet, named PCA_Scores. The
PCA_Output worksheet first displays the inputs given to the analysis, followed by the
principal components: 8 components are computed for the variables X1 to X8. The eigenvalue,
the variance percentage, and the cumulative variance percentage of each component are
reported in the Explained Variance area. The PCA_Scores worksheet provides the scores of every
record on all 8 components. The two output worksheets are shown below.
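The Explained Variance columns follow directly from the eigenvalues: a component's variance percentage is its eigenvalue divided by the sum of all eigenvalues, and the cumulative percentage is the running total. The minimal Python sketch below uses hypothetical eigenvalues, not the values in PCA_Output; with a correlation matrix of eight standardized variables the eigenvalues always sum to 8.

```python
def explained_variance(eigenvalues):
    """Return (eigenvalue, variance %, cumulative variance %) rows, largest eigenvalue first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rows, cumulative = [], 0.0
    for lam in ordered:
        pct = 100.0 * lam / total  # share of the total variance
        cumulative += pct          # running total of the shares
        rows.append((lam, pct, cumulative))
    return rows


# Hypothetical eigenvalues for eight standardized variables (they sum to 8);
# illustrative only, not the values reported in PCA_Output.
example = [2.5, 1.6, 1.2, 0.9, 0.8, 0.5, 0.3, 0.2]
for component, (lam, pct, cum) in enumerate(explained_variance(example), start=1):
    print(f"Component {component}: eigenvalue {lam:.2f}, variance {pct:.2f}%, cumulative {cum:.2f}%")
```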
PCA_Output
PCA_Scores

Different Component Selection
When the option for the smallest number of components explaining 60% of the variance is
selected, the PCA_Output1 and PCA_Scores1 worksheets are displayed after the PCA steps are
finished. These worksheets contain only 3 components, which together reach the 60% variance
threshold. The output worksheets for this selection are shown below.
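The smallest-components option can be sketched as picking the fewest leading components whose cumulative variance percentage reaches the threshold. This assumes the tool applies an "at least" rule to the cumulative percentage; the function name and the eigenvalues are hypothetical and are not the report's values.

```python
def smallest_components(eigenvalues, threshold_pct):
    """Fewest leading components whose cumulative variance % reaches threshold_pct."""
    if not 0 < threshold_pct <= 100:
        raise ValueError("threshold_pct must be in (0, 100]")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, lam in enumerate(ordered, start=1):
        cumulative += 100.0 * lam / total
        if cumulative >= threshold_pct:
            return k
    return len(ordered)  # guards against floating-point round-off just below 100%


# Hypothetical eigenvalues (illustrative only, not the report's values).
example = [2.5, 1.6, 1.2, 0.9, 0.8, 0.5, 0.3, 0.2]
print(smallest_components(example, 60))  # prints 3 for these hypothetical eigenvalues
```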

PCA_Output1
PCA_Scores1
The data can be normalized. The table identifies each record by the utility name and a
utility ID (serial number). After normalization, appending new tables or deleting tables
does not introduce redundancy. For example, the data table can be split into two tables
without affecting either: one table holding the utility name and utility number, and another
holding the utility number and the Fixed Charge Covering ratio. The second table does not need
the utility name, because the utility number links it back to the first table. The utility
number is therefore the key used to characterize the components.
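The two-table split can be sketched with Python dictionaries keyed by the utility number. The names, ratios, and helper functions below are hypothetical placeholders, not the report's data; the point is that the ratio table carries no utility name, yet the full records can be rebuilt by joining on the key.

```python
# Hypothetical records: (utility number, utility name, fixed charge covering ratio).
# The names and ratios are placeholders, not the report's utilities data.
records = [
    (1, "Utility A", 1.06),
    (2, "Utility B", 0.89),
    (3, "Utility C", 1.43),
]


def split_tables(rows):
    """Normalize into two tables that share the utility number as the key."""
    names = {number: name for number, name, _ in rows}
    ratios = {number: ratio for number, _, ratio in rows}
    return names, ratios


def join_tables(names, ratios):
    """Rebuild full records by joining the two tables on the utility number."""
    shared_keys = sorted(names.keys() & ratios.keys())
    return [(number, names[number], ratios[number]) for number in shared_keys]


names, ratios = split_tables(records)
print(names)
print(ratios)
print(join_tables(names, ratios) == records)
```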

Advantages of Principal Component Analysis
PCA partitions the variance in the data into a set of distinct, uncorrelated components.
PCA is widely regarded as one of the most effective techniques for dimensionality reduction.
PCA reduces the number of dimensions with little loss of information (Sang, Wang and Cao, 2017).
Disadvantages of Principal Component Analysis
Variables with large absolute variances dominate the first principal component unless the
data are standardized.
PCA is not suitable for variables whose relationships are not linear.
PCA treats the mean and variance as a full description of the data, which does not hold for
some distributions (Quora, 2017).
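The effect of variable scale on the first principal component can be shown with a two-variable sketch. The leading eigenvector of a 2 x 2 symmetric matrix has a closed form, so no numerical library is needed; the data and function names are hypothetical. With raw values (covariance-matrix PCA) the first component loads almost entirely on the high-variance variable, whereas after standardization (correlation-matrix PCA) both variables receive equal weight in this two-variable case.

```python
import math
from statistics import mean, stdev


def first_pc_2x2(a, b, c):
    """Leading eigenvalue and unit eigenvector of the symmetric matrix [[a, b], [b, c]]."""
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if b == 0:
        vec = (1.0, 0.0) if a >= c else (0.0, 1.0)
    else:
        vec = (lam - c, b)  # satisfies (A - lam*I) vec = 0 for this 2 x 2 case
    norm = math.hypot(*vec)
    return lam, (vec[0] / norm, vec[1] / norm)


def covariance(u, v):
    """Sample covariance of two equally long lists."""
    mu, mv = mean(u), mean(v)
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / (len(u) - 1)


# Hypothetical data: x_small varies by a few units, x_large by thousands.
x_small = [1.0, 3.0, 2.0, 5.0, 4.0]
x_large = [1200.0, 900.0, 3100.0, 2500.0, 400.0]

# Covariance-matrix PCA on the raw (unstandardized) values.
_, raw_pc1 = first_pc_2x2(
    covariance(x_small, x_small), covariance(x_small, x_large), covariance(x_large, x_large)
)

# Correlation-matrix PCA, i.e. PCA on standardized values.
r = covariance(x_small, x_large) / (stdev(x_small) * stdev(x_large))
_, std_pc1 = first_pc_2x2(1.0, r, 1.0)

print("raw loadings:", [round(w, 4) for w in raw_pc1])
print("standardized loadings:", [round(w, 4) for w in std_pc1])
```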
3. Naive Bayes Classification
Excel Data for Classification
Pivot Table Creation
To create a pivot table in Excel, select the Insert tab and click PivotTable; the Create
PivotTable dialog box appears, as shown below. The first option selects the table or range
from which the pivot table is created. The pivot table can be placed in a new worksheet or in
the existing worksheet. If the pivot table is to be created in the existing