
Added on 2019-09-22

Learn about conducting Principal Component Analysis (PCA) and Naive Bayes Classifier for Personal Loan Acceptance using UniversalBank.xls dataset. Understand the advantages and disadvantages of using PCA compared to other methods for this task. Get insights on the probability of loan acceptance conditional on having a bank credit card and being an active user of online banking services.

| 18 pages

| 2268 words


1. Dimension Reduction (5%)

This item requires the dataset Utilities.xls, which can be found on the subject Interact site. This dataset gives corporate data on 22 US public utilities. We are interested in forming groups of similar utilities. The objects to be clustered are the utilities. There are 8 measurements on each utility, described below.

An example where clustering would be useful is a study to predict the cost impact of deregulation. To do the requisite analysis, economists would need to build a detailed cost model of the various utilities. It would save a considerable amount of time and effort if we could cluster similar types of utilities, build detailed cost models for just one "typical" utility in each cluster, and then scale up from these models to estimate results for all utilities.

X1: Fixed-charge covering ratio (income/debt)
X2: Rate of return on capital
X3: Cost per KW capacity in place
X4: Annual Load Factor
X5: Peak KWH demand growth from 1974 to 1975
X6: Sales (KWH use per year)
X7: Percent Nuclear
X8: Total fuel costs (cents per KWH)

a. Conduct Principal Component Analysis (PCA) on the data. Evaluate and comment on the results. Should the data be normalized? Discuss what characterizes the components you consider key and justify your answer.

b. Briefly explain the advantages and any disadvantages of using PCA compared to other methods for this task.

a) The Utilities data set used in the problem provides data on 22 public utilities in the United States.

Step 1: Select a cell within the Utilities data set. Then, on the XLMiner ribbon, go to the Data Analysis tab and choose Transform - Principal Components to open the Principal Components Analysis dialog.

Step 2: In the Principal Components Analysis dialog, select variables x1 to x8 from the Variables in Input Data list, click the '>' button to move them to the Selected Variables list, and click Next.

For the number of principal components, XLMiner provides two options: Fixed # components and Smallest # components explaining. Use Fixed # components to specify a fixed number of components to include in the reduced model. Smallest # components explaining instead lets the user specify a percentage of the variance; if this option is selected, XLMiner calculates the smallest number of principal components needed to account for the specified percentage of the variance.

XLMiner also provides two methods for calculating the principal components: the covariance matrix and the correlation matrix. With the correlation matrix method, the data is normalized before the method is applied (i.e., each variable is divided by its standard deviation). Normalizing gives all variables equal importance in terms of variability. If the covariance method is selected, the data set should be normalized beforehand. Before continuing to the next step, select Correlation Matrix, then click Next.
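For readers without XLMiner, the two choices described above (the correlation matrix method and the smallest number of components explaining a given share of variance) can be sketched in Python with NumPy. This is a minimal illustration, not XLMiner's implementation. The function names `pca_correlation` and `smallest_n_components` are hypothetical, and the 22 x 8 array is random placeholder data standing in for Utilities.xls, with arbitrary column scales chosen only to mimic variables measured in very different units.

```python
import numpy as np


def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = observations, columns = variables)."""
    # Standardize each column: subtract its mean, divide by its sample standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The correlation matrix of X equals the covariance matrix of Z.
    R = np.corrcoef(X, rowvar=False)
    # eigh is appropriate for symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # share of total variance per component
    scores = Z @ eigvecs                       # principal component scores
    return eigvals, eigvecs, explained, scores


def smallest_n_components(explained, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    cumulative = np.cumsum(explained)
    n = int(np.searchsorted(cumulative, threshold)) + 1
    return min(n, len(explained))              # guard against floating-point shortfall


# Placeholder data (NOT the real Utilities.xls values): 22 rows, 8 columns,
# with arbitrary scales so that some columns would dominate an unnormalized analysis.
rng = np.random.default_rng(0)
scales = np.array([1.0, 2.0, 50.0, 5.0, 3.0, 3000.0, 15.0, 0.5])
X = rng.normal(size=(22, 8)) * scales

eigvals, eigvecs, explained, scores = pca_correlation(X)
n_keep = smallest_n_components(explained, 0.80)
print("Variance share per component:", np.round(explained, 3))
print("Components needed for 80% of the variance:", n_keep)
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to the number of variables (8 here), so each variable contributes equally to the total variance regardless of its original units. Running the same eigendecomposition on the covariance matrix of the unscaled placeholder data would let the largest-scale column dominate the first component, which is the reason for normalizing first.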
