1. Dimension Reduction (5%)

This item requires the dataset Utilities.xls, which can be found on the subject Interact site.

This dataset gives corporate data on 22 US public utilities. We are interested in forming groups of similar utilities. The objects to be clustered are the utilities. There are 8 measurements on each utility, described below. An example where clustering would be useful is a study to predict the cost impact of deregulation. To do the requisite analysis, economists would need to build a detailed cost model of the various utilities.

It would save a considerable amount of time and effort if we could cluster similar types of utilities, build detailed cost models for just one "typical" utility in each cluster, and then scale up from these models to estimate results for all utilities. The objects to be clustered are the utilities and there are 8 measurements on each utility.

X1: Fixed-charge covering ratio (income/debt)
X2: Rate of return on capital
X3: Cost per KW capacity in place
X4: Annual Load Factor
X5: Peak KWH demand growth from 1974 to 1975
X6: Sales (KWH use per year)
X7: Percent Nuclear
X8: Total fuel costs (cents per KWH)
a. Conduct Principal Component Analysis (PCA) on the data. Evaluate and comment on the results. Should the data be normalized? Discuss what characterizes the components you consider key and justify your answer.
b. Briefly explain advantages and any disadvantages of using the PCA compared to other methods for this task.

a) The Utility data set used in the problem provides data on 22 public utilities in the United States.

Step 1: Select a cell within the utility dataset. Then, on the XLMiner ribbon, from the Data Analysis tab, choose Transform - Principal Components to open the Principal Components Analysis dialog.
Step 2: In the Principal Components Analysis dialog, select variables x1 to x8 from the Variables in Input Data list, click the '>' button to move them to the Selected Variables list, and click Next.

To choose the number of principal components, XLMiner provides two options: Fixed # components and Smallest # components explaining. The Fixed # components method specifies a fixed number of components to include in the reduced model. The Smallest # components explaining method lets the user specify a percentage of the variance instead. If this method is selected, XLMiner calculates the smallest number of principal components that account for the specified percentage of the variance.

XLMiner also provides two methods for calculating the principal components: the covariance matrix and the correlation matrix. When the correlation matrix method is used, the data are normalized before the method is applied (i.e., each variable is divided by its standard deviation). Normalizing gives all variables equal importance in terms of variability. If the covariance method is selected, the data are used as given, so the data set should be normalized first.

Before continuing to the next step, select Correlation Matrix, then click Next.
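
For readers who want to check these two settings outside XLMiner, the following is a minimal NumPy sketch, not XLMiner's own implementation. It illustrates that PCA on the correlation matrix matches PCA on the covariance matrix of standardized data, and how a "smallest number of components explaining a given percentage" rule can be computed from the eigenvalues. The function names pca and smallest_n_components are hypothetical, and the data are synthetic placeholders with deliberately different scales, not the values in Utilities.xls.

```python
import numpy as np


def pca(data, use_correlation=True):
    """Eigen-decompose the correlation or covariance matrix of `data`.

    Rows are observations, columns are variables. Returns the eigenvalues
    in descending order and the matching eigenvectors as columns.
    """
    if use_correlation:
        matrix = np.corrcoef(data, rowvar=False)
    else:
        matrix = np.cov(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)  # symmetric input
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]


def smallest_n_components(eigenvalues, min_variance_pct):
    """Smallest number of leading components whose cumulative share of
    the total variance reaches `min_variance_pct` (0 to 100)."""
    cumulative_pct = 100.0 * np.cumsum(eigenvalues) / np.sum(eigenvalues)
    reached = cumulative_pct >= min_variance_pct - 1e-9  # float tolerance
    return int(np.argmax(reached)) + 1


# Synthetic stand-in for a 22 x 8 table (NOT the Utilities.xls values).
# The column scales are assumed, chosen only to differ by orders of magnitude.
rng = np.random.default_rng(0)
scales = np.array([1, 10, 100, 50, 5, 10000, 20, 1])
data = rng.normal(size=(22, 8)) * scales

corr_values, _ = pca(data, use_correlation=True)

# Standardize by hand, then use the covariance matrix instead.
standardized = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
cov_std_values, _ = pca(standardized, use_correlation=False)

# Covariance matrix on the raw, unnormalized data.
cov_raw_values, _ = pca(data, use_correlation=False)

print("correlation PCA equals covariance PCA on standardized data:",
      np.allclose(corr_values, cov_std_values))
print("share of variance in first raw-covariance component:",
      cov_raw_values[0] / cov_raw_values.sum())
print("components needed for 90% (correlation):",
      smallest_n_components(corr_values, 90))
```

On the raw data the first covariance component is dominated by the largest-scale variable, which is the practical reason for choosing Correlation Matrix (or normalizing first) when the measurements are in different units.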