Data Mining Assignment Solved

Added on 2019-10-31

DATA MINING
Assignment – II
Student id
[Pick the date]
Question 1 “DIMENSION REDUCTION”

a) The principal component analysis result is as highlighted below.

Key observations from the above are summarized below.

The first four principal components explain 79.9% of the total variance, and hence it would be prudent to ignore the remaining principal components.

For the first principal component, the significant factors are x1 and x2, which are the fixed charge covering ratio and the rate of return. This indicates that the component relates to the financial performance of the utility company.

The second principal component has significant factors in the form of x4 and x8, which are the annual load factor and the fuel cost, and thereby relates to the operational performance of the utility company.

The third principal component has significant factors in the form of x3 and x7, which are the cost per unit and the nuclear contribution, and thereby relates to the cost of electricity production.
The fourth principal component has significant factors in the form of x1 and x3, which are the fixed charge covering ratio and the unit cost, and hence relates to the fixed cost structure of the utility company's production.

It also needs to be addressed whether the data must be normalized prior to PCA. Normalization becomes a necessity when a particular variable, on account of a difference in scales, contributes a significant proportion of the total variance and thereby overshadows the importance of the other variables involved. Hence, normalization is done to remove the scale effect. However, no such need arises in the given case, as the highest variance explained by a single factor is only 27%, and thus scale is not a pivotal factor for the PCA in this case.

(b) The advantages and disadvantages are highlighted below.

Advantages: PCA is considered a simple, eigenvector-based method of multivariate analysis. Its main features are its three central tasks: reducing dimension, maximizing variance, and presenting the principal components orthogonally. PCA first maximizes the variance in p-dimensional space under quadratic constraints. This results in the reduction of a large data set into a smaller one. The structure of the given data can easily be evaluated based on PCA. Visualization is also simpler: the observations form a cloud in a high-dimensional space with one axis per variable, and this cloud can be viewed along a few principal components instead.

Disadvantages: At times, after a large number of variables has been reduced to fewer variables, the derived principal components do not align with the direction in which the variance is at its maximum. Hence, it can be difficult to find the direction of the key principal components. Also, the distance function in the case of PCA is invalid. Further, the method cannot be applied to data sets that comprise categorical variables.
Also, the analysis of non-linear relationships between variables is not successful.
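
The scale effect discussed in part (a), and the cumulative-variance rule used there to decide how many components to retain, can be illustrated with a minimal sketch. It assumes NumPy and uses made-up random data rather than the utility data set; the function name `explained_variance_ratio`, the 80% threshold and the variable names are hypothetical choices for illustration only, not the method used to produce the results above.

```python
import numpy as np

def explained_variance_ratio(X, standardize):
    """Share of total variance carried by each principal component."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                        # centre every column
    if standardize:
        X = X / X.std(axis=0, ddof=1)             # give every column unit variance
    cov = np.cov(X, rowvar=False)                 # covariance (correlation if standardized)
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]   # eigenvalues, largest first
    return eigenvalues / eigenvalues.sum()

# Hypothetical data: three independent variables, the third on a much larger scale
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 1.0, 1000.0])

raw = explained_variance_ratio(X, standardize=False)
scaled = explained_variance_ratio(X, standardize=True)

# Smallest number of components whose cumulative share reaches 80% (assumed threshold)
n_keep = int(np.searchsorted(np.cumsum(scaled), 0.80) + 1)

print("raw:   ", np.round(raw, 4))
print("scaled:", np.round(scaled, 4))
print("components kept:", n_keep)
```

In this sketch the large-scale column absorbs almost all of the variance when the data is left unscaled, while after standardization the variance is spread far more evenly across the components. Comparing the two outputs in this way is one possible check of whether scale is driving a PCA result.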