Data Mining Assignment Solved

Added on - 31 Oct 2019

DATA MINING
Assignment – II
Student id
[Pick the date]
Question 1 “DIMENSION REDUCTION”

a) The principal component analysis result is as highlighted below.

Key observations from the above are summarized below.

  • The first four principal components are able to explain 79.9% of the total variance, and hence it would be prudent to ignore the remaining principal components.
  • For the first principal component, the significant factors are x1 and x2, which are fixed charge covering ratio and rate of return, which indicate that this relates to the financial performance of the utility company.
  • The second principal component has significant factors in the form of x4 and x8, which are annual load factor and fuel cost, and thereby relate to the operational performance of the utility company.
  • The third principal component has significant factors in the form of x3 and x7, which are cost per unit along with nuclear contribution, and thereby relate to the cost of electricity production.
  • The fourth principal component has significant factors in the form of x1 and x3, which are fixed charge covering ratio and unit cost, and hence relates to the fixed cost structure of the utility company's production.

It also needs to be addressed whether the data must be normalized prior to PCA. Normalization becomes a necessity when a particular variable, on account of a difference in scales, contributes a significant proportion of the total variance and thereby overshadows the importance of the other variables involved. Normalization is done to remove this scale effect. However, no such need arises in the given case, as the highest variance explained by a single factor is only 27%, and thus scale is not a pivotal factor for the PCA in this case.

(b) The advantages and disadvantages are highlighted below.

Advantages: PCA is considered to be a simple form of true eigenvector-based multivariate analysis. Its three central tasks are dimension reduction, maximizing the variance, and producing principal components that are orthogonal to one another. PCA first maximizes the variance in p-dimensional space under a quadratic constraint. This results in the reduction of a large data set into a smaller one. The structure of the given data can easily be evaluated based on PCA. Visualization of the data is simple because each variable has its own axis in the high-dimensional cloud of data points.

Disadvantages: At times, it has been found that after a large number of variables is reduced to fewer variables, the derived principal components do not align with the direction in which the variance is greatest. Hence, it can be difficult to find the direction of the key principal components. Also, the distance function in the case of PCA is invalid. Further, this method cannot be applied to data sets that comprise categorical variables. Also, analysis of non-linear relations between variables is not successful.

Question 2
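As a supplement to Question 1(a), the two checks discussed there (how many components are needed to reach a target share of the variance, and whether one variable's scale dominates the result) can be sketched in Python with NumPy. This is only an illustrative sketch: the data below is synthetic, the 200 rows and the inflated column are assumptions made for the demonstration, and the helper names `explained_variance_ratios` and `components_needed` are hypothetical rather than taken from the assignment.

```python
import numpy as np


def explained_variance_ratios(X, standardize):
    """Share of total variance explained by each principal component.

    Components are returned in descending order of explained variance.
    Assumes no column of X is constant when standardize=True.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    if standardize:
        # Divide each column by its sample standard deviation to remove scale.
        centered = centered / X.std(axis=0, ddof=1)
    covariance = np.cov(centered, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order for symmetric matrices.
    eigenvalues = np.linalg.eigvalsh(covariance)[::-1]
    eigenvalues = np.clip(eigenvalues, 0.0, None)  # guard tiny negative values
    return eigenvalues / eigenvalues.sum()


def components_needed(ratios, threshold):
    """Smallest number of leading components whose cumulative share
    of variance is at least `threshold` (a fraction between 0 and 1)."""
    cumulative = np.cumsum(ratios)
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return min(count, len(cumulative))


# Synthetic stand-in data: 200 rows, 8 variables (x1..x8), NOT the utility data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
X[:, 5] *= 1000.0  # assumption: one variable measured on a much larger scale

raw_ratios = explained_variance_ratios(X, standardize=False)
std_ratios = explained_variance_ratios(X, standardize=True)

print("Raw data, share of first component:", round(float(raw_ratios[0]), 4))
print("Standardized, share of first component:", round(float(std_ratios[0]), 4))
print("Components for 80% (standardized):", components_needed(std_ratios, 0.80))
```

On the unscaled synthetic data the first component absorbs almost all of the variance because of the inflated column, while after standardization the variance is spread across components. Comparing the two runs on the actual data is one way to test whether skipping normalization is justified.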