Data Mining and Visualization | Assessment


Added on  2020-03-28

Data Mining and Visualization
Assessment Item – 2
STUDENT ID
[Pick the date]
Question 1

(a) PCA Output (XL Miner)

Interpretation

The variance matrix shows that the top four principal components together account for 80% of the total variance, while the remaining components can be treated as noise (Medhi, 2001).

Taking the principal components matrix and the respective eigenvalues into consideration, the significant features among the given eight can be earmarked for comparing US utility firms. Features with higher-magnitude entries in this matrix are preferred, and these are summarized below (Hofmann & Chisholm, 2016).
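The 80% cut-off described above can be illustrated with a minimal sketch. It assumes NumPy rather than XL Miner and uses a synthetic eight-feature matrix, not the utility data from the assessment. The function names `explained_variance_ratio` and `components_for_threshold` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance carried by each principal component, largest first."""
    Xc = X - X.mean(axis=0)                    # centre each feature
    cov = np.cov(Xc, rowvar=False)             # feature-by-feature covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]    # eigvalsh returns ascending order
    eigvals = np.clip(eigvals, 0.0, None)      # guard against tiny negative round-off
    return eigvals / eigvals.sum()

def components_for_threshold(ratios, threshold=0.80):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratios))

# Synthetic stand-in data (assumption): 30 observations, 8 features of unequal spread.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8)) * np.array([5.0, 4.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0])

ratios = explained_variance_ratio(X)
k = components_for_threshold(ratios, threshold=0.80)
print("variance share per component:", np.round(ratios, 3))
print("components needed for 80% of total variance:", k)
```

On the real dataset, the value of `k` would be compared with the four components read off the XL Miner variance matrix.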
Normalised data is often required in PCA, but it is not imperative in this case. The main reason is that the total variance matrix does not appear to have been materially distorted by the differing scales of the underlying variables. Where differing scales skew the representation enough to alter the apparent significance of the variables, data normalisation becomes a necessity (Shmueli et al., 2016).

(b) The main advantages and disadvantages of PCA are shown below (Kudyba & Hoptroff, 2012).

Advantages
- PCA is a statistically useful tool that reduces high-dimensional data to a smaller number of components.
- It is also a useful technique for examining the underlying structure of the dataset.
- It is based on maximising variance, so the retained components are those that contribute most to the total variance.
- Visualization is easy because each principal component has its own axis in the m- or p-dimensional data cloud.

Disadvantages
- In the process of variable reduction, there is a high possibility that PCA may eliminate variables that are significant.
- The technique fails for datasets that contain categorical variables.
- It is not useful when the data variables are drawn from an unknown source distribution.
- It is not useful when non-linear associations exist between the variables.
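The normalisation point made under part (a) can be checked by comparing variance shares before and after standardising the features. The sketch below is an illustration only: it assumes NumPy, uses synthetic data in which one feature is deliberately given a much larger scale, and the helper name `variance_shares` is hypothetical.

```python
import numpy as np

def variance_shares(X, standardize=False):
    """Variance share per principal component, optionally after z-scoring each feature."""
    Xc = X - X.mean(axis=0)
    if standardize:
        Xc = Xc / Xc.std(axis=0, ddof=1)       # unit variance per feature
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
    return eigvals / eigvals.sum()

# Synthetic data (assumption): four independent features, the first on a far larger scale.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
X[:, 0] *= 1000.0

raw = variance_shares(X)                       # PCA on the raw scales
scaled = variance_shares(X, standardize=True)  # PCA on normalised data

print("raw shares:   ", np.round(raw, 4))
print("scaled shares:", np.round(scaled, 4))
```

If the raw and standardised shares are broadly similar, scale differences are not distorting the result and normalisation is optional. A first component that dominates only in the raw run points to a scale artefact, in which case normalisation is needed.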