
Data Mining and Visualization Assignment


Added on  2020-03-28

Data Mining
Data Mining and Visualization Assignment
Student ID and Name
[Pick the date]
Question 1

(a) Output obtained from XL Miner (PCA)

Interpretation and Identification

Two critical aspects of the output are outlined below.

Variance Matrix: This shows the contribution of each principal component to the total variance. It forms the basis of PCA, in which the components that account for the most variance are considered the most important from a statistical point of view. For instance, PC1, PC2, PC3 and PC4 would be significant for the utilities data (Shmueli, Bruce, & Patel, 2016).

Principal Component Matrix: This links the principal components to the significant features. For each principal component, the significance of an underlying feature is indicated by the magnitude of its coefficient in the matrix. For example, for PC4 the two most important features are x1 and x3 (Camm et al., 2016).
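The two matrices described above can be reproduced in a few lines. The following is a minimal sketch, assuming NumPy and a small synthetic data set; it does not use the utilities data, the function name pca_summary is hypothetical, and the layout of the XL Miner output may differ.

```python
import numpy as np

def pca_summary(X):
    """Return (variance_share, components) for X with rows as observations."""
    Xc = X - X.mean(axis=0)                    # centre each feature
    cov = np.cov(Xc, rowvar=False)             # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    variance_share = eigvals / eigvals.sum()   # the "variance matrix" row
    return variance_share, eigvecs             # columns are PC1, PC2, ...

# Hypothetical data: 50 observations of four features x1..x4.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

variance_share, components = pca_summary(X)
print("share of total variance per component:", np.round(variance_share, 3))

# Features with the largest absolute coefficients on a chosen component.
pc = 3  # zero-based index, so this is PC4
top_two = np.argsort(np.abs(components[:, pc]))[::-1][:2]
print("most influential features on PC4:", [f"x{i + 1}" for i in top_two])
```

Reading the first printed row corresponds to reading the variance matrix, and ranking the absolute coefficients in a column corresponds to reading the principal component matrix.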
Normalisation of Data

Data normalisation is usually required before PCA because variables are measured on different scales, which gives an unfair advantage to variables whose variance is high simply because their values are larger in magnitude. Scaling nullifies this advantage. It is not required here, as the variance contribution of any one principal component does not exceed 28%. Further, running PCA on the normalised data does not improve the output, which indicates that normalisation is not needed (John, 2014).

(b) Advantages of PCA

It is an effective technique for determining data structure.
Dimension reduction: it reduces a large number of variables to fewer dimensions.
Maximum variance: it maximises the retained variance by removing noise variables (variables with low variance).
The outcome is in orthogonal covariance matrix form and is hence suitable for understanding and determining magnitude (Ragsdale, 2016).

Disadvantages of PCA

It is not suitable for examining the real direction of the principal component matrix, because the variables are positioned in the cloud space as a dot product and are therefore difficult to examine.
It is not appropriate where the data also contain categorical variables.
It fails at dimension reduction where the distribution of the data is not adequately described by its mean and variance.
The graphical result may become complex because each principal component has its own axis.
It is not suitable when the variables have non-linear relationships (Han, Pei & Kamber, 2011).
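The last disadvantage can be illustrated with a small example. The sketch below assumes NumPy and uses hypothetical data: points on a unit circle, which are fully described by a single angle but related non-linearly in x and y. It is an illustration only and is not drawn from the assignment's data.

```python
import numpy as np

# Hypothetical data: 200 points evenly spaced on a unit circle.
# One parameter (the angle t) determines every point, yet x and y
# are linked by x^2 + y^2 = 1 rather than by a straight line.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t)])

cov = np.cov(X, rowvar=False)                 # 2 x 2 covariance matrix
eigvals = np.linalg.eigvalsh(cov)[::-1]       # eigenvalues, largest first
variance_share = eigvals / eigvals.sum()

print("share of total variance per component:", np.round(variance_share, 3))
```

Both components carry about half of the variance, so PCA offers no basis for dropping either one, even though the data have only one underlying degree of freedom.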