COMP3340 - Data Mining Assessment

The University of New Castle

   

Data Mining (COMP3340)

   

Added on  2020-03-13

DATA MINING
Assessment Item 2
Student id [Pick the date]
Question 1

(a) The principal components obtained through PCA are highlighted below (Liebowitz, 2013).

However, the variance analysis suggests that 95% of the variance is explained by the top six principal components, and hence the above matrix may be reduced to only the first six principal components. The respective principal components and their most significant features are outlined below.

Principal component 1 = x2
Principal component 2 = x6
Principal component 3 = x7
Principal component 4 = x3
Principal component 5 = x4
Principal component 6 = x5
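The two steps described above (keeping the leading components until a cumulative variance threshold is reached, and naming the feature with the largest loading on a component) can be sketched as follows. This is a minimal illustration, not the analysis behind the assignment's figures: the function names, the eigenvalues and the loadings are all hypothetical, and the use of the largest absolute loading as the "most significant feature" is an assumption about how the features above were chosen.

```python
from itertools import accumulate


def explained_variance_ratio(eigenvalues):
    """Return each component's share of total variance, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [value / total for value in ordered]


def components_needed(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose cumulative
    share of variance reaches the threshold."""
    ratios = explained_variance_ratio(eigenvalues)
    for count, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= threshold - 1e-12:  # tolerate rounding error
            return count
    return len(ratios)


def dominant_feature(loadings, feature_names):
    """Name of the feature with the largest absolute loading
    on a single component (assumed selection rule)."""
    index = max(range(len(loadings)), key=lambda i: abs(loadings[i]))
    return feature_names[index]


# Hypothetical values for illustration only; NOT the utilities data.
illustrative_eigenvalues = [4.0, 2.0, 1.0, 1.0]
illustrative_loadings = [0.1, -0.9, 0.3, 0.2]
illustrative_names = ["x1", "x2", "x3", "x4"]

print(explained_variance_ratio(illustrative_eigenvalues))
print(components_needed(illustrative_eigenvalues, threshold=0.95))
print(dominant_feature(illustrative_loadings, illustrative_names))
```

With real data, the eigenvalues would come from the covariance (or correlation) matrix of the variables, and one loading vector would be inspected per retained component.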
Considering the contribution to variance, it may be concluded that the most critical features are x2, x6 and x7.

Data normalization is required when one variable shows a very large amount of variation and, as a result, accounts for a very high share of the variance in the PCA variance analysis. The relevant variance analysis is highlighted below (Liebowitz, 2013).

It is evident that in the given utilities data, the first component accounts for only 27.16% of the total variance. The variance is therefore spread across several components rather than coming overwhelmingly from the first component alone. Hence, normalization is not required for the given data.

(b) The various advantages of PCA over other methods are highlighted below.

It reduces a complex, multi-dimensional data set to a select few dimensions, chosen as the most pivotal on the basis of the variance they explain.

It reduces the risk of over-fitting and brings down the complexity of the computation.

The components are represented in orthogonal form, which is also advantageous and simplifies interpretation.

The various limitations or disadvantages of PCA in comparison with other available tools are highlighted below (Shmueli, Bruce & Patel, 2016).

It is based on the assumption of linear relationships between the variables, captured through orthogonal projections. If the relationships between the variables are non-linear, PCA tends not to be helpful and other methods need to be deployed.
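Two points from this answer, the scale sensitivity behind the normalization check and the linearity assumption, can be illustrated with a small two-variable sketch. Everything here is an assumption made for illustration: the data are synthetic (not the utilities data), the helper names are hypothetical, and the first component's share is computed from the closed-form eigenvalues of a 2x2 sample covariance matrix rather than with a PCA library.

```python
import math


def first_component_share(xs, ys):
    """Share of total variance explained by the first principal component
    of two variables, via the eigenvalues of the 2x2 covariance matrix."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Largest eigenvalue of [[var_x, cov], [cov, var_y]].
    largest = (var_x + var_y) / 2 + math.sqrt(((var_x - var_y) / 2) ** 2 + cov ** 2)
    return largest / (var_x + var_y)


def standardize(values):
    """Rescale a variable to zero mean and unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# 1) Scale sensitivity: one variable is measured in much larger units.
small_scale = [1.0, 2.0, 3.0, 4.0, 5.0]
large_scale = [1000.0, 3000.0, 2000.0, 5000.0, 4000.0]
raw_share = first_component_share(small_scale, large_scale)
scaled_share = first_component_share(standardize(small_scale),
                                     standardize(large_scale))

# 2) Linearity: both data sets below are driven by a single parameter,
#    but only the linear one is compressed into one component.
t = [i / 10 for i in range(50)]
line_share = first_component_share(t, [2 * v for v in t])
angles = [2 * math.pi * i / 50 for i in range(50)]
circle_share = first_component_share([math.cos(a) for a in angles],
                                     [math.sin(a) for a in angles])

print(f"raw: {raw_share:.6f}  standardized: {scaled_share:.6f}")
print(f"line: {line_share:.6f}  circle: {circle_share:.6f}")
```

In the first comparison, the large-unit variable dominates the first component until both variables are standardized. In the second, points on a straight line collapse onto one component, while points on a circle split their variance evenly across two components even though a single angle describes them.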