The Advantages of PCA Method in Data Mining - Desklib


Added on  2020-04-01

DATA MINING AND VISUALISATION FOR BUSINESS INTELLIGENCE
BUSINESS CASE ANALYSIS – I
STUDENT ID
[Pick the date]
Question 1
(a) PCA output without normalisation

The PCA analysis is based on the output above. The following steps need to be performed in this regard.

Step 1: Determine the proportion of variance that needs to be accounted for. This information is needed to identify the principal components that should be analysed further. For instance, a limit of 97% variance would involve the first seven principal components, while a lower limit of 80% would restrict the analysis to the first four principal components.

Step 2: For each principal component identified above, identify the features of the utilities that matter most to it. In this endeavour, only the magnitude of each coefficient should be considered and its sign ignored. For instance, in the case of principal component 1, the highest coefficient magnitudes are for two features, namely x1 and x2. This implies that these features are significant in accounting for the first principal component. The same process needs to be extended to the other principal components.

Step 3: Based on the key features identified, a further summary of the critical features can be made according to their relative importance as indicated in the principal component matrix. x1 and x3 emerge as the most significant features, on the basis of which the comparison between utility firms may be carried out.

Normalisation

Data normalisation is sometimes performed before PCA in order to eliminate the impact of the different scales used for the variables. If this is not rectified, a variable measured on a large scale can contribute a disproportionate share of the total variance and therefore appear more important than it is. For the given case, however, PCA was also carried out with data normalisation and no significant difference could be noticed, so it is appropriate to conduct the analysis without normalisation.

(b) The advantages of the PCA method in data mining and visualization are as illustrated below:
1. The result is generated in the form of an orthogonal matrix, and therefore comparison and analysis are convenient.
2. PCA is considered a "max-variance" technique and hence helps to discard low-variance directions, which often correspond to noise, from the analysis.
3. It also decreases the risk of over-fitting that may arise when analysing large data sets.
4. Visualization of the PCA result is quite easy because each principal component has its own axis, which is at a right angle to the axis of every other principal component.
5. When many dimensions are retained, there is a possibility that the data may be over-fitted. This risk can be reduced considerably by applying the PCA method.
6. The method is well suited to examining and visualizing the underlying structure of the data.
7. Visualization is done either in m-dimensional space or in p-dimensional space, where the data can easily be viewed as a point cloud.
8. Variables with linear relations can easily be analysed by deploying the PCA method.

The disadvantages of the PCA method in data mining and visualization are as illustrated below:
1. For a dataset obtained from a blind source with undefined mean and variance, PCA cannot be employed for data reduction.
2. When the dataset contains any variable of categorical type, the PCA method cannot be applied directly.
3. Computing the maximum-variance components for a large, complex matrix is a difficult and time-consuming task.
4. Further, the covariance matrix can be hard to interpret when there are many significant patterns and the differences in variance between them are very small.
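
The steps in part (a) and the effect of normalisation can be illustrated with a minimal sketch. It assumes NumPy and uses a small synthetic dataset, not the utilities data from the assignment. The helper names `pca`, `n_components_for` and `top_features`, and the feature names x1 to x4, are hypothetical. The sketch computes the explained variance ratios, picks the number of components needed for a variance limit (Step 1), ranks features by absolute coefficient (Step 2), and repeats the analysis on standardised data.

```python
import numpy as np


def pca(X, standardise=False):
    """PCA via eigendecomposition of the covariance matrix.

    With standardise=True each feature is scaled to unit variance first,
    which is equivalent to PCA on the correlation matrix.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each feature
    if standardise:
        Xc = Xc / X.std(axis=0, ddof=1)      # remove differences in scale
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()          # share of variance per component
    return ratio, eigvecs                    # columns of eigvecs are components


def n_components_for(ratio, threshold):
    """Step 1: smallest number of leading components reaching the threshold."""
    cumulative = np.cumsum(ratio)
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return min(count, len(ratio))            # guard against rounding at 1.0


def top_features(components, k, names, n_top=2):
    """Step 2: features with the largest |coefficient| on component k."""
    idx = np.argsort(np.abs(components[:, k]))[::-1][:n_top]
    return [names[i] for i in idx]


# Synthetic data (an assumption for illustration): x1 and x2 share a latent
# factor, but x1 is recorded on a much larger scale than the other features.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
X = np.column_stack([
    1000.0 * latent + rng.normal(scale=50.0, size=n),  # x1, large scale
    latent + rng.normal(scale=0.5, size=n),            # x2
    rng.normal(size=n),                                # x3
    rng.normal(size=n),                                # x4
])
names = ["x1", "x2", "x3", "x4"]

raw_ratio, raw_comp = pca(X)
std_ratio, std_comp = pca(X, standardise=True)

print("raw explained variance:", np.round(raw_ratio, 4))
print("standardised explained variance:", np.round(std_ratio, 4))
print("components for 80% (raw):", n_components_for(raw_ratio, 0.80))
print("components for 80% (standardised):", n_components_for(std_ratio, 0.80))
print("top features on PC1 (standardised):", top_features(std_comp, 0, names))
```

In this synthetic case the large-scale feature dominates the unnormalised analysis, which is the distortion the normalisation paragraph warns about. On data where the two runs agree, as reported for the utilities case, skipping normalisation is a reasonable choice.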