
Data Mining Tools and Techniques

   

Added on 2021-06-14

19 pages, 3731 words
Data Mining (Tools and Mining Techniques)
Index
1. Introduction
2. Dataset Description
3. Preprocessing
4. Important Attribute
5. Importance of Attributes
6. Correlation and Regression Analysis
7. Association Algorithm
8. Classification and Clustering Algorithm
9. Result Analysis
10. Proposed Framework
11. Conclusion
12. References
Introduction

Data mining is the process, and the set of techniques, used to identify informative data in a data warehouse. Big data comes from many different sources and is collected in different systems. The main role of data mining is to predict outcomes from a dataset, which is also why businesses rely on it to stay competitive and meet market standards. A huge amount of data is available in the data science industry, but it is of no use unless we can extract informative content from it. It is therefore necessary to analyze large volumes of data and extract useful information from them; in this sense, data mining is the process of mining knowledge from a huge data source. The Chronic Kidney Disease (CKD) dataset used in this report was obtained from the UCI Machine Learning Repository. The main aim is to identify the informative variables (fields) in this dataset (S.J. and J.H., 2015).

1. Dataset Description

The Chronic Kidney Disease dataset has 25 columns: 24 attributes and the classification label. The sample below also includes an id column and shows the variable names. The data is of two types, numeric and categorical. The numeric columns are id, age, bp, sg, al, su, bgr, bu, sc, sod, pot, hemo, pcv, wc and rc.
The categorical columns are rbc, pc, pcc, ba, htn, dm, cad, appet, pe and ane; classification is the class label (ckd or notckd).

Sample of the CKD dataset (first 19 records; - marks a missing value):

id  age  bp   sg     al  su  rbc       pc        pcc         ba          bgr  bu   sc   sod  pot  hemo  pcv  wc     rc   htn  dm   cad  appet  pe   ane  classification
0   48   80   1.02   1   0   -         normal    notpresent  notpresent  121  36   1.2  -    -    15.4  44   7800   5.2  yes  yes  no   good   no   no   ckd
1   7    50   1.02   4   0   -         normal    notpresent  notpresent  -    18   0.8  -    -    11.3  38   6000   -    no   no   no   good   no   no   ckd
2   62   80   1.01   2   3   normal    normal    notpresent  notpresent  423  53   1.8  -    -    9.6   31   7500   -    no   yes  no   poor   no   yes  ckd
3   48   70   1.005  4   0   normal    abnormal  present     notpresent  117  56   3.8  111  2.5  11.2  32   6700   3.9  yes  no   no   poor   yes  yes  ckd
4   51   80   1.01   2   0   normal    normal    notpresent  notpresent  106  26   1.4  -    -    11.6  35   7300   4.6  no   no   no   good   no   no   ckd
5   60   90   1.015  3   0   -         -         notpresent  notpresent  74   25   1.1  142  3.2  12.2  39   7800   4.4  yes  yes  no   good   yes  no   ckd
6   68   70   1.01   0   0   -         normal    notpresent  notpresent  100  54   24   104  4    12.4  36   -      -    no   no   no   good   no   no   ckd
7   24   -    1.015  2   4   normal    abnormal  notpresent  notpresent  410  31   1.1  -    -    12.4  44   6900   5    no   yes  no   good   yes  no   ckd
8   52   100  1.015  3   0   normal    abnormal  present     notpresent  138  60   1.9  -    -    10.8  33   9600   4    yes  yes  no   good   no   yes  ckd
9   53   90   1.02   2   0   abnormal  abnormal  present     notpresent  70   107  7.2  114  3.7  9.5   29   12100  3.7  yes  yes  no   poor   no   yes  ckd
10  50   60   1.01   2   4   -         abnormal  present     notpresent  490  55   4    -    -    9.4   28   -      -    yes  yes  no   good   no   yes  ckd
11  63   70   1.01   3   0   abnormal  abnormal  present     notpresent  380  60   2.7  131  4.2  10.8  32   4500   3.8  yes  yes  no   poor   yes  no   ckd
12  68   70   1.015  3   1   -         normal    present     notpresent  208  72   2.1  138  5.8  9.7   28   12200  3.4  yes  yes  yes  poor   yes  no   ckd
13  68   70   -      -   -   -         -         notpresent  notpresent  98   86   4.6  135  3.4  9.8   -    -      -    yes  yes  yes  poor   yes  no   ckd
14  68   80   1.01   3   2   normal    abnormal  present     present     157  90   4.1  130  6.4  5.6   16   11000  2.6  yes  yes  yes  poor   yes  no   ckd
15  40   80   1.015  3   0   -         normal    notpresent  notpresent  76   162  9.6  141  4.9  7.6   24   3800   2.8  yes  no   no   good   no   yes  ckd
16  47   70   1.015  2   0   -         normal    notpresent  notpresent  99   46   2.2  138  4.1  12.6  -    -      -    no   no   no   good   no   no   ckd
17  47   80   -      -   -   -         -         notpresent  notpresent  114  87   5.2  139  3.7  12.1  -    -      -    yes  no   no   poor   no   no   ckd
18  60   100  1.025  0   3   -         normal    notpresent  notpresent  263  27   1.3  135  4.3  12.7  37   11400  4.3  yes  yes  yes  good   no   no   ckd
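The split between numeric and categorical columns can also be checked mechanically. The following is a minimal Python sketch, assuming the data is available as CSV with empty fields for missing values (the UCI original uses "?" instead). The column subset and the helper names `is_number` and `split_column_types` are illustrative only and not part of the original study.

```python
import csv
import io

# Four records from the sample above, cut down to a few columns for brevity.
# Empty fields are missing values.
SAMPLE_CSV = """id,age,bp,sg,rbc,pc,hemo,htn,classification
0,48,80,1.02,,normal,15.4,yes,ckd
1,7,50,1.02,,normal,11.3,no,ckd
2,62,80,1.01,normal,normal,9.6,no,ckd
3,48,70,1.005,normal,abnormal,11.2,yes,ckd
"""


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def split_column_types(rows, columns, label="classification"):
    """Split the non-label columns into numeric and categorical lists.

    A column counts as numeric when every non-missing value parses as a
    number; columns with no observed values fall into the categorical list.
    """
    numeric, categorical = [], []
    for column in columns:
        if column == label:
            continue
        observed = [row[column].strip() for row in rows if row[column].strip()]
        if observed and all(is_number(v) for v in observed):
            numeric.append(column)
        else:
            categorical.append(column)
    return numeric, categorical


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)
numeric, categorical = split_column_types(rows, reader.fieldnames)
print("numeric:", numeric)          # numeric: ['id', 'age', 'bp', 'sg', 'hemo']
print("categorical:", categorical)  # categorical: ['rbc', 'pc', 'htn']
```

This rule cannot tell an identifier such as id from a measurement, and coded attributes such as sg, al and su also parse as numbers, so how to treat them remains a modelling choice.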
Dataset Information

The dataset contains 24 health-related attributes recorded for 400 patients. Only 158 of the 400 patient records are complete; the remaining 242 records have missing values. The attributes are:

age – age (years)
bp – blood pressure (mm/Hg)
sg – specific gravity
al – albumin
su – sugar
rbc – red blood cells
pc – pus cell
pcc – pus cell clumps
ba – bacteria
bgr – blood glucose random (mgs/dl)
bu – blood urea (mgs/dl)
sc – serum creatinine (mgs/dl)
sod – sodium (mEq/L)
pot – potassium (mEq/L)
hemo – hemoglobin (gms)
pcv – packed cell volume
wc – white blood cell count (cells/cumm)
rc – red blood cell count (millions/cmm)
htn – hypertension
dm – diabetes mellitus
cad – coronary artery disease
appet – appetite
pe – pedal edema
ane – anemia
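The split between complete and incomplete records can be reproduced by checking each record for a missing-value marker. The following is a minimal Python sketch, assuming the data has been exported as CSV. The set of markers ("" and "?") is an assumption about the export, and `is_missing` and `count_complete` are hypothetical helpers, not the method used in the original study.

```python
import csv
import io

# Records 0, 3, 9 and 13 from the sample above, with all 26 columns.
# Empty fields are missing values.
SAMPLE_CSV = """id,age,bp,sg,al,su,rbc,pc,pcc,ba,bgr,bu,sc,sod,pot,hemo,pcv,wc,rc,htn,dm,cad,appet,pe,ane,classification
0,48,80,1.02,1,0,,normal,notpresent,notpresent,121,36,1.2,,,15.4,44,7800,5.2,yes,yes,no,good,no,no,ckd
3,48,70,1.005,4,0,normal,abnormal,present,notpresent,117,56,3.8,111,2.5,11.2,32,6700,3.9,yes,no,no,poor,yes,yes,ckd
9,53,90,1.02,2,0,abnormal,abnormal,present,notpresent,70,107,7.2,114,3.7,9.5,29,12100,3.7,yes,yes,no,poor,no,yes,ckd
13,68,70,,,,,,notpresent,notpresent,98,86,4.6,135,3.4,9.8,,,,yes,yes,yes,poor,yes,no,ckd
"""

# Assumed missing-value markers: an empty field, or "?" as used in the
# UCI ARFF file.
MISSING_MARKERS = {"", "?"}


def is_missing(value):
    # csv.DictReader uses None for fields absent from a short row.
    return value is None or value.strip() in MISSING_MARKERS


def count_complete(rows):
    """Return (complete, incomplete) record counts."""
    complete = sum(
        1 for row in rows if not any(is_missing(v) for v in row.values())
    )
    return complete, len(rows) - complete


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
complete, incomplete = count_complete(rows)
print(f"{complete} complete, {incomplete} with missing values")
# 2 complete, 2 with missing values
```

Run over the full 400-record file, this check should reproduce the 158/242 split described above, provided the missing values use one of the assumed markers.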