Data Mining Techniques for Prediction of Employees' Performance


Added on  2022/11/17

AI Summary
This report discusses the use of data mining techniques for the prediction of employees' performance in an organization. It covers decision tree approach, CRISP-DM model, classification technique, association technique, and more. The report emphasizes the importance of data pre-processing and preparation by clustering of the dataset. It also includes a table of variables related to employees' performance.

Running head: DATA MINING
Data mining
Name
Institution
Professor
Course
Date

Table of Contents
1.0 Executive summary
2.0 Introduction
3.0 Decision Tree approach in prediction of the employees’ performance
4.0 The CRISP-DM model in prediction of the employees’ performance
5.0 Data mining techniques
5.1 Data pre-processing and preparation by clustering of the dataset
6.0 Classification technique
7.0 Association technique
8.0 The process of data mining
8.1 Modeling and experiments in CIS4015-N: Data Mining and CIS4035-N: Machine Learning
9.0 Conclusion
10.0 References
1.0 Executive summary
Predicting employees’ performance is an essential requirement in any organization. Employee
performance is determined by various factors, including social, personal, dependability and
environmental factors. Data mining is one of the tools used to assess the performance of
employees in an organization. Data mining techniques discover hidden information and patterns
in datasets, and they help determine the relationships within large volumes of data during
organizational decision-making. The report shows that a single data source can hold many of the
datasets and much of the information required to determine employees’ performance. The type of
information used to evaluate employees’ performance depends on the datasets used, which in turn
decides which data processing method is applied in the report.
2.0 Introduction
Employees’ performance is determined by monitoring organizational outcomes and evaluating
employees’ datasets. Data mining combines machine learning, visualization techniques and
statistics to discover new knowledge from datasets. The retention of employees is an indication
of organizational enrollment and performance. By using data mining techniques, problems in an
organization can be identified in advance to avoid major damage (Valle, Varas and Ruz 2012,
p. 9939). In this report, the raw data is pre-processed by filling in missing values,
transforming the values into a suitable form and selecting the relevant variables. One of the
methods used is the Decision Tree technique (Huang, Tsou and Lee 2016, p. 396). The report
involves classifying the datasets into pre-defined classes. This is supervised machine learning,
because employees’ performance is determined by examining the data the organization already
holds about their performance. To improve the way employees perform, managers have to monitor
their daily performance (Shazmeen, Baig and Pawar 2013, p. 1). Here, data mining is used to
predict employees’ performance. Based on the data provided, the report answers the following
questions.
1. How are the data sources for the predictive variables generated?
2. What are the different factors which affect the performance of the employees in an
organization?
3. How is the Decision Tree model constructed using data mining classification techniques,
given the identified predictive variables and their values?
4. How is the dataset of the predictive variables gathered?
5. What is the relationship between the factors affecting the efficiency of the model used?
The analytics approaches chosen for this analysis are the Decision Tree approach to employees’
performance and the CRISP-DM method (Sung, Chang and Lee 2019, p. 63). These approaches are
used to answer the proposed questions with valid adaptation and clear justification.
3.0 Decision Tree approach in prediction of the employees’ performance
A Decision Tree is a tree-like model or graph of decisions and their possible consequences in
an organization, including chance outcomes, utility and resource costs. It is one way of
applying an algorithm in data mining (COE 2012, p. 201). The Decision Tree approach is commonly
used in operations research, especially when a significant decision has to be made, because it
helps identify a strategy for reaching the goals of an organization. Constructing a Decision
Tree is not complicated: it requires selecting the training variables that become the nodes and
branches. The data is then split into subsets (Thakar 2015, p. 5176). In each branch, the
training samples correspond to the subset that the parent node assigns to that branch. Once the
samples at a node are classified, the remaining attributes are used for further subdivision
based on the characteristics of the individual employees and their performance (Cortez et al.
2009, p. 547).
The model is organized like a tree in which each leaf node indicates a value of the target
attribute. Each decision node specifies a test to be carried out on a single attribute, with
one branch or sub-tree for each possible outcome of the test.
Source: file:///C:/Users/user/Documents/fa84c2e3ad0e2ca07148ed000e98a5cc4d44.pdf
Attribute selection in the decision tree model is usually made using information gain. To
create a Decision Tree for predicting employees’ performance, the information carried by each
attribute has to be found (Jantan, Hamdan and Othman 2009, p. 775). This can be determined with
the information gain equation.
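The equation itself is missing from this copy of the report. As a hedged reconstruction, the standard information gain definition used by ID3-style decision trees is given here. The notation is an assumption rather than the report's own: $S$ is the set of examples, $p$ and $n$ are the numbers of examples in each of the two classes, $X$ is a candidate attribute, and $p_v$, $n_v$ are the class counts in the subset $S_v$ where $X$ takes the value $v$.

```latex
I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}

\mathrm{Gain}(S, X) = I(p, n) - \sum_{v \in \mathrm{Values}(X)} \frac{|S_v|}{|S|}\, I(p_v, n_v)
```

The attribute with the largest $\mathrm{Gain}(S, X)$ is the one chosen for the split.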
The attribute with the highest gain is selected. Two classes are considered: in the gain
equation, S is the set of examples and contains P employees belonging to class A. For 10
employees employed to do the same job, the following table lists the factors that affect
performance together with the resulting performance.
The working activity/task   Skills needed for the job   Initiative   Quality of working   Results of performance
Complex                     Serious                     Well         Perfect              As required
Complex                     Serious                     Well         Perfect              As required
Complex                     Serious                     Well         Perfect              As required
Complex                     Common                      Well         Bad                  Below average
Complex                     Serious                     Average      Perfect              As required
Simple                      Common                      Average      Bad                  Below average
Simple                      Serious                     Average      Bad                  Below average
Simple                      Serious                     Well         Perfect              As required
Simple                      Common                      Average      Bad                  Below average
Simple                      Common                      Well         Perfect              As required
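As a minimal sketch, the attribute selection described above can be tried on the ten rows of the table. The short column names (`task`, `skills`, `initiative`, `quality`) are shorthand chosen here, and the entropy-based gain is the standard ID3 definition, assumed to be the one the report intends.

```python
from collections import Counter
from math import log2

# Rows of the table above: (task, skills, initiative, quality, result).
ROWS = [
    ("Complex", "Serious", "Well", "Perfect", "As required"),
    ("Complex", "Serious", "Well", "Perfect", "As required"),
    ("Complex", "Serious", "Well", "Perfect", "As required"),
    ("Complex", "Common", "Well", "Bad", "Below average"),
    ("Complex", "Serious", "Average", "Perfect", "As required"),
    ("Simple", "Common", "Average", "Bad", "Below average"),
    ("Simple", "Serious", "Average", "Bad", "Below average"),
    ("Simple", "Serious", "Well", "Perfect", "As required"),
    ("Simple", "Common", "Average", "Bad", "Below average"),
    ("Simple", "Common", "Well", "Perfect", "As required"),
]
ATTRIBUTES = ["task", "skills", "initiative", "quality"]  # shorthand names (assumption)


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Entropy of the result column minus the weighted entropy after splitting on `column`."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[column] for row in rows}:
        subset = [row[-1] for row in rows if row[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best_attribute = max(gains, key=gains.get)

for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name:10s} gain = {gain:.3f}")
print("attribute chosen for the root split:", best_attribute)
```

In this table the quality column separates the two result classes completely, so it receives the highest gain and would become the root of the tree.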
4.0 The CRISP-DM model in prediction of the employees’ performance
This model is suitable for predicting employees’ performance because it provides general
guidance for developing the data mining lifecycle. Employees’ performance data is collected
from the human resources database of the organization concerned (Gera and Goel 2015, p. 18). A
series of experiments is conducted and tested using the employees’ form model. Where the
datasets are numerous and complex, the generic knowledge discovery in databases process is
refined using the most effective results. Classification is carried out with three data mining
algorithms: C4.5, ID3 and Naïve Bayes. The three algorithms are compared to identify the best
classification approach (Strohmeier and Piazza 2013, p. 2410), and the algorithm with the best
classification rate is chosen. Predictions made from the attributes on the employees’ appraisal
form provide the results of their performance. The factors to be considered are marital status,
age, gender, specialization, job group, level of training, salary, experience and
qualification. These attributes are classified into three categories. Category D represents
age, marital status and gender. Category E represents specialization and level of training.
Category P represents experience, salary, designation and PAS. The attributes in this framework
are normally mapped to the performance target class (Wu 2010, p. 2371).

Source: https://pdfs.semanticscholar.org/c26e/ecb0736bb494afdba29b9fdb6b2d8da7293c.pdf
The target classes are discrete in nature and range from outstanding, through improvement
needed, to meeting the minimum requirements. The same data can be summarized by the accuracy
that each algorithm achieves, which allows the accuracy rates to be compared, as shown in the
table below.
Number   Classification technique   Accuracy
A.       ID3                        63.67%
B.       Naïve Bayes                81.15%
C.       C4.5                       93.23%
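To make one of the compared classifiers concrete, here is a minimal pure-Python sketch of a categorical Naïve Bayes classifier with add-one (Laplace) smoothing. It is not the WEKA implementation behind the accuracy figures in the table, and it does not reproduce them. As an assumption for illustration, it is trained on the ten-employee table from section 3.0 rather than on the HR dataset of the cited study.

```python
from collections import Counter, defaultdict
from math import log

# Ten-employee table from section 3.0: (task, skills, initiative, quality, result).
ROWS = [
    ("Complex", "Serious", "Well", "Perfect", "As required"),
    ("Complex", "Serious", "Well", "Perfect", "As required"),
    ("Complex", "Serious", "Well", "Perfect", "As required"),
    ("Complex", "Common", "Well", "Bad", "Below average"),
    ("Complex", "Serious", "Average", "Perfect", "As required"),
    ("Simple", "Common", "Average", "Bad", "Below average"),
    ("Simple", "Serious", "Average", "Bad", "Below average"),
    ("Simple", "Serious", "Well", "Perfect", "As required"),
    ("Simple", "Common", "Average", "Bad", "Below average"),
    ("Simple", "Common", "Well", "Perfect", "As required"),
]


def train_naive_bayes(rows):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> distinct values seen
    for *attributes, label in rows:
        for i, value in enumerate(attributes):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, attributes):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = log(count / total)  # log prior
        for i, value in enumerate(attributes):
            seen = value_counts[(i, label)][value]
            score += log((seen + 1) / (count + len(domains[i])))
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(ROWS)
print(predict(model, ("Complex", "Serious", "Well", "Perfect")))
print(predict(model, ("Simple", "Common", "Average", "Bad")))
```

Working in log space avoids multiplying many small probabilities, and the smoothing keeps a value never seen with a class from forcing that class's score to zero.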
In this model, the attributes of the dataset reveal that category P is predominant and drives
the whole process of predicting employees’ performance. The classification is treated as the
label attribute. The C4.5 algorithm is applied to the dataset to obtain the rule set produced
in the training phase. Applying that rule set to the pre-processed dataset in the testing phase
then helps in analyzing the data obtained. The same data can be represented by the prediction
model, which offers an appraisal of the employees’ performance. The prediction model is
illustrated below.
Source: https://pdfs.semanticscholar.org/c26e/ecb0736bb494afdba29b9fdb6b2d8da7293c.pdf
5.0 Data mining techniques
5.1 Data pre-processing and preparation by clustering of the dataset
In this analysis, WEKA is the toolkit used as the machine learning platform; it is implemented
in Java. It provides an integrated application through which users can access up-to-date
information on how employees perform in an organization. Several tasks are applied in the
clustering model. WEKA supports clustering through its algorithms for statistical evaluation,
and WEKA users can compare the results of different machine learning and data mining (DM)
algorithms to determine their accuracy (Jantan, Hamdan and Othman 2011, p. 1). This flexibility
makes it possible to find the most suitable algorithm for the dataset provided. The clusters
are not predefined; the clustering technique therefore identifies the sparse and dense regions
in the object space when predicting employees’ performance in an organization or industry. The
table below lists the various types of clustering techniques.
The type                             Algorithm
Measure of distance and similarity   Distance and similarity measure
Hierarchical                         Divisive and agglomerative
Outlier                              Outlier
Clustering of complex data           DBSCAN, BIRCH, CURE, ROCK (categorical)
Partitional                          Squared matrix, PAM, minimum spanning tree, clustering by use of
                                     the neural networks, bond energy
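As an illustration of the hierarchical (agglomerative) family in the table, the sketch below merges the two closest clusters repeatedly until a requested number remains. Several parts are assumptions made for the example and are not taken from the report or from WEKA: the four employee records are hypothetical, distance is the Hamming distance between categorical records, and clusters are compared by single linkage.

```python
def hamming(a, b):
    """Number of attribute positions in which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def agglomerative(records, k):
    """Single-linkage agglomerative clustering; returns k lists of record indices."""
    clusters = [[i] for i in range(len(records))]  # start with one cluster per record
    while len(clusters) > k:
        best = None  # (distance, index of first cluster, index of second cluster)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = min(
                    hamming(records[i], records[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected by the pop
    return clusters


# Hypothetical records: (task, skills, initiative).
records = [
    ("Complex", "Serious", "Well"),
    ("Complex", "Serious", "Average"),
    ("Simple", "Common", "Average"),
    ("Simple", "Common", "Well"),
]

print(agglomerative(records, 2))
```

No class labels are used at any point, which is what distinguishes this clustering step from the classification techniques discussed elsewhere in the report.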
The variables used to cluster and predict by job description, together with their symbols, are
shown in the table below.

Source: https://www.researchgate.net/publication/331165269_A_proposed_Model_for_Predicting_Employees'_Performance_Using_Data_Mining_Techniques_Egyptian_Case_Study
6.0 Classification technique
Estimating and predicting employees’ performance in an organization is similar to a
classification task. The classification technique is used to evaluate the training dataset and
to apply the developed models to how employees perform in an organization. Data mining
classification is used to gain knowledge for predicting how employees perform in their
respective areas of specialization. The Decision Tree technique is used to build the
classification model in this report (Chien and Chen 2018, p. 280). Several classification rules
are used in this process. To validate the classification model, a prototype is constructed and
relevant data on the employees’ performance is collected from the human resources department.
The results show the performance of the employees in relation to their experience,
qualifications, age and the other factors mentioned above. The classification model predicts
the performance of the various employees in the organization, and it helps human resources
staff focus on the capacity of employees to carry out particular tasks. The classification
techniques are summarized in the table below.
Type of classification model   Algorithm
Distance                       K nearest neighbors, simple distance
Statistical                    Bayesian, regression
Decision tree                  CART, SPRINT, C4.5, ID3
Neural network                 NN supervised learning, propagation
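The distance-based row of the table can be illustrated with a short k-nearest-neighbors sketch. The training rows are a subset of the ten-employee table from section 3.0, and the two queries are rows of that table held out from training. The use of Hamming distance and k = 3 are assumptions made for the example, not choices stated in the report.

```python
from collections import Counter

# Subset of the section 3.0 table: (task, skills, initiative, quality, result).
TRAIN = [
    ("Complex", "Serious", "Well", "Perfect", "As required"),
    ("Complex", "Common", "Well", "Bad", "Below average"),
    ("Complex", "Serious", "Average", "Perfect", "As required"),
    ("Simple", "Common", "Average", "Bad", "Below average"),
    ("Simple", "Serious", "Average", "Bad", "Below average"),
    ("Simple", "Common", "Well", "Perfect", "As required"),
]


def hamming(a, b):
    """Number of attribute positions in which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Majority result among the k training rows closest to `query`."""
    ranked = sorted(train, key=lambda row: hamming(row[:-1], query))
    votes = Counter(row[-1] for row in ranked[:k])
    return votes.most_common(1)[0][0]


# Held-out employees from the same table (attributes only).
print(knn_predict(TRAIN, ("Simple", "Serious", "Well", "Perfect")))
print(knn_predict(TRAIN, ("Simple", "Common", "Average", "Bad")))
```

Unlike a decision tree, this classifier builds no model in advance; every prediction compares the new employee against the stored records.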
7.0 Association technique
The association technique predicts the performance of employees by finding the sets of binary
variables that affect it. In this technique, the employees’ characteristics are evaluated and
monitored to identify their performance rate in an organization. The association rule mining
algorithms used in this technique include Apriori, DDA, CDA and investigating measures.
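A minimal sketch of the association idea follows. Each employee becomes a set of binary items such as `quality=Perfect`, and item sets are kept only if their support reaches a threshold. Candidate pairs are built only from frequent single items, which is the pruning principle behind Apriori. The data is the ten-employee table from section 3.0; the item naming scheme and the 0.5 support threshold are assumptions for illustration.

```python
from itertools import combinations

COLUMNS = ("task", "skills", "initiative", "quality", "result")
ROWS = [
    ("Complex", "Serious", "Well", "Perfect", "As required"),
    ("Complex", "Serious", "Well", "Perfect", "As required"),
    ("Complex", "Serious", "Well", "Perfect", "As required"),
    ("Complex", "Common", "Well", "Bad", "Below average"),
    ("Complex", "Serious", "Average", "Perfect", "As required"),
    ("Simple", "Common", "Average", "Bad", "Below average"),
    ("Simple", "Serious", "Average", "Bad", "Below average"),
    ("Simple", "Serious", "Well", "Perfect", "As required"),
    ("Simple", "Common", "Average", "Bad", "Below average"),
    ("Simple", "Common", "Well", "Perfect", "As required"),
]
# One transaction per employee, made of "column=value" items.
transactions = [frozenset(f"{c}={v}" for c, v in zip(COLUMNS, row)) for row in ROWS]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Level-wise search: larger candidates are built only from frequent smaller ones."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    for size in range(1, max_size + 1):
        kept = {c: support(transactions, c) for c in current}
        kept = {c: s for c, s in kept.items() if s >= min_support}
        frequent.update(kept)
        survivors = sorted({item for c in kept for item in c})
        current = [frozenset(c) for c in combinations(survivors, size + 1)]
    return frequent


def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by the support of its antecedent."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


frequent = frequent_itemsets(transactions, min_support=0.5)
rule_confidence = confidence(
    transactions, frozenset({"quality=Perfect"}), frozenset({"result=As required"})
)
print(f"confidence(quality=Perfect -> result=As required) = {rule_confidence:.2f}")
```

Support measures how often a combination occurs at all, while confidence measures how reliably the consequent follows the antecedent.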
8.0 The process of data mining
The data mining process determines the employees’ performance in each department of the
organization using a classification model. Data preparation is the first step. It involves
obtaining data on the performance of employees from the different human resources departments.
Data on the employees’ performance, their capacity and the influence of other factors is stored
in database systems and reports. The different tables are combined into a single table for
analysis, and any errors noted are removed. Next, the data is selected and transformed
according to how employees perform their tasks in the organization. Some predictive variables
are chosen directly, while others are retrieved from the database systems. The table below
gives the overall variables related to the employees.
Variable                    Values
Independence                Need for the improvements, As per requirements
Reliability                 Need for the improvements, As per requirements
Skills of the job           Common, serious
Productivity                Need for the improvements, As per requirements
Interpersonal associations  Need for the improvements, As per requirements
Cooperation                 Need for the improvements, As per requirements
Quality                     Good, bad
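The preparation step described in this section (combining several tables into one and dealing with gaps) can be sketched as follows. The employee IDs and table contents are hypothetical, the field names are taken from the variables table, and filling a missing value with the most common value of that field is an assumption; the report does not say which imputation rule it uses.

```python
from collections import Counter


def merge_tables(*tables):
    """Combine several {employee_id: {field: value}} tables into one record per employee."""
    merged = {}
    for table in tables:
        for employee_id, fields in table.items():
            merged.setdefault(employee_id, {}).update(fields)
    return merged


def fill_missing(records, fields):
    """Replace absent or None values with the most common value of that field."""
    for field in fields:
        present = [r[field] for r in records.values() if r.get(field) is not None]
        most_common = Counter(present).most_common(1)[0][0]
        for record in records.values():
            if record.get(field) is None:
                record[field] = most_common
    return records


# Hypothetical source tables from two HR systems.
appraisal = {
    "E1": {"Quality": "Good", "Cooperation": "As per requirements"},
    "E2": {"Quality": "Bad", "Cooperation": None},
    "E3": {"Quality": "Good", "Cooperation": "As per requirements"},
    "E4": {"Quality": "Good", "Cooperation": "Need for the improvements"},
}
job_data = {
    "E1": {"Skills of the job": "Serious"},
    "E2": {"Skills of the job": "Common"},
    "E4": {"Skills of the job": "Serious"},
}

records = fill_missing(
    merge_tables(appraisal, job_data),
    ["Quality", "Cooperation", "Skills of the job"],
)
for employee_id, record in sorted(records.items()):
    print(employee_id, record)
```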

Cooperation is how employees interact with coworkers and other members of the organization. The
way employees accept and respond to new changes in the organization determines their level of
cooperation. Data mining is used to determine the degree to which employees cooperate in the
allocation of tasks and how they carry them out (Dutt, Aghabozrgi, Ismail and Mahroeian 2015,
p. 112). If employees respond positively to new changes and assigned duties, high-quality job
outcomes are likely. In contrast, if employees are unwilling to do the tasks allocated,
cooperation is low and performance is poor. The same applies to whether employees agree with
the changes made in the organization (Patil and Sherekar 2013, p. 256): if they are not
motivated by the changes, they will perform poorly. Attendance is the punctuality of employees
in turning up at the workplace. Employees who do their tasks on time, accept attendance
recording and observe the prescribed working periods perform better than employees who are
unhappy with the time scheduled for events in the organization. Initiative, or creativity,
determines how employees handle their allocated tasks and improve their performance. Creative
employees perform better than non-creative ones because they come up with new and better ways
of doing their allocated tasks. Adherence to implemented policies also determines performance:
employees who follow the rules and procedures of an organization are able to perform better in
their tasks. The overall performance of employees can be compared across positions, based on
the outcomes of the tasks done by the individual employees in those positions.
8.1 Modeling and experiments in CIS4015-N: Data Mining and CIS4035-N: Machine
Learning
The classification process is carried out after the pre-processing and preparation of the data.
Three classification models are used: SVM, Naïve Bayes and the DT classifier. These techniques
are applied when the dataset is used to determine the performance of the employees. In the
context of CIS4015-N: Data Mining and CIS4035-N: Machine Learning, the first step is learning,
or model building, in which the predefined classes are described by analyzing a set of training
data (Al-Radaideh and Al Nagi 2012, p. 78). Each training record is taken to belong to one of
the predefined classes of the classification model. The second step estimates the accuracy of
the model; in other words, it validates the model by testing it on a specific test dataset
(Baradwaj and Pal 2012, p. 3417). If the accuracy is acceptable, the model is then used to
classify new, unseen data on how employees perform in the organization. The third step applies
the model, whether DT, Naïve Bayes or another classifier, in the decision-making process. The
process is presented in the figure below.
Source: https://images.slideplayer.com/16/5036924/slides/slide_10.jpg
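The three steps described in this section (learn a model on training data, estimate its accuracy on held-out data, then apply it to new records) can be sketched with a deliberately simple classifier. The example uses a one-rule classifier (OneR) as a stand-in for the SVM, Naïve Bayes and DT models named in the report. The split of the section 3.0 table into seven training rows and three test rows is an assumption made for illustration.

```python
from collections import Counter, defaultdict

# Section 3.0 table: (task, skills, initiative, quality, result).
ROWS = [
    ("Complex", "Serious", "Well", "Perfect", "As required"),
    ("Complex", "Serious", "Well", "Perfect", "As required"),
    ("Complex", "Serious", "Well", "Perfect", "As required"),
    ("Complex", "Common", "Well", "Bad", "Below average"),
    ("Complex", "Serious", "Average", "Perfect", "As required"),
    ("Simple", "Common", "Average", "Bad", "Below average"),
    ("Simple", "Serious", "Average", "Bad", "Below average"),
    ("Simple", "Serious", "Well", "Perfect", "As required"),
    ("Simple", "Common", "Average", "Bad", "Below average"),
    ("Simple", "Common", "Well", "Perfect", "As required"),
]
TRAIN, TEST = ROWS[:7], ROWS[7:]  # illustrative hold-out split (assumption)


def fit_one_rule(rows):
    """Step 1 (learning): keep the single attribute whose value -> majority-class rule
    classifies the most training rows correctly."""
    best = None  # (correct count, column index, rule)
    for col in range(len(rows[0]) - 1):
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[col]][row[-1]] += 1
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        correct = sum(rule[row[col]] == row[-1] for row in rows)
        if best is None or correct > best[0]:
            best = (correct, col, rule)
    default = Counter(row[-1] for row in rows).most_common(1)[0][0]
    return best[1], best[2], default


def predict(model, attributes):
    """Step 3 (application): look up the rule, falling back to the majority class."""
    col, rule, default = model
    return rule.get(attributes[col], default)


def accuracy(model, rows):
    """Step 2 (validation): share of rows whose result the model predicts correctly."""
    return sum(predict(model, row) == row[-1] for row in rows) / len(rows)


model = fit_one_rule(TRAIN)
print("test accuracy:", accuracy(model, TEST))
print("new employee:", predict(model, ("Complex", "Common", "Average", "Perfect")))
```

Measuring accuracy on rows the model never saw during learning is what makes the second step a validation rather than a restatement of the training fit.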
The first step in applying data mining to the prediction of employees’ performance is to
understand the problem to be analyzed in the report and the objectives to be achieved. Data
miners should have sufficient knowledge to understand the nature of the problem, which makes
the data mining more efficient and effective. It is through analysis of the activities and
outcomes of the classification model that human resource management is found to be complicated
(Jantawan and Tsai 2013, p. 7123). Accordingly, a few quantitative methods should be applied in
determining the employees’ activities.
9.0 Conclusion
Organizations use data mining techniques to predict the performance of their employees by
comparing current performance with previous performance. Predicting employees’ performance
helps with merit-based salary adjustments. It also helps managers and supervisors understand
the performance of individual employees for promotion purposes. Moreover, data mining helps
identify employees who need more attention. The Decision Tree method is used to predict how
employees performed in the recent past for comparison with current performance. It is a
powerful tool that enables the organization to allocate resources to employees according to
their capacity to perform: sensitive resources will be allocated to serious, high-performing
employees. Managers will be able to record the performance trends of employees in the
organization. Furthermore, Decision Tree techniques will help the management team implement new
policies, remove some strategies or adjust other policies to improve employees’ performance.
Errors are also readily identified when data mining techniques are used to predict employees’
performance. Employees who have complex tasks and perform well are considered highly productive
staff and are held up as role models for other employees. In contrast, employees who have
simple tasks but perform poorly are considered poor performers and are advised to improve
accordingly.
10.0 References
Al-Radaideh, Q.A. and Al Nagi, E., 2012. Using data mining techniques to build a classification
model for predicting employees performance. International Journal of Advanced Computer
Science and Applications, 3(2), pp.78-87.
Baradwaj, B.K. and Pal, S., 2011. Mining educational data to analyze students'
performance. International Journal of Advanced Computer Science and Applications, 2(6),
pp.63-69.
Chien, C.F. and Chen, L.F., 2018. Data mining to improve personnel selection and enhance
human capital: A case study in high-technology industry. Expert Systems with
Applications, 34(1), pp.280-290.
COE, J., 2012. Performance comparison of Naïve Bayes and J48 classification
algorithms. International Journal of Applied Engineering Research, 7(11), pp.201-212.
Cortez, P., Cerdeira, A., Almeida, F., Matos, T. and Reis, J., 2009. Modeling wine preferences
by data mining from physicochemical properties. Decision Support Systems, 47(4), pp.547-553.
Dutt, A., Aghabozrgi, S., Ismail, M.A.B. and Mahroeian, H., 2015. Clustering algorithms applied
in educational data mining. International Journal of Information and Electronics
Engineering, 5(2), pp.112.
Gera, M. and Goel, S., 2015. Data mining-techniques, methods and algorithms: A review on
tools and their validity. International Journal of Computer Applications, 113(4), pp.18-23.
Huang, M.J., Tsou, Y.L. and Lee, S.C., 2016. Integrating fuzzy data mining and fuzzy artificial
neural networks for discovering implicit knowledge. Knowledge-Based Systems, 19(6), pp.396-
403.
Jantan, H., Hamdan, A.R. and Othman, Z.A., 2009. Knowledge discovery techniques for talent
forecasting in human resource application. World Academy of Science, Engineering and
Technology, 50, pp.775-783.
Jantan, H., Hamdan, A.R. and Othman, Z.A., 2011. Data mining classification techniques for
human talent forecasting. Knowledge-Oriented Applications in Data Mining, p.1.
Jantawan, B. and Tsai, C.F., 2013. The application of data mining to build classification model
for predicting graduate employment. Journal of Educational Data Mining, 1(1), pp. 1312-7123.
Patil, T.R. and Sherekar, S.S., 2013. Performance analysis of Naive Bayes and J48
classification algorithm for data classification. International Journal of Computer Science and
Applications, 6(2), pp.256-261.
Shazmeen, S.F., Baig, M.M.A. and Pawar, M.R., 2013. Performance evaluation of different data
mining classification algorithm and predictive analysis. Journal of Computer Engineering, 10(6),
pp.01-06.
Strohmeier, S. and Piazza, F., 2013. Domain driven data mining in human resource management:
A review of current research. Expert Systems with Applications, 40(7), pp.2410-2420.
Sung, T.K., Chang, N. and Lee, G., 2019. Dynamics of modeling in data mining: interpretive
approach to bankruptcy prediction. Journal of Management Information Systems, 16(1), pp.63-
85.

Thakar, P., 2015. Performance analysis and prediction in educational data mining: Journal of
data mining and performance, 20(10), pp.1509-5176.
Valle, M.A., Varas, S. and Ruz, G.A., 2012. Job performance prediction in a call center using a
naive Bayes classifier. Journal of Expert Systems with Applications, 39(11), pp.9939-9945.
Wu, W.W., 2010. Beyond business failure prediction. Journal of Expert systems with
applications, 37(3), pp.2371-2376.