Case Study of UPNM Students Performance Classification Algorithms

Added on 2023-03-23


Article in International Journal of Engineering and Technology · December 2018
DOI: 10.14419/ijet.v7i4.31.23382
Nur Diyana Kamarudin
National Defence University of Malaysia
Zuraini Zainol
National Defence University of Malaysia
All content following this page was uploaded by Zuraini Zainol on 10 December 2018.

Copyright © 2018 Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original work is properly cited.
International Journal of Engineering & Technology, 7 (4.31) (2018) 285-289
International Journal of Engineering & Technology
Website: www.sciencepubco.com/index.php/IJET
Research paper
Case Study of UPNM Students Performance Classification
Algorithms
Syarifah B. Rahayu¹*, Nur D. Kamarudin², Zuraini Zainol³
¹Cyber Security Centre, National Defence University of Malaysia, Sungai Besi Camp 57000 Kuala Lumpur, Malaysia
²Computer Science Department, Faculty of Science and Defence Technology, National Defence University of Malaysia, Sungai Besi Camp 57000 Kuala Lumpur, Malaysia
*Corresponding author E-mail: syarifahbahiyah@upnm.edu.my
Abstract
Most students have difficulty keeping track of their learning performance. Lecturers with high teaching hours and a heavy
administrative workload may have difficulty identifying weak and low-performing students. In this study, three classification
techniques are applied to an educational dataset to predict students' performance based on coursework assessments. The prediction
results may help lecturers and students improve their teaching and learning process. The objective of the study is to predict
students' performance based on coursework assessments using classification algorithms. The selected classification algorithms are
J48 Decision Tree, Naïve Bayes and kNN. WEKA is used as the experimental tool. The selected algorithms are applied to student data
from the Data Mining subject. The findings show that Naïve Bayes outperforms the other classification algorithms, with a prediction
rate above 80%. Thus, students' performance in the Data Mining subject can be improved. In conclusion, classification algorithms
can predict students' performance in a particular subject based on coursework assessments.
Keywords: Prediction; Comparative Analysis; Educational Data Mining
1. Introduction
Educational Data Mining (EDM) research uses data mining tools
to process large quantities of data and discover meaningful patterns,
in order to predict students' performance and enhance teaching and
learning outcomes. Such research can also serve as a platform to
alert students to the risk of failure and to recommend improvements
to their learning process.
One of the criteria for a high-quality university is an excellent
record of academic achievement [1]. Therefore, student performance
is crucial in higher learning institutions. Student performance is
often measured by subject coursework assessments and the final exam.
The proposed methodology analyzes students' performance in a
particular subject. The findings are used to predict their
performance before they take the final exam. This will help
lecturers or educators identify students who need support to perform
well in the final exam. In addition, students can improve their
learning process in order to pass the subject [2]. The objective of
this study is to predict students' performance based on the Malaysian
Grading System. Performance is predicted using three classification
algorithms, namely J48 Decision Tree, Naïve Bayes and kNN.
The rest of this paper is organized as follows. Section 2 presents
the background and work related to this study. Section 3 describes
the framework of our proposed research. Section 4 discusses the
experiment and results. Finally, Section 5 concludes the paper and
outlines future work.
2. Background and Related Works
In this section, some related topics on data mining, knowledge
discovery in databases, classification algorithms and reviews on
related work are discussed.
2.1. Data Mining and Knowledge Discovery in Databases
Data Mining (DM) and Knowledge Discovery in Databases (KDD)
are two terms that are often used interchangeably. KDD can be
defined as the process of finding useful information and patterns in
data [3]. DM is the fourth step of the KDD process. Technically, the
KDD process consists of five main steps: selection, pre-processing,
transformation, data mining, and interpretation or evaluation
(see Fig. 1).
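The five steps can be sketched as a small pipeline. The following is only an illustrative walk-through in Python, not the tooling used in this study (which relies on WEKA). The records, field names, maximum marks and the one-threshold "mining" rule are all hypothetical choices made for the example.

```python
# Toy walk-through of the five KDD steps.
# All records, field names and mark ranges below are hypothetical.

raw_records = [
    {"id": 1, "quiz": 8, "test": 15, "hobby": "chess", "result": "pass"},
    {"id": 2, "quiz": None, "test": 9, "hobby": "golf", "result": "fail"},
    {"id": 3, "quiz": 4, "test": 7, "hobby": "none", "result": "fail"},
    {"id": 4, "quiz": 9, "test": 18, "hobby": "chess", "result": "pass"},
]

def select(records):
    # Step 1, selection: keep only the fields relevant to the question.
    return [{k: r[k] for k in ("quiz", "test", "result")} for r in records]

def preprocess(records):
    # Step 2, pre-processing: drop records with missing values.
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    # Step 3, transformation: combine quiz (out of 10) and test (out of 20)
    # into a single rounded percentage score.
    return [
        {"score": round(100 * (r["quiz"] + r["test"]) / 30), "result": r["result"]}
        for r in records
    ]

def count_correct(records, threshold):
    # Number of records where "score >= threshold" agrees with a pass result.
    return sum((r["score"] >= threshold) == (r["result"] == "pass") for r in records)

def mine(records):
    # Step 4, data mining: pick the score threshold that separates
    # pass from fail best on the available records.
    candidates = sorted({r["score"] for r in records})
    return max(candidates, key=lambda t: count_correct(records, t))

def evaluate(records, threshold):
    # Step 5, interpretation or evaluation: fraction of records explained.
    return count_correct(records, threshold) / len(records)

transformed = transform(preprocess(select(raw_records)))
threshold = mine(transformed)
accuracy = evaluate(transformed, threshold)
print(threshold, accuracy)
```

Each function maps to one step, so a step can be swapped (for example, a real classifier in place of the threshold rule) without touching the others.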
According to [3], DM is often applied to extract hidden information
and useful patterns, using algorithms, from the massive amounts of
data derived by the KDD process. Such valuable information and
patterns may assist top-level managers in decision making. DM has
been applied in various areas such as market-based analysis,
healthcare [5], smart homes [6], business, text documents [7-10],
environmental studies [11, 12], flood detection [13], crime
investigation, fraud detection, geology, food microbiology and
astronomy. The researchers in [14] summarized some common data
mining tasks and techniques (see Table 1). These tasks and
techniques can be applied individually or combined to perform more
sophisticated processes.

Fig. 1: Knowledge Discovery in Databases (KDD) process adopted from [4]
2.2. Classification Algorithms
Classification is a supervised learning task in which the classes
are determined before the data are mined [3]. Technically,
classification assigns data to several predefined classes. It is
often applied to predict or describe datasets with nominal
categories. Each classification technique (see Table 1) applies a
learning algorithm to identify the model that best fits the
relationship between the set of attributes and the class label
(predefined class) of the input data. The model produced by a
learning algorithm should fit the input data and correctly predict
the class labels of the records [15].
Table 1: Data Mining Tasks and Techniques Adopted From [14]
DM Tasks | DM Techniques
Classification | Decision Tree Induction, Bayesian Classification, Fuzzy Logic, Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), Rough Set Approach, Genetic Algorithm (GA), etc.
Clustering | Partitioning Methods, Hierarchical Methods, Density-based Methods, Grid-based Methods, etc.
Association Rules | Frequent Itemset Mining Methods (e.g., Apriori, FP-Growth)
Examples of classification applications include detecting spam
email messages based on the message header and content,
categorizing cells as malignant or benign based on MRI results,
identifying credit risks based on bank loans, and predicting
students' performance.
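To make the idea of assigning a predefined class label from a set of attributes concrete, the following is a minimal k-nearest-neighbour sketch in plain Python. It is not the WEKA implementation used in this study. The attribute layout (quiz, tutorial, test marks), the training records and the two class labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority class label among the k training records
    closest to the query (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (quiz, tutorial, test) marks -> class label.
train = [
    ((9.0, 8.5, 17.0), "pass"),
    ((8.0, 9.0, 15.0), "pass"),
    ((7.5, 7.0, 16.0), "pass"),
    ((3.0, 4.0, 6.0), "fail"),
    ((4.5, 3.5, 8.0), "fail"),
    ((2.0, 5.0, 5.0), "fail"),
]

prediction_strong = knn_predict(train, (8.5, 8.0, 16.0))
prediction_weak = knn_predict(train, (3.5, 4.0, 7.0))
print(prediction_strong, prediction_weak)
```

Here the "model" is simply the stored training set; decision trees and Naïve Bayes instead build an explicit structure or probability table from the same kind of attribute and label pairs.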
In [16-19], the Decision Tree (J48), Bayesian and k-Nearest
Neighbor (kNN) classifiers were implemented to evaluate students'
performance based on several observational attributes such as
accumulated exam grades, percentages or classes (e.g., distinction,
fail). Based on the comparative analysis of classifiers in [16], the
Bayesian classifier outperformed the decision tree and kNN
classifiers in predicting students' performance, as measured by the
average True Positive (TP) rate. However, an analysis of the TP rate
for each class (Distinction, First, Second, Third and Fail) showed
that the prediction rates are not uniform among classes. The gap in
prediction rate between classes reaches almost 90% in some cases.
This might be due to insufficient data for certain classes,
especially the Distinction and Fail classes.
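The difference between an average TP rate and the per-class TP rates can be illustrated with a short Python sketch. The labels below are invented for the example and are not results from [16] or from this study. They only show how a class with very few records can have a TP rate far below the others.

```python
from collections import Counter

def tp_rate_per_class(actual, predicted):
    """TP rate of a class = correctly predicted records of that class
    divided by all records that actually belong to it."""
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}

# Hypothetical, imbalanced data: many "Second", few "First", one "Fail".
actual = ["Second"] * 6 + ["First"] * 3 + ["Fail"]
predicted = ["Second"] * 6 + ["First", "First", "Second"] + ["Second"]

rates = tp_rate_per_class(actual, predicted)
average_tp_rate = sum(rates.values()) / len(rates)
overall_accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

for cls, rate in rates.items():
    print(f"{cls}: {rate:.2f}")
print(f"average TP rate: {average_tp_rate:.2f}, accuracy: {overall_accuracy:.2f}")
```

In this toy case the overall accuracy looks acceptable while the single "Fail" record is never recognised, which is why per-class rates are worth reporting alongside the average.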
To discover the optimal decision tree classification model, the
research in [20] compared different algorithms, comprising J48, ID3,
C4.5, REPTree, Random Tree and Random Forest. Of the six decision
tree algorithms, the highest percentage was achieved by the model
based on J48. Based on 161 questionnaires, two researchers from the
University of Basrah analyzed and assisted academic achievers in
higher education using the Bayesian classification method [21]. For
attribute selection, questions with high correlation averages were
adopted to enhance classification accuracy.
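One way to picture correlation-based attribute selection is the sketch below. It assumes a Pearson correlation between each questionnaire item and a numeric outcome, with a fixed cut-off. The exact procedure in [21] may differ, and the items, answers, outcome values and the 0.5 cut-off are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical questionnaire answers (one list per question, one value
# per respondent) and a hypothetical numeric outcome per respondent.
answers = {
    "q1": [1, 2, 3, 4, 5],
    "q2": [3, 3, 2, 3, 3],
    "q3": [5, 4, 3, 2, 1],
}
outcome = [50, 60, 70, 80, 90]

CUTOFF = 0.5  # assumed threshold, chosen only for illustration

strength = {q: abs(pearson(values, outcome)) for q, values in answers.items()}
selected = [q for q, s in strength.items() if s >= CUTOFF]
print(selected)
```

Questions whose answers barely move with the outcome (here `q2`) are dropped before the classifier is trained.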
Recent work has demonstrated the efficiency of Semi-Supervised
Learning (SSL) methods for predicting the performance of high school
students using their final examination assessment percentage [22].
In that work, various SSL algorithms such as Self-training,
Co-training, Democratic Co-learning, Tri-training, De-Tri training
and RASCO were implemented in the KEEL software tool. In addition,
the Friedman Aligned Ranks nonparametric test was used to measure
the performance of these algorithms. In the second phase of
experiments, the performance of the SSL classifiers was compared
with that of a supervised method, Naïve Bayes. The authors concluded
that the SSL algorithms perform comparatively better than the
supervised Naïve Bayes algorithm on both measures, accuracy and
Friedman Aligned Rank.
A comprehensive survey by Indian researchers discusses current
approaches and potential areas in EDM [23]. The paper reports, in
tabular form, details of research done in the area of education,
describing the methodology and findings of each study, and
identifies potential research areas for future work. Similar
research in [24] proposes new potential domains for EDM. According
to that paper, EDM is not limited to predicting students'
performance but can also be used in other domains of the education
sector (e.g., optimization of resources or human resource purposes).
A comparison of the correlation between pre- and post-enrollment
factors and employability, carried out using data mining tools,
found that many of today's graduates lack interpersonal
communication skills, creative and critical thinking, problem
solving, analytical skills and teamwork [25]. It concluded that
cognitive factors such as behaviors, skills and attitudes play a
significant role in predicting students' marketability after
graduation. Another work, by the Research Group for Work,
Organizational, and Personnel Psychology at the Department of
Psychology, KU Leuven, Leuven, Belgium, stated that employability
correlates strongly with competences and dispositions [26].
Motivated by this previous research, this study attempts to
evaluate students' performance by measuring their coursework
assessment percentages (Quizzes, Tutorials and Test) against
predetermined classes endorsed by the university and the Malaysian
Grading System, in order to predict their performance in the final
exam. To the best of our knowledge, this is the only research paper
that discusses student performance prediction in Malaysia based on
the Malaysian Grading System instead of students' CGPA. We propose
this research as a preliminary assessment tool in which we narrow
the scope to the early recognition of students who need help in a
certain subject, rather than a student's overall performance as
reflected in his or her CGPA. In addition, by analyzing the
distributed rank or weightage of each assessment using a decision
tree, this research will offer guidance to lecturers on improving
the teaching plan based on the learning outcomes of certain
assessments, and will help identify weak students so that their
learning process can be improved prior to the final exam. A future
contribution will be an automatic application on a platform that is
able to read, analyze and predict students' progress based on
certain assessments in difficult or challenging university
subjects, for intelligent tutoring or lecturing applications.
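The mapping from coursework marks to a predetermined class can be sketched as follows. This is only an illustration of the idea. The maximum marks per component, the class names and the percentage boundaries are hypothetical placeholders, not the university's scheme or the Malaysian Grading System.

```python
# Hypothetical maximum marks per coursework component (not from the paper).
MAX_MARKS = {"quizzes": 10, "tutorials": 10, "test": 20}

# Hypothetical class boundaries on the coursework percentage, highest first.
# Real boundaries would come from the institution's grading scheme.
CLASS_BOUNDARIES = [
    (80, "Distinction"),
    (65, "First"),
    (50, "Second"),
    (40, "Third"),
    (0, "Fail"),
]

def coursework_percentage(marks):
    """Total marks earned across all components, as a percentage."""
    earned = sum(marks[name] for name in MAX_MARKS)
    return 100 * earned / sum(MAX_MARKS.values())

def to_class(percentage):
    """Return the first class whose lower bound the percentage reaches."""
    for lower_bound, label in CLASS_BOUNDARIES:
        if percentage >= lower_bound:
            return label
    raise ValueError("percentage must be non-negative")

student = {"quizzes": 7, "tutorials": 6, "test": 9}
percentage = coursework_percentage(student)
label = to_class(percentage)
needs_support = label in {"Third", "Fail"}
print(percentage, label, needs_support)
```

A label produced this way before the final exam is the kind of class a classifier would be trained to predict, and a flag such as `needs_support` is one possible way to surface students for early attention.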
3. Research Framework
Figure 2 illustrates the proposed framework for predicting
students' performance in the Data Mining subject. The first stage is
data collection. The data about students related to a particular subject
