Data Mining, WEKA Software Analysis, Warehousing, and Knowledge

Added on  2023/03/21

Homework Assignment
AI Summary
This assignment delves into the crucial aspects of data warehousing, data mining, and knowledge management within an organizational context. It begins by elucidating the significance of a knowledge management system, highlighting its benefits such as accelerated information access, enhanced decision-making, and promotion of innovation. The discussion extends to various technologies supporting knowledge management, including workflow systems, groupware, enterprise portals, and eLearning software. Furthermore, the assignment differentiates between databases and data warehouses, emphasizing the benefits of data warehouses for efficient data storage, analysis, and their pivotal role in business intelligence. A practical component involves utilizing WEKA software for data analysis, specifically classifying votes using the J48 classifier, and interpreting the resulting classification report, including metrics like correctly classified instances, confusion matrix, and their implications for predictive accuracy. The assignment concludes by demonstrating the successful application of WEKA in manipulating and analyzing data to generate transparent and insightful results.
Name: 1
BUSINESS INTELLIGENCE
by
Course Title
Tutor:
University/College
Department
Date
Question 1 (4 marks)
Knowledge Management System
A knowledge management system is an information technology platform that facilitates the
storage and retrieval of knowledge, enhances collaboration, captures learning, locates sources
of information, and makes use of experience by improving knowledge management processes
(Santoro et al., 2018).
Knowledge Management Importance in an Organisation
A knowledge management system benefits an organization in several ways: it accelerates
access to information and knowledge, promotes invention, innovation, and culture change,
enhances decision-making, improves customer satisfaction, and improves the effectiveness and
efficiency of the organization's operations (Santoro et al., 2018).
Accelerates Access to Information.
Knowledge management makes it simple for an organization to find the information it
needs, or the people who hold that information (Hislop, Bosua and Helms, 2018). The knowledge
management system improves the productivity and efficiency of the business and helps the
organization work better, which increases the likelihood of business growth.
Enhances Decision-Making Processes.
The management and senior employees of the organization can enhance their
decision-making by consulting the organization-wide knowledge management system whenever they
require information. Business collaboration tools improve access to the experiences and
perspectives of various people during decision-making, which leads directly to a wider range
of options to choose from (Todorović, 2015, pp.772-783).
Promotes culture change and inventions.
A knowledge management system encourages and facilitates the sharing of ideas, access to
up-to-date information, and teamwork in the organization (Honarpour, Jusoh, and Md Nor, 2018,
p. 801). The system also stimulates invention and innovation, including the changes in culture
required to transform the organization and meet the continually changing needs of the business.
Improves Satisfaction of the Customer.
Collaboration and knowledge sharing within and outside the organization help to improve
the quality of service that customers receive. The business is in a position to provide expert
answers within a short time, which in turn improves the product.
Improves the efficiency of the organization.
Employees and knowledge workers can work effectively because of faster access to
information and resources in the organization. According to Omotayo (2015, pp.1-23), a study
in which many executives of various organizations were interviewed shows that social
technologies for collaboration enhance business processes and general performance.
Technologies used to support knowledge management
Various technologies can be used to support knowledge management processes, as
discussed below.
Workflow system.
A workflow system represents the processes associated with the creation, use, and
management of knowledge in the organization, such as the production and use of documents and
forms (Ladd, 2016).
Groupware.
This is software used in the knowledge management system to enable sharing of, and
collaboration on, information. It provides tools for document sharing, corporate email, and
discussions, among other features related to information sharing.
Enterprise Portals.
This is software that brings together information from across the whole organization.
The knowledge management system uses it to provide information to various groups, including
project teams (Kudryavtsev and Gavrilova, 2016).
eLearning Software.
This is software used by knowledge management systems to help organizations create and
deliver customized education and training, such as lesson plans, process monitoring, and
classes, which are mostly online (Liebowitz and Frank, 2016).
Question 2 (6 marks)
Distinctions between a Database and a Data Warehouse
The Data warehouse.
A data warehouse is a consolidated repository for all the data collected by an
organization's various operational systems, whether logical or physical. It emphasizes the
collection of data from different sources for analysis and access (Padmanabhan and Patki,
2016). The data warehouse is housed on the organization's mainframe computer or, alternatively,
in the cloud, in the form of a relational database. It is where information from diverse Online
Transaction Processing (OLTP) systems is captured for business intelligence activities, such as
supporting customer satisfaction and decision-making.
Database.
A database is a collection of information that is organized for efficient access,
analysis, and updating. The data is stored in tables made up of rows and columns, and it is
indexed for fast access to the relevant records. Data is deleted, updated, and expanded as new
information is added. Database workloads create and update the data, and they also run queries
against the information the database holds (Coronel and Morris, 2016).
Benefits of the Data Warehouse for Data Storage and Analysis
A data warehouse is useful for data storage because warehouses are designed both to
store information and to support its analysis, with fast retrieval as a goal. The design of a
data warehouse is geared toward storing large amounts of information that can easily be
analyzed and repeatedly queried. Its analytical platform is designed to support the
modification and generation of information (Wullink, Moura, Müller and Hesselman, 2016,
pp. 913-918). Moreover, a data warehouse can remove a significant processing burden from the
operational systems and share the system load efficiently across the organization's
technological infrastructure.
Role of data warehouse in BI
A data warehouse promotes business intelligence through the knowledge gained from
enhanced access to information. Senior staff of the organization, such as managers, can make
better decisions because extensive knowledge is available to them. Crucial decisions with a
significant impact on the business can be based on valid, evidence-backed facts and the
organization's real data. In addition, the executives responsible for decision-making are in a
better position to decide because they can query the real data and obtain information according
to their preferences. Business intelligence and the data warehouse can be applied directly to
business processes such as financial management, inventory management, sales, and marketing
(Fekete, 2016, pp. 50-55).
Question 3
WEKA software
Understanding of WEKA software and data analysis
Weka consists of a collection of visualization tools and algorithms for data analysis and
predictive modeling, together with graphical user interfaces that make these functions easy to
access. The original version of Weka was not written in Java; it was a Tcl/Tk front end to
modeling algorithms, most of them implemented by third parties in other programming languages,
together with data preprocessing utilities written in C and a makefile-based system for running
machine learning experiments (Hall et al., 2009, pp.10-18).
Weka is widely used for data analysis partly because of its availability: it is free,
open-source software released under the GNU General Public License. It is also portable and
runs on almost any modern computer because it is implemented in Java. Weka contains a
comprehensive collection of techniques for data analysis and modeling (Agapito, Guzzi, and
Cannataro, 2018, p.17). It is easy to use thanks to its well-designed graphical user interface.
Weka supports many data mining tasks, such as data preprocessing, classification,
clustering, feature selection, and visualization. Weka's techniques assume that the data is
available as a single flat file or relation, and its procedures are designed to work on data in
that form. Weka can also process the result returned by a database query, accessing SQL
databases through Java Database Connectivity (JDBC).
Weka also provides access to deep learning through a dedicated utility. A collection of
linked database tables can also be converted into a single table that can be processed using
Weka.
Weka's main user interface is the Explorer. Essentially the same functions can be reached
through the Knowledge Flow interface, which is component-based, and from the command line.
There is also the Experimenter, which supports the systematic comparison of the predictive
performance of Weka's machine learning algorithms on a collection of datasets (Varouqa and
Hammo, 2016, pp.359-371).
Analysis of Classification Results
The data loaded into Weka was analyzed with the J48 classifier, which generates a
decision tree. The classifier output is shown below.
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: vote
Instances: 435
Attributes: 17
handicapped-infants
water-project-cost-sharing
adoption-of-the-budget-resolution
physician-fee-freeze
el-salvador-aid
religious-groups-in-schools
anti-satellite-test-ban
aid-to-nicaraguan-contras
mx-missile
immigration
synfuels-corporation-cutback
education-spending
superfund-right-to-sue
crime
duty-free-exports
export-administration-act-south-africa
Class
Test mode: split 66.0% train, remainder test
=== Classifier model (full training set) ===
J48 pruned tree
------------------
physician-fee-freeze = n: democrat (253.41/3.75)
physician-fee-freeze = y
| synfuels-corporation-cutback = n: republican (145.71/4.0)
| synfuels-corporation-cutback = y
| | mx-missile = n
| | | adoption-of-the-budget-resolution = n: republican (22.61/3.32)
| | | adoption-of-the-budget-resolution = y
| | | | anti-satellite-test-ban = n: democrat (5.04/0.02)
| | | | anti-satellite-test-ban = y: republican (2.21)
| | mx-missile = y: democrat (6.03/1.03)
Number of Leaves: 6
Size of the tree : 11
Time taken to build model: 0.24 seconds
=== Evaluation on test split ===
Time taken to test model on test split: 0.06 seconds
=== Summary ===
Correctly Classified Instances 144 97.2973 %
Incorrectly Classified Instances 4 2.7027 %
Kappa statistic 0.9447
Mean absolute error 0.0608
Root mean squared error 0.1539
Relative absolute error 12.6846 %
Root relative squared error 31.0328 %
Total Number of Instances 148
=== Detailed Accuracy By Class ===
                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.965    0.016    0.988      0.965    0.976      0.945    0.990     0.986     democrat
                 0.984    0.035    0.953      0.984    0.968      0.945    0.990     0.988     republican
Weighted Avg.    0.973    0.024    0.973      0.973    0.973      0.945    0.990     0.987
=== Confusion Matrix ===
a b <-- classified as
83 3 | a = democrat
1 61 | b = republican
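The pruned J48 tree in the output above can be read as a set of nested rules. The following Python sketch transcribes those rules by hand. The function name predict_party and the dictionary-based record format are illustrative assumptions and are not part of Weka. The sketch also assumes every attribute it consults holds 'y' or 'n'; Weka itself copes with missing votes, which this sketch does not.

```python
def predict_party(votes):
    """Classify one voting record with the pruned J48 tree from the Weka output.

    `votes` maps attribute names to 'y' or 'n' (hypothetical record format;
    missing values are not handled here).
    """
    # Root split: physician-fee-freeze
    if votes["physician-fee-freeze"] == "n":
        return "democrat"
    # physician-fee-freeze = y
    if votes["synfuels-corporation-cutback"] == "n":
        return "republican"
    # synfuels-corporation-cutback = y
    if votes["mx-missile"] == "y":
        return "democrat"
    # mx-missile = n
    if votes["adoption-of-the-budget-resolution"] == "n":
        return "republican"
    # adoption-of-the-budget-resolution = y
    if votes["anti-satellite-test-ban"] == "n":
        return "democrat"
    return "republican"


# Example record that follows the first branch of the tree.
print(predict_party({"physician-fee-freeze": "n"}))
```

Each return statement corresponds to one of the six leaves reported by Weka, so the function can be checked against the tree line by line.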
The correctly and incorrectly classified instances give the number and percentage of test
instances that were classified correctly and incorrectly. The confusion matrix gives the raw
counts, where a and b are the class labels; rows are the actual classes and columns are the
predicted classes. The correctly classified instances are the diagonal entries,
aa + bb = 83 + 61 = 144, and the incorrectly classified instances are the off-diagonal entries,
ab + ba = 3 + 1 = 4. Of the 148 test instances, 144 represent 97.2973 % and 4 represent
2.7027 %, making a total of 100%.
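The arithmetic behind these figures can be reproduced from the confusion matrix alone. The short Python sketch below is an illustration only; the variable names are assumptions, and the counts are copied from the Weka output above.

```python
# Confusion matrix from the Weka output: rows are actual classes,
# columns are predicted classes.
confusion = [
    [83, 3],   # actual democrat:   83 classified as democrat, 3 as republican
    [1, 61],   # actual republican: 1 classified as democrat, 61 as republican
]

total = sum(sum(row) for row in confusion)                      # all test instances
correct = sum(confusion[i][i] for i in range(len(confusion)))   # diagonal entries
incorrect = total - correct                                     # off-diagonal entries

accuracy_pct = 100 * correct / total
error_pct = 100 * incorrect / total

print(f"Correctly classified:   {correct} ({accuracy_pct:.4f} %)")
print(f"Incorrectly classified: {incorrect} ({error_pct:.4f} %)")
```

Run as written, this reports 144 (97.2973 %) and 4 (2.7027 %), matching the Summary section of the Weka output.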
The percentage of correctly classified instances is known as the sample accuracy, or
simply accuracy. As an estimate of performance it has a disadvantage: it is insensitive to the
class distribution (Pal et al., 2016, pp. 191-202).
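Because accuracy alone does not reflect the class distribution, the Weka output also reports the Kappa statistic and per-class precision and recall. The Python sketch below recomputes them from the confusion matrix, assuming the standard definitions of Cohen's kappa, precision, and recall. The function names are illustrative and are not Weka's API.

```python
# Confusion matrix from the Weka output (rows = actual, columns = predicted).
confusion = [
    [83, 3],   # actual democrat
    [1, 61],   # actual republican
]
labels = ["democrat", "republican"]


def kappa(matrix):
    """Cohen's kappa: agreement beyond what the class totals alone would give."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


def precision_recall(matrix, k):
    """Precision and recall for the class at index k."""
    true_positive = matrix[k][k]
    predicted_as_k = sum(row[k] for row in matrix)   # column total
    actually_k = sum(matrix[k])                      # row total
    return true_positive / predicted_as_k, true_positive / actually_k


print(f"Kappa statistic: {kappa(confusion):.4f}")
for k, label in enumerate(labels):
    precision, recall = precision_recall(confusion, k)
    print(f"{label}: precision {precision:.3f}, recall {recall:.3f}")
```

The results agree with the Weka output after rounding: a kappa of 0.9447, precision 0.988 and recall 0.965 for democrat, and precision 0.953 and recall 0.984 for republican.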