MITS5509 - Intelligent Systems: RapidMiner Data Analysis Project

Added on 2022/08/24

AI Summary

This document presents a data analysis project using RapidMiner, focusing on bankruptcy prediction and sales data analysis. The project explores various machine learning techniques, including neural networks, Support Vector Machines (SVM), decision trees, and linear classifiers. The student analyzes financial ratios to predict bankruptcy, comparing the performance of different models. The project also includes a sales data analysis section. The methodology involves using RapidMiner to build and evaluate models, with the goal of identifying key attributes for predicting firm bankruptcy and gaining insights from sales data. The document includes detailed explanations of each method, model diagrams, and references to relevant literature.

Running head: DATA ANALYSIS USING RAPID MINER
Data Analysis using Rapid Miner
Name of the Student
Name of the University
Authors note

Part 1
Neural Networks
A neural network is a series of algorithms that attempts to recognise the underlying
relationships in a data set through a process that mimics the way the human brain works. In
this sense, the term neural network refers to a system of neurons, either organic or
artificial in nature. Neural networks can adapt to changing input, so the network generates
the best possible result without the output criteria needing to be redesigned. The concept of
neural networks, which has its roots in artificial intelligence, is swiftly gaining popularity
in the development of trading systems (Santoro et al. 2017). Neural networks assist in
processes such as credit risk modelling, constructing proprietary indicators and price
derivatives, time-series forecasting and securities classification. A neural network works in
a way similar to the neural network of the human brain: each neuron is a mathematical function
that collects and classifies information according to a specific architecture. The network
bears a strong resemblance to statistical methods such as curve fitting and regression
analysis.
A neural network consists of layers of interconnected nodes. Each node is a
perceptron and is similar to a multiple linear regression: the perceptron feeds the signal
produced by a multiple linear regression into an activation function, which may be nonlinear.
In a multi-layered perceptron (MLP), the perceptrons are arranged in interconnected layers.
The input layer collects the input patterns. The output layer holds the classifications or
output signals to which the input patterns may map. For instance, the patterns may comprise a
list of quantities for the technical indicators of a security. Neural networks are used in a
variety of applications in financial services, from forecasting and fraud detection to risk
assessment and marketing research (Dong, Loy and Tang 2016). Neural networks are also used for
prediction of
the market price of a stock. Here, a neural network is used to build a training model by
randomly selecting 40 data points from both category 0 and category 1, and the model is then
tested for its performance. Price data is evaluated with the neural network so that decisions
can be based on the data analysis.
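The perceptron and MLP structure described above can be illustrated with a minimal Python sketch. The weights, biases and the 0.5 decision threshold are hypothetical values chosen by hand for illustration; they are not trained and are not taken from the RapidMiner model in this project.

```python
import math


def perceptron(inputs, weights, bias, activation):
    """Weighted sum of the inputs (a linear-regression-style signal) passed to an activation."""
    signal = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(signal)


def sigmoid(z):
    """Nonlinear activation that squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(inputs, hidden_layer, output_node):
    """One hidden layer of perceptrons feeding a single output perceptron."""
    hidden = [perceptron(inputs, w, b, sigmoid) for w, b in hidden_layer]
    w_out, b_out = output_node
    return perceptron(hidden, w_out, b_out, sigmoid)


# Hypothetical (untrained) parameters: two hidden perceptrons and one output perceptron.
hidden_layer = [([0.5, -0.4], 0.1), ([-0.3, 0.8], 0.0)]
output_node = ([1.2, -0.7], 0.05)

# The input pattern is a list of two quantities; the output signal is mapped to a class.
score = mlp_forward([1.0, 2.0], hidden_layer, output_node)
label = 1 if score >= 0.5 else 0
```

Training would adjust the weights and biases so that the output signals match the known categories; the sketch only shows how a signal flows from the input layer to the output layer.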
Support Vector Machine
Support Vector Machines (SVMs) are learning models with associated learning
algorithms that analyse data. The SVM training algorithm builds a model that assigns values
to one of two categories, making it a non-probabilistic binary linear classifier. An SVM
model represents the values as points in space, mapped so that the values of the separate
categories are divided by a clear gap that is as wide as possible. SVM is a supervised machine
learning algorithm that can be used for classification or regression problems (Al-Yaseen,
Othman and Nazri 2017). It uses a technique known as the kernel trick to transform the data
and, based on these transformations, finds an optimal boundary between the possible outputs.
In effect, it performs some extremely complex data transformations and then works out how to
separate the data according to the defined labels or outputs.
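As an illustration of the kernel trick, the sketch below checks that a degree-2 polynomial kernel, evaluated in the original two-dimensional space, gives the same value as an ordinary dot product after an explicit mapping into three dimensions. The choice of kernel, the function names and the sample points are assumptions made for this example; the source does not state which kernel was used.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2, computed in the original 2-D space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit 3-D mapping whose ordinary dot product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Two hypothetical data points.
a, b = (1.0, 2.0), (3.0, -1.0)

kernel_value = poly_kernel(a, b)                      # no explicit transformation needed
mapped_value = dot(feature_map(a), feature_map(b))    # same value via the 3-D mapping
```

The kernel therefore lets the algorithm work as if the data had been transformed into a higher-dimensional space without ever computing the transformed coordinates.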
The data provided is first transformed by the kernel trick, after which the SVM
algorithm can compute an optimal hyperplane. The objective of SVM is to find a hyperplane in
N-dimensional space, where N is the number of features, that distinctly classifies the data
points. Many possible hyperplanes could be chosen to separate the two classes of data points.
SVM looks for the plane with the maximum margin, that is, the maximum distance to the nearest
data points of both classes. Maximising the margin distance provides some reinforcement, so
that future data points can be classified with more confidence. Hyperplanes are decision
boundaries that help in classifying the data points: data points falling on either side of
the hyperplane can be attributed to different classes (Suthaharan 2016). The dimension of the
hyperplane depends on the number of features. For instance, if the number of input features
is 2, the hyperplane is just a line. It becomes difficult to imagine once the number of
features exceeds 3. Support vectors are the data points that lie closest to the hyperplane and
influence its position and orientation. Using these support vectors, the margin of the
classifier can be maximised. These are the points that help in building the SVM.
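The geometry of margins and support vectors can be sketched as follows. This is not an SVM training algorithm: it only compares two hand-picked candidate lines on hypothetical two-dimensional data and reports, for each, the margin and the points lying closest to it. All data points and hyperplane coefficients are invented for illustration.

```python
import math


def signed_distance(point, w, b):
    """Signed distance from a point to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm


def margin_and_support_vectors(points, w, b, tol=1e-9):
    """Smallest distance from any point to the hyperplane, and the points that attain it."""
    distances = [abs(signed_distance(p, w, b)) for p in points]
    margin = min(distances)
    support = [p for p, d in zip(points, distances) if d - margin <= tol]
    return margin, support


def separates(class_0, class_1, w, b):
    """True when class_0 lies strictly on the negative side and class_1 on the positive side."""
    return (all(signed_distance(p, w, b) < 0 for p in class_0)
            and all(signed_distance(p, w, b) > 0 for p in class_1))


# Hypothetical 2-D data: with two features, each hyperplane is a line.
class_0 = [(1.0, 1.0), (0.0, 2.0), (1.0, 2.0)]
class_1 = [(3.0, 3.0), (4.0, 2.0), (3.0, 2.0)]
points = class_0 + class_1

# Two candidate separating lines: x1 + x2 - 4 = 0 and x1 - 2 = 0.
candidates = {"diagonal": ((1.0, 1.0), -4.0), "vertical": ((1.0, 0.0), -2.0)}

results = {name: margin_and_support_vectors(points, w, b)
           for name, (w, b) in candidates.items()}

# Of the two candidates, prefer the one with the larger margin.
best = max(results, key=lambda name: results[name][0])
```

Both lines separate the two classes, but the vertical line leaves a wider gap to its nearest points, which is the property an SVM optimises over all possible hyperplanes.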
Decision Tree
The tree has many analogies in machine learning, where it covers both
classification and regression. In decision analysis, a decision tree can be used to represent
decisions and decision making visually and explicitly. As the name suggests, it uses a
tree-like model of decisions. It is a commonly used tool in machine learning for deriving a
strategy to reach a particular goal. A decision tree is a decision support tool that uses a
tree-like graph or model of decisions and their possible consequences, including chance event
outcomes, resource costs and utility. It is one way of displaying an algorithm that contains
only conditional control statements. A decision tree is a flowchart-like structure in which
every internal node represents a test on an attribute, every branch represents the outcome of
the test and every leaf node represents a class label (Ke et al. 2017). The paths from the
root to the leaves represent the classification rules.
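The flowchart structure described above can be sketched in Python with a small hand-built tree. The attribute names (debt_ratio, current_ratio), the thresholds and the class labels are hypothetical; they are not the tree learned in this project.

```python
# Internal nodes test one attribute, branches are test outcomes, leaves hold class labels.
# Attribute names and thresholds are hypothetical, not taken from the project data.
tree = {
    "attribute": "debt_ratio",
    "threshold": 0.6,
    "low": {"label": "not bankrupt"},
    "high": {
        "attribute": "current_ratio",
        "threshold": 1.0,
        "low": {"label": "bankrupt"},
        "high": {"label": "not bankrupt"},
    },
}


def classify(node, record):
    """Follow one root-to-leaf path and return the class label stored in the leaf."""
    while "label" not in node:
        branch = "low" if record[node["attribute"]] <= node["threshold"] else "high"
        node = node[branch]
    return node["label"]


def rules(node, conditions=()):
    """List every root-to-leaf path as an IF ... THEN ... classification rule."""
    if "label" in node:
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN {node['label']}"]
    attr, thr = node["attribute"], node["threshold"]
    return (rules(node["low"], conditions + (f"{attr} <= {thr}",))
            + rules(node["high"], conditions + (f"{attr} > {thr}",)))


prediction = classify(tree, {"debt_ratio": 0.8, "current_ratio": 0.7})
all_rules = rules(tree)
```

Each entry in all_rules corresponds to one path from the root to a leaf, which is why a decision tree is easy to read as a set of classification rules.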
Tree-based learning algorithms are considered among the best and most widely used
learning methods. Tree-based methods give predictive models high accuracy, stability and ease
of interpretation. Unlike linear models, they can also map non-linear relationships, and they
can be adapted to both classification and regression problems. Decision tree algorithms are
referred to as Classification and Regression Trees (CART). A decision tree has a natural
“if…then…else” construction, which makes it fit easily into a programmatic structure (Frosst
and Hinton 2017). It is well suited to categorisation problems in which the attributes or
features are checked systematically for
determining the final category. The decision tree is therefore a classification algorithm
used in machine learning.
Linear Classifier
In the field of machine learning, the goal of statistical classification is to use
the characteristics of an object to identify which class or group it belongs to. A linear
classifier achieves this by making the classification decision based on the value of a linear
combination of the characteristics. The characteristics of an object are known as feature
values and are typically presented to the machine in a vector called the feature vector
(Steyrl et al. 2016). Such classifiers work well for practical problems such as document
classification and, more generally, for problems with many variables or features, reaching
accuracy levels comparable to non-linear classifiers while taking much less time to use. A
popular class of procedures for solving classification tasks is based on linear models. For a
two-class classification problem, the operation of a linear classifier can be visualised as
splitting a high-dimensional input space with a hyperplane: every point on one side of the
hyperplane is classified as “yes” and the others are classified as “no” (Chen et al. 2018). A
linear classifier is often used in situations where the speed of classification is an issue,
as it is often the fastest classifier. The algorithm makes the classification based on a
linear predictor function that combines a set of weights with the feature vector.
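A linear predictor function of this kind can be sketched in a few lines of Python. The weights, the bias and the feature vectors are hypothetical values chosen for illustration only.

```python
def linear_score(weights, bias, features):
    """Linear predictor: dot product of the weights and the feature vector, plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict(weights, bias, features):
    """Points on the positive side of the hyperplane are 'yes', all others are 'no'."""
    return "yes" if linear_score(weights, bias, features) > 0 else "no"


# Hypothetical weights for a problem with three features.
weights, bias = [2.0, -1.0, 0.5], -1.0

first = predict(weights, bias, [1.0, 0.0, 0.0])    # score 2.0 - 1.0 = 1.0
second = predict(weights, bias, [0.0, 1.0, 0.0])   # score -1.0 - 1.0 = -2.0
```

Classifying a point takes one multiplication and one addition per feature, which is why linear classifiers are fast.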
Predictions for the training set are made using the classifiers described above. The
following is the model developed for the analysis in RapidMiner. The data input blocks are
represented by the "Read file" block. The "Retrieve" operator loads a RapidMiner object into
the data flow of the process. In this particular case it allows the selection of data stored
in the local repository from the csv test dataset. To prepare a
usable training dataset, a logical join must be applied to all three datasets, yielding a
single dataset to process.
During the training stage we used a specific algorithm and obtained a model from the
existing data. However, this process cannot be relied on to provide the best model for this
particular dataset, since it uses only a single algorithm and that algorithm may not be a
strong one. We would then need to try various algorithms to get an improved model. Instead,
we chose to combine several potentially weak algorithms, or several models of the same
algorithm trained on different datasets, into an ensemble model. In this case we are
interested in working out which attribute(s) are important for predicting whether the firms
fall into the bankrupt category or not. This attribute importance can be a result in its own
right, since it can reveal the reasons that cause someone or something to behave in a
particular way. The following section discusses common methods for finding these attribute
weights.
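The idea of combining several weak models into an ensemble can be sketched with a simple majority vote. The source does not state which combination rule was used in RapidMiner, so majority voting is an assumption here, and the threshold rules and attribute values are hypothetical.

```python
from collections import Counter


def make_threshold_rule(index, threshold):
    """A weak model: predicts 1 when a single attribute exceeds a threshold, else 0."""
    return lambda record: 1 if record[index] > threshold else 0


def majority_vote(models, record):
    """Ensemble prediction: the label that most of the individual models agree on."""
    votes = Counter(model(record) for model in models)
    return votes.most_common(1)[0][0]


# Three hypothetical weak models, each looking at a different attribute.
models = [
    make_threshold_rule(0, 0.5),
    make_threshold_rule(1, 1.0),
    make_threshold_rule(2, 0.2),
]

first = majority_vote(models, [0.7, 0.4, 0.3])    # individual votes: 1, 0, 1
second = majority_vote(models, [0.1, 0.4, 0.3])   # individual votes: 0, 0, 1
```

With an odd number of models and two classes there is always a strict majority, so no tie-breaking rule is needed in this sketch.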
Filtering methods
One of the most widely used techniques for finding attribute importance is to use a
statistical measure to define importance. Frequently used measures are correlation,
Gini index or information gain. In RapidMiner these values can be computed using the
"Weight by" operators.
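The three filter measures named above can be sketched in plain Python as follows. This is an illustrative calculation on invented data, not the implementation used by the RapidMiner operators, which may normalise or discretise differently. The attribute names ratio_a and ratio_b and the split threshold of 0.5 are assumptions.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation between a numeric attribute and a 0/1 label."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(values, labels, threshold, impurity):
    """Impurity of all labels minus the weighted impurity after splitting at the threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    after = sum(len(part) / n * impurity(part) for part in (left, right) if part)
    return impurity(labels) - after


# Hypothetical data: one informative and one uninformative attribute.
label = [0, 0, 0, 0, 1, 1, 1, 1]
ratio_a = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]   # separates the classes at 0.5
ratio_b = [0.1, 0.9, 0.2, 0.8, 0.1, 0.9, 0.2, 0.8]   # same distribution in both classes

weights = {
    name: {
        "correlation": abs(pearson(values, label)),
        "information_gain": impurity_reduction(values, label, 0.5, entropy),
        "gini": impurity_reduction(values, label, 0.5, gini),
    }
    for name, values in {"ratio_a": ratio_a, "ratio_b": ratio_b}.items()
}
```

An attribute that separates the two categories well receives a high weight under all three measures, while an attribute distributed identically in both categories receives a weight close to zero.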
The results of the analysis are given below.

Part 2
Sales data analysis
In this section, sales data is analysed in order to obtain results and gain insights
from the selected dataset.
The following is the model developed in RapidMiner to predict the sales data.

The results of the execution of this model are shown below.

References
Al-Yaseen, W.L., Othman, Z.A. and Nazri, M.Z.A., 2017. Multi-level hybrid support
vector machine and extreme learning machine based on modified K-means for intrusion
detection system. Expert Systems with Applications, 67, pp.296-303.
Chen, T., Navrátil, J., Iyengar, V. and Shanmugam, K., 2018. Confidence scoring
using whitebox meta-models with linear classifier probes. arXiv preprint arXiv:1805.05396.
Dong, C., Loy, C.C. and Tang, X., 2016, October. Accelerating the super-resolution
convolutional neural network. In European conference on computer vision (pp. 391-407).
Springer, Cham.
Frosst, N. and Hinton, G., 2017. Distilling a neural network into a soft decision tree.
arXiv preprint arXiv:1711.09784.
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q. and Liu, T.Y.,
2017. LightGBM: A highly efficient gradient boosting decision tree. In Advances in neural
information processing systems (pp. 3146-3154).
Santoro, A., Raposo, D., Barrett, D.G., Malinowski, M., Pascanu, R., Battaglia, P. and
Lillicrap, T., 2017. A simple neural network module for relational reasoning. In Advances in
neural information processing systems (pp. 4967-4976).
Steyrl, D., Scherer, R., Faller, J. and Müller-Putz, G.R., 2016. Random forests in non-
invasive sensorimotor rhythm brain-computer interfaces: a practical and convenient non-
linear classifier. Biomedical Engineering/Biomedizinische Technik, 61(1), pp.77-86.
Suthaharan, S., 2016. Support vector machine. In Machine learning models and
algorithms for big data classification (pp. 207-235). Springer, Boston, MA.