Data Analysis with WEKA and Excel

Added on  2020/01/28

This assignment delves into the realm of data analysis using both WEKA and Excel software. It introduces various techniques such as decision tree construction, classification algorithms (including Naive Bayes and J48), and data visualization methods like pivot tables and lookups. Students will learn to analyze datasets, build predictive models, and interpret results using these tools.

DATA HANDLING AND BUSINESS INTELLIGENCE

Table of Contents
INTRODUCTION
PART A
Reasons for Declining sales of the business
Advantages and Disadvantages of Excel
Strength of Excel in analysis of data
Advantages of Excel in visual presentation of data
Disadvantages of Excel
PART B
Reason of using Weka to gain competitive advantages
Advantages and disadvantages of Weka
Application of J48 Algorithm
Application of clustering method on the data of Audi
Modification of existing data
CONCLUSION
REFERENCES
Illustration Index
Illustration 1: Calculation of unit cost
Illustration 2: Margin calculation per unit
Illustration 3: Application of IF Formula
Illustration 4: Application of LOOKUP
Illustration 5: Application of Pivot table
Illustration 6: IF Function
Illustration 7: LOOK UP
Illustration 8: Pivot table
Illustration 9: Decision tree output table
Illustration 10: Decision tree
Illustration 11: Cluster of finance and TT
Illustration 12: Cluster of Finance and A4
Illustration 13: Cluster of finance and RS7
INTRODUCTION
Data collection is an integral part of any organization, and it can be improved by adopting
an appropriate data handling tool. Weka has been selected as one of the main tools for this
purpose. Superstore and Audi both rely on data analysis to resolve their existing business
problems. Weka helps a business owner handle data properly by applying a range of analytical
measures. This report explains Excel and Weka in relation to each other and demonstrates the
practical application of Weka. Decision tree and cluster analysis are explained by identifying
the variables that harm or improve the performance of the business.
PART A
Reasons for Declining sales of the business
Illustration 1: Calculation of unit cost
Illustration 2: Margin calculation per unit
Illustration 3: Application of IF Formula
Illustration 4: Application of LOOKUP
Illustration 5: Application of Pivot table
The analysis was conducted to identify the reasons for the decline in sales at Superstore
(Classification via decision trees in Weka, 2016). It found two factors responsible for the
reduction in sales: discounts and lower profit margins. The company currently sells products
across a range of higher and lower profit margins, on the assumption that this will not
significantly affect the overall margin. Superstore sells across this range in order to attract
a larger customer base. The amount of VAT is correlated with the prices the company offers to
its varied customers. The results show that higher-priced products generate lower income
relative to their price. Discounts further reduce the income earned during the financial year
(John, Skaria and Shajan, 2016). The discounts given to customers are relatively small in
relation to the prices offered to attract them. Several instances in the analysis can help the
owner steer the business onto the right track. For instance, one product delivered by delivery
truck is priced at around 5472 with a margin of 0.59. By contrast, another product delivered to
the customer's doorstep by truck is priced at 1810 with a margin of 0.77 (Seif, 2016). Both
examples show how the profit percentage changes as the price rises or falls: in the first case
the margin is low at a high price, whereas in the second the margin is higher at a lower price.
Having identified the reasons for the decline in Superstore's sales, it is advisable for the
company to revise its product prices along with its discount policy. Revising these factors
should improve the overall profitability of the business.
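The two truck-delivery cases quoted above can be compared with a few lines of arithmetic. The sketch below is only illustrative: it assumes that "margin" means profit as a fraction of the selling price, which the report does not state, and the helper name profit_per_unit is hypothetical.

```python
# Hypothetical helper: ASSUMES "margin" is profit as a fraction of the selling price.
def profit_per_unit(price: float, margin: float) -> float:
    """Return the profit earned on one unit sold at `price` with `margin`."""
    return price * margin


# The two truck-delivery cases quoted in the report.
cases = [
    {"price": 5472.0, "margin": 0.59},
    {"price": 1810.0, "margin": 0.77},
]

for case in cases:
    profit = profit_per_unit(case["price"], case["margin"])
    print(f"price={case['price']:.0f} margin={case['margin']:.2f} profit={profit:.2f}")
```

Under that assumption, the comparison separates the margin rate from the absolute profit per unit, which is the distinction a pricing and discount review would need to examine.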
Advantages and Disadvantages of Excel
Excel is one of the most important software packages for presenting data in spreadsheet
form (Kalmegh, 2015). Calculations are performed electronically, and the software is commonly
used to process raw data into a finished form. Its benefits for the pre-processing of facts and
figures are given below:
Systematic arrangement- Large volumes of data are arranged in systematic and
chronological order, which assists top management in analyzing the facts and figures (Rao and
Reddy, 2014). Haphazard data makes it difficult for the owner to combine everything into a
single form. Various Excel functions simplify large data sets by separating the most important
information from the least useful facts and figures.
Data analysis- Excel offers many functions whose primary purpose is to analyze data, which
makes it an important tool for data scientists. Facts and figures are collected and their
accuracy is analyzed by applying monetary measures (Saxena, 2015). Researchers can work with
different sets of data by dividing the values into groups. Conditional formatting is one option
for pre-processing the values entered in the analysis. Identifying these values further helps
the business make good decisions.
Effective comparison- Excel can be used efficiently to compare previously entered data with
the latest facts and figures. The figures are analyzed to predict trends and patterns and so
improve the accuracy of decisions. Excel is also used to summarize data, which improves the
structure of the overall data set.
Relationship among data- Excel helps to form relationships between rows and columns and to
link spreadsheets together in order to produce useful results. The final decision of the
business rests on the analysis conducted to test the accuracy of the information.
Strength of Excel in analysis of data
A set of values can be analyzed with the help of the various functions of Microsoft Excel
(Rianse, 2015). These functions are classified into categories such as normal functions and
advanced functions. Normal functions cover basic calculations such as the mean, median and sum
of the figures entered in the analysis. Advanced functions include VLOOKUP and HLOOKUP. The
choice of the most suitable technique rests with the individual analyzing the data. The
functions can also be divided into two basic groups, financial and non-financial, or in other
words statistical and non-statistical tools and techniques (Saxena, 2015). The statistical
measures include ANOVA, t-test, correlation, regression, histogram and z-test. Non-statistical
methods include LOOKUP and INDEX, which are other ways of analyzing data. The techniques a
business owner can apply to the current data set are given below:
IF FUNCTION- This commonly used Excel function acts as a conditional statement (Rao and
Reddy, 2014). It allows users to create logical comparisons between an actual and a predicted
value. IF statements are widely used to determine a specific output, and their primary purpose
is to assess a large set of data in a single pass. Conditional statements produce useful
results by comparing each value with the other variables entered in the analysis.
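The behaviour of the IF function can be sketched outside Excel as well. The Python below mirrors the three-argument form IF(logical_test, value_if_true, value_if_false). The margin values and the 0.60 threshold are hypothetical and are not taken from the Superstore workbook.

```python
def excel_if(condition: bool, value_if_true, value_if_false):
    """Mimic Excel's IF(logical_test, value_if_true, value_if_false)."""
    return value_if_true if condition else value_if_false


# Hypothetical rows: flag margins that fall below an ASSUMED 0.60 threshold.
margins = [0.59, 0.77, 0.42]
labels = [excel_if(m < 0.60, "Low", "OK") for m in margins]
print(labels)
```

Applying the same test to every row is what lets a single formula, filled down a column, classify a whole data set at once.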
Illustration 7: LOOK UP
LOOK UP- This function is used to search for specific variables within the complete set of
data (Saxena, 2015). Finding specific variables manually is impractical, but they can be
retrieved electronically. This saves time when analyzing a large and complex data set, and
LOOKUP formulas remove the problem of searching by hand. Identifying variables in this way also
makes it possible to form specific relationships among different sets of variables.
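An exact-match lookup of this kind can be sketched in a few lines of Python. The function below is a simplified stand-in for VLOOKUP and not a full reimplementation: it searches the first column of a table and returns the value from a 1-based column index. The product table is invented for illustration.

```python
def vlookup(lookup_value, rows, col_index, default=None):
    """Exact-match search on the first column; `col_index` is 1-based, as in Excel."""
    for row in rows:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return default  # Excel would show #N/A here; this sketch returns a default instead


# Hypothetical product table: (product ID, name, unit price).
PRODUCTS = [
    ("P-001", "Desk lamp", 25.0),
    ("P-002", "Office chair", 40.0),
    ("P-003", "Bookcase", 120.0),
]

print(vlookup("P-002", PRODUCTS, 3))
print(vlookup("P-999", PRODUCTS, 2))
```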
Illustration 6: IF Function
Illustration 8: Pivot table
PIVOT TABLE- A pivot table is a summarizing tool used to present data visually for the
different users of the business (Gao, 2015). A single pivot table can perform several
operations, such as sorting, counting, totaling or averaging a large set of data. Pivot tables
are useful for carrying out calculations quickly. They apply a simplification approach based on
online analytical processing (OLAP) concepts and are commonly used to assess trends and
patterns in existing data.
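The totaling that a pivot table performs can be illustrated with a small grouping routine. This is a minimal sketch using only the Python standard library. The field names (region, mode, sales) and the order records are assumptions made for the example, not the actual Superstore columns.

```python
from collections import defaultdict


def pivot_sum(records, row_key, col_key, value_key):
    """Total `value_key` for every (row_key, col_key) pair, like a SUM pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for rec in records:
        table[rec[row_key]][rec[col_key]] += rec[value_key]
    # Convert the nested defaultdicts into plain dicts for easy printing.
    return {row: dict(cols) for row, cols in table.items()}


# Hypothetical order records.
orders = [
    {"region": "East", "mode": "Truck", "sales": 5472.0},
    {"region": "East", "mode": "Air", "sales": 300.0},
    {"region": "West", "mode": "Truck", "sales": 1810.0},
    {"region": "East", "mode": "Truck", "sales": 100.0},
]

summary = pivot_sum(orders, "region", "mode", "sales")
print(summary)
```

Replacing the running sum with a count or an average gives the other summaries mentioned above.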
Advantages of Excel in visual presentation of data
Excel is commonly used to assess the data entered into an analysis and to judge its
accuracy (Kotthoff, 2016). It is widely used to analyze how information performs against
various parameters, and it supports the automation of various business operations. Data stored
in a spreadsheet can be manipulated and presented as a complete set of information. The
benefits of Excel for the presentation of data are given below:
Visualization- Excel allows users to present complex data in various bar charts and
graphs. The attractive visual effects applied to the data increase the chances that
different users of the business will interpret it correctly (Abdi, 2016). The information
is shared among various people with the clear aim of supporting good decisions in the
business.
Automation- Excel can generate various types of chart from a given set of data, and it
can produce electronic, interconnected spreadsheets. The basic advantage is that output
is easy to generate without demanding much input.
Integration- Data entered into the sheets can be integrated by linking the sheets with
each other to produce useful results. These results help the business owner make good
decisions related to the aim of the project.
Disadvantages of Excel
Excel has various limitations, which are given below:
It does not handle large and complex sets of data well.
Managing the data can create confusion for users trying to achieve their desired goals
and objectives.
PART B
Reason of using Weka to gain competitive advantages
Weka currently holds a stronger position than the other tools in the same market when it
comes to simplifying the problems of individuals and businesses. The modern world pays
attention to methods that strengthen an existing business through new tools and techniques
(Manimekalai, 2016). Weka is a commonly used tool that breaks complex data sets into
manageable fragments. It gained much of its advantage from the weaknesses of Excel: Excel
cannot handle data as conveniently as Weka (Mir, Khan, Butt and Zaman, 2016), and data cannot
be mined properly in Excel, which is mainly used for quick calculations on facts and figures.
Weka also faces competition from other powerful tools used by data scientists (Seif, 2016). R
and SAS are commonly used alternatives, but they rely on a programming language, so the analyst
has to write code to reach a decision. Weka presents the data as output and a decision tree
without complex coding. It saves time on difficult calculations because the user only selects
an algorithm and a data set. The calculations are then performed automatically, giving users a
quick solution once the data is entered (Weka, Ikeh and Kamani, 2013). The software is useful
to the owner in difficult and urgent situations, where problems can be resolved by using its
various features to produce the right output.
Advantages and disadvantages of Weka
Advantages
The application is strong enough to handle bulky data sets in a single attempt.
It saves time and effort by generating accurate results quickly.
The algorithms and the selection of data sets are built into the application.
It outperforms Excel in generating accurate outcomes conveniently.
Excel may crash when handling large data sets, a weakness that can be offset by using
Weka.
Weka offers a range of statistical functions that help a business produce accurate
results.
Users can analyze the data in different contexts in order to help business users.
Disadvantages
The basic limitation of the software is that it focuses only on statistical calculations.
This limitation contrasts with the strength of Excel, which can perform both statistical
and non-statistical calculations.
Incorrectly configured functions in the software will generate wrong results that
distract the user.
The user's focus may then shift away from the current aim, which is to resolve the
issue.
Application of J48 Algorithm
Illustration 9: Decision tree output table
Interpretation
Weka uses this algorithm to present values and data in the form of decisions, supporting
proper decision making in the business (Read, Reutemann, Pfahringer and Holmes, 2016). The
algorithm defines the overall process that guides an individual in completing the tasks that
support the aim of the research. Here the J48 algorithm is used to predict the target variable,
which is the reason for the decrease in sales at Audi (Rianse, 2015). The decision tree is the
outcome of the analysis conducted by the researcher with this software. A person using the
application gains information, and their knowledge is refined over time.
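J48 is Weka's implementation of the C4.5 decision tree learner, which chooses splits using an entropy-based measure. The sketch below computes plain information gain, the quantity underlying C4.5's gain ratio, for two candidate attributes. It is an illustration and not Weka's code. The four customer records and the attribute names (before_2005, financed, warranty) are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical customer records, NOT the Audi data set.
customers = [
    {"before_2005": True, "financed": True, "warranty": "no"},
    {"before_2005": True, "financed": False, "warranty": "no"},
    {"before_2005": False, "financed": True, "warranty": "yes"},
    {"before_2005": False, "financed": False, "warranty": "yes"},
]

for attr in ("before_2005", "financed"):
    print(attr, information_gain(customers, attr, "warranty"))
```

In this toy data the purchase-date attribute separates the classes completely while the finance attribute tells nothing, so a tree learner of this family would split on the purchase date first.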
Illustration 10: Decision tree
Interpretation
The decision tree above reflects the position of Audi by showing two different
possibilities (Rao and Reddy, 2014). These possibilities relate to the purchases made by
customers, that is, first purchases and other purchases. The diagram reveals that buyers who
purchased Audi cars prior to 2005 did not take up the extended warranty option that the firm
offers to all customers. On the other hand, first purchases were made after the month mentioned
in the analysis, and customers making a first purchase are taking up the extended warranty
(Kalmegh, 2015). The two outcomes are opposite: one group does not use the facility, while the
other uses the facility provided by the business.
Application of clustering method on the data of Audi
Illustration 11: Cluster of finance and TT
Interpretation
Cluster analysis in Weka works by forming small groups of objects (Beckham, Hall and
Frank, 2016). The variables held responsible for the reduction in Audi's sales are selected
together. Generating sales and earning a profit are among the most important motives of a
business owner. The figure above shows that four clusters were discovered through the process.
It shows a larger group of values on the right side of the image than on the other side. The
image reveals a close relationship between the two variables of the analysis, and the values
within the cluster are closely related to each other. The final outcome shows that the TT car
department can raise its existing level of sales by providing finance to customers (Kotthoff,
2016). Providing finance will help boost the existing sales level by reducing all kinds of
expenses. The cluster is based on analyzing the relationship between two variables, finance and
the TT car department, which points to an increase in sales and revenue.
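The report does not name the clustering algorithm that was run in Weka. Assuming a k-means style method, the grouping step can be sketched as below. The points are hypothetical pairs of (finance flag, units sold), and the example uses two clusters for brevity rather than the four found in the analysis.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain k-means; starts from the first k points so the result is deterministic."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, clusters


# Hypothetical (finance flag, units sold) pairs, NOT the Audi data set.
points = [(0, 1), (0, 2), (1, 9), (1, 10)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Each resulting centroid summarizes one group, which is the kind of information read off the cluster plots when comparing financed and non-financed sales.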
Illustration 12: Cluster of Finance and A4
Interpretation
The figure above shows good results in favor of the business, with a large cluster on the
right side. This group shows the relationship between two variables, Finance and A4 (Youn, Won,
Youn and Scheffler, 2016). The analysis reveals that the majority of customers purchase A4 cars
without taking finance from the firm. It also reflects the ability of buyers to purchase a car
from Audi without taking on debt in the form of the finance offered by the company.
Illustration 13: Cluster of finance and RS7
Interpretation
This cluster focuses on two variables, Finance and the RS7 car, since buying a car
involves the use of finance (Gao, 2015). The analysis will help the business make positive
decisions about its performance. Finance will not be provided to all customers who intend to
purchase an RS7, as other modes of finance are available, such as loans from banks and other
financial institutions, which can be used to buy the car. Top management at Audi needs to craft
a strategy to increase sales across its different departments.
Modification of existing data
The researcher applied effort to complete the tasks and duties assigned by management
(Abdi, 2016). The aim of the current project is to handle a large set of data in order to serve
various users. Several changes need to be made in order to present the information in a
separate structure. The company collects data from a variety of sources in order to test its
accuracy (Manimekalai, 2016). The data are collected to help all users accomplish their tasks
and duties in the best possible manner.
CONCLUSION
It can be concluded from this report that a business needs to manage its data in order to
ensure its validity. The results of the analysis are in favor of Audi and show that the company
needs to increase its sales. The analysis has identified the reasons for the reduction in
sales, which helps the organization reduce their overall impact.
REFERENCES
Books and Journals
Rianse, U. et al., 2015. THE IMPACT OF THE GOLD MINING ON THE SOCIAL,
ECONOMIC, AND CULTURAL IN THE BOMBANA DISTRICT SOUTHEAST
SULAWESI PROVINCE. International Journal of Sustainable Tropical Agricultural
Sciences. 1(1).
Read, J., Reutemann, P., Pfahringer, B. and Holmes, G., 2016. Meka: a multi-label/multi-target
extension to weka. Journal of Machine Learning Research. 17(21). pp.1-5.
Weka, R. P., Ikeh, E. I. and Kamani, J., 2013. Seroprevalence of antibodies (IgG) to Taenia
solium among pig rearers and associated risk factors in Jos metropolis, Nigeria. The
Journal of Infection in Developing Countries. 7(02). pp.067-072.
Saxena, R., 2015. Educational data Mining: Performance Evaluation of Decision Tree and
Clustering Techniques using WEKA Platform. International Journal of Computer Science
and Business Informatics.
Rao, B. T. and Reddy, B. S., 2014. Mining of High Dimensional Data using Efficient Feature
Subset Selection Clustering Algorithm (WEKA). International Journal of Computer
Applications. 107(6).
Kalmegh, S., 2015. Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and
RandomTree for Classification of Indian News. International Journal of Innovative
Science, Engineering & Technology. 2(2). pp.438-446.
Gao, L., 2015. Analysis of Employment Data Mining for University Student based on Weka
Platform. Journal of Applied Science and Engineering Innovation. 2(4). pp.130-133.
Youn, I. H., Won, K., Youn, J. H. and Scheffler, J., 2016. Wearable Sensor-Based Biometric
Gait Classification Algorithm Using WEKA. Journal of information and communication
convergence engineering. 14(1). pp.45-50.
Kotthoff, L., Thornton, C., Hoos, H. H., Hutter, F. and Leyton-Brown, K., 2016. Auto-WEKA
2.0: Automatic model selection and hyperparameter optimization in WEKA. Journal of
Machine Learning Research. 17. pp.1-5.
Beckham, C., Hall, M. and Frank, E., 2016. WekaPyScript: Classification, regression, and filter
schemes for WEKA implemented in Python. Journal of Open Research Software. 4(1).
Abdi, A. et al., 2016. Affirmation of the Trade Performance between Islands as Shield of
Indonesia Confront MEA. International Journal of Sustainable Tropical Agricultural
Sciences. 2(1).
Mir, N. M., Khan, S., Butt, M. A. and Zaman, M., 2016. An Experimental Evaluation of
Bayesian Classifiers Applied to Intrusion Detection. Indian Journal of Science and
Technology. 9(12).
Seif, H., 2016. Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Perfomance
Evaluation. International Journal of Computer Science and Information Security. 14(1).
p.1.
John, E. T., Skaria, B. and Shajan, P. X., 2016. An Overview of Web Content Mining Tools.
Bonfring International Journal of Data Mining. 6(1), p.1.
Manimekalai, K., 2016. A Proficient Heart Disease Prediction Method Using Different Data
Mining Tools. International Journal of Engineering Science. 2676.
Online
Classification via decision trees in Weka, 2016. [PDF]. Available through
<http://facweb.cs.depaul.edu/mobasher/classes/ect584/weka/classify.html>. [Accessed on 1
January 2017].