Data Handling and Business Intelligence

Verified

Added on  2023/01/11

|17
|3192
|53
AI Summary
This document provides an introduction to data mining and business intelligence methods used for analyzing and extracting information from large datasets. It focuses on the analysis of superstore data using Excel functions and the conjunction of audidealership data with Weka software. The document also discusses the advantages and disadvantages of Weka over Excel.

Contribute Materials

Your contribution can guide someone’s learning journey. Share your documents today.
Document Page
DATA HANDLING AND
BUSINESS INTELLIGENCE

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
TABLE OF CONTENTS
INTRODUCTION...........................................................................................................................1
PART 1............................................................................................................................................1
By using data set of superstores analyse profit and sales over years and analyse it by using
Excel for pre- processing of data, also analyse and visualize the data........................................1
Demonstration of ways in which data can be practically analysed using Excel functions such
as Lookup, Pivot table, graphs and charts...................................................................................6
PART 2............................................................................................................................................9
2.1 By using audidealership.csv file show conjunction with Weka with the example of
clustering......................................................................................................................................9
2.2 explain data mining method that can be used in real business............................................11
2.3 Discuss advantage and disadvantage of weka over excel....................................................12
CONCLUSION..............................................................................................................................14
REREFENCES..............................................................................................................................15
Document Page
INTRODUCTION
Data mining is a kind of process which is used by organizations for data analysis, finding
pattern within large amount of data set and for various other kind of purposes. It has become one
of the most important part of an organization as it helps them in decision making, predict future
sales and for many other purposes (Unat and et. al., 2016). Data handing and business
intelligence methods are majorly used for analysing and extracting information from large
amount of data so that new data can be used for bringing changes within business strategies in
order to enhance revenue and profitability of organizations. Today it has become one of the most
important and effective part of an organization and is used in different ways in many industries
but mostly Business intelligence is used. Business intelligence is a kind of process, method or
technology that helps an organization to use, convert, analyse and extract meaningful data from
raw data which further helps an organization to expand their business and increase profitability.
For retail sector organizations it is extremely important for they need to develop strategies by
predicting their future sales on the basis of their current sales. This assignment will focus on
analysis of superstore data in order to identify and analyse sales and profit of the organization.
This assignment will also focus on conjunction in Weka Software of audidealership data with the
help of clustering and lastly advantages and disadvantage of Weka over Excel.
PART 1
By using data set of superstores analyse profit and sales over years and analyse it by using Excel
for pre- processing of data, also analyse and visualize the data
Data mining is majorly used for data handing and conducting business intelligence
operations. There are various kinds of methods that are used for data mining that comes under
data handing and business intelligence such as database systems, machine learning intersection
and statistics and many more. In order to transform data or extract useful information from it
there are many kinds of software’s or tools that can be used by organizations which helps
organizations to bring changes within their current strategies. Excel is one of those tools which is
based upon Data mining technique (Sajitha, Minimol and Mini, 2019). Mostly Excel is used by
organizations for financial calculation like calculation of their profit and sales, forecasting or
predicting future sales and for many other purposes. It is one of the most common data mining
tool which is used by organizations as it has various kinds of inbuilt functions that can be used
by organizations for data analysis. It can be used by organization to study many years past data
1
Document Page
and analyse their sales and profitability in many ways. Not only this, Excel provides various
kinds of options for creating a pivot table, creating graphs and many more. It also has inbuilt
formula to search a value in a large data set as well (Zelinka and et. al., 2018). Excel is majorly
used for organizing and arranging the data in such a manner that it can used for further analysis.
In order to analysis sales and overall profit of the provided data Excel can be used.
Below table will explain Average sales and Average profit of Superstore from 2009 to 2012
Row Labels
Average of
Sales
Average of
Profit
Furniture 3003.82282 68.11660673
2009 3287.21367 137.9565402
2010 2846.632764 21.35772727
2011 3035.062388 120.6278708
2012 2834.316565 -10.02715311
Office Supplies 814.0481779 112.3690738
2009 885.7139778 151.9643028
2010 778.0854274 100.9771282
2011 716.172473 78.20144784
2012 871.9747368 116.7143313
Technology 2897.941008 429.2075157
2009 3145.703293 359.7878373
2010 2631.278114 447.037081
2011 2916.463917 519.0770085
2012 2895.826493 402.5972762
Grand Total 1775.878179 181.1844243
Figure 1 Average sales and Average profit from 2009 year to 2012 year
Data interpretation: Average sales and average profit of super store can be calculated with the
help of pivot table and pivot graph. From the above graph it can be interpreted that average sales
2

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
of Office supplies were constantly lowest as compared to technology and furniture from 2009 to
2012 and Average profit was highest of technology in 2011. Not only this average profit was
constantly higher as compared to furniture and office supplies from 2009 to 2012.
Overall sum of profit and sum of sales from 2009 to 2012
Figure 2 Overall sum of profit and sum of sales
Data interpretation: By hanging value field setting from average to sum in pivot table sum of
profit and sum of sales can be calculated. From the above graph it has been analysed that overall
sum of sales of technology was highest in 2009, and lowest sum of sales were recorded in 2011
of office supplies.
Average profit and average sales of Furniture from 2009 to 2012
Row Labels Sum of Sales
Sum of
Profit
Furniture 5178590.542 117433.03
2009 1472671.724 61804.53
2010 1252518.416 9397.4
2011 1268656.078 50422.45
2012 1184744.324 -4191.35
Grand Total 5178590.542 117433.03
3
Document Page
Figure 3 Average profit and average sales of Furniture from 2009 to 2012
Data interpretation: By selecting only furniture from row label of pivot table above graph can
be generated. From the above graph it is clear that sales of furniture was highest in year 2009 and
overall profit from sales of furniture was also highest in year 2009.
Average profit and average sales of Office supplies from 2009 to 2012
Row Labels
Sum of
Sales
Sum of
Profit
Office Supplies 3752762.1 518021.43
2009 1035399.64 177646.27
2010 910359.95 118143.24
2011 796383.79 86960.01
2012 1010618.72 135271.91
Grand Total 3752762.1 518021.43
4
Document Page
Figure 4 Average profit and average sales of Office supplies from 2009 to 2012
Data interpretation: By selecting only Office Supplies from row label of pivot table above
graph can be generated. From the above graph it is clear that overall sales of Office supplies was
highest in year 2012 and overall profit of office supplies was highest in year 2009.
Average profit and average sales of Technology from 2009 to 2012
Row Labels Sum of Sales
Sum of
Profit
Technology 5984248.182 886313.52
2009 1701825.482 194645.22
2010 1397208.679 237376.69
2011 1364905.113 242928.04
2012 1520308.909 211363.57
Grand Total 5984248.182 886313.52
5

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
Figure 5 Average profit and average sales of Technology from 2009 to 2012
Data interpretation: By selecting only Technology from row label of pivot table above graph
can be generated. From the above graph it is clear that technology sales were highest in year
2009. From the graph it cannot be determined that profit was highest in year 2010 or 2012 but
from the table it can be clearly analysed that overall profit was highest in 2011
Demonstration of ways in which data can be practically analysed using Excel functions such as
Lookup, Pivot table, graphs and charts
Lookup Function: Lookup is an inbuilt function present in Excel, analysing and selecting a data
or value from current large se of data (Kaler and Gupta, 2017). It is available in reference
function in Excel. It can also be used for selection of values from different range. It can be used
to return a value from a pre-defined set of values or array. Syntax used for lookup is: Lookup
(lookup_value, lookup_vector, [result_vector])
For example: In order to identify sales generated from order id: 56581 lookup function can be
used:
=LOOKUP(56581,B2:B8400,F2:F8400)
=1121.84
6
Document Page
Pivot Table: It is another kind of most commonly used Excel tool which is majorly used for
summarizing data in an appropriate manner (Koçak, Özyer and Alhajj, 2016). This function
helps in extracting required data in an appropriate method.
Calculate sum of sales and sum of profit region wise
Row Labels Sum of Sales
Sum of
Profit
Atlantic 2014248.204 238960.66
North Carolina 116376.4835 2841.11
Northwest Territories 83817.746 8307.05
Ontario 3780242.063 439214.57
Prarie 2837304.602 321160.12
Quebec 1510195.08 140426.65
West 3597549.276 297008.61
Yukon 975867.371 73849.21
Grand Total 14915600.82 1521767.98
Figure 6 sales and sum of profit region wise from 2009 to 2012
From the above graph it is clear that sales were highest in Ontario and West and if it is required
to be seen that in which year sales of Ontario and west was highest then filtration in row label
will be used.
Calculate sum of sales and sum of profit in Ontario and west
Row Labels Sum of Sales
Sum of
Profit
Ontario 3780242.063 439214.57
7
Document Page
2009 1073872.025 140766.88
2010 956311.9655 121434.93
2011 871108.2125 81471.4
2012 878949.86 95541.36
West 3597549.276 297008.61
2009 1045843.409 59188.71
2010 851790.4265 78920.24
2011 840788.285 103338.27
2012 859127.1555 55561.39
Grand Total 7377791.339 736223.18
Figure 7 sum of sales and sum of profit in Ontario and west from year 2009 to 2012
8

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
PART 2
2.1 By using audidealership.csv file show conjunction with Weka with the example of clustering
Figure 8 audidealership data relationship
Figure 9 Hierarchical clustering
9
Document Page
Figure 10 Cluster data using the k means algorithm
10
Document Page
2.2 explain data mining method that can be used in real business
In todays era of large data set it is necessary to extract data in effective way so that relevant and
precise info is obtained. Thus, data mining is process of using large info to generate useful info
from it. It enables in finding out trends and new knowledge from it and taking relevant decision.
In order to do so there are various types of data mining tools and techniques which is available. It
depends in nature and type of data that which technique is suitable ( Kumar and Britto, 2019).
However, each technique is having a different process of analysing data. They are defined as
below
Tracking pattern- it is a technique in which is commonly used to review change in pattern of
data. This usually involves recognizing data of same abbreviations that may occur at regular
11

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
intervals or with change in flow of variables in particular time peri. Here, an example can be
taken of analysing sales of product on website.
Classification- it is a complex data mining technique in which data is classified is into several
factors and variable. This allows in drawing up further info from it. For example if data is to be
analyzed into customer income then it would be high, low and medium. The technique is used to
gather relevant info by classifying things.
Association- As name depicts it is a technique where data is analysed to view pattern. Here,
relationship is identified between variable and it makes it easy to show change in pattern. There
is high correlation between variable or pattern. For example, if customer buys specific item in
relation to other item.
Outer detection- this is a technique in which data analysis is done to fimd out any external factor
or anomalies within data. With that it becomes easy to find out why auch anomalies occurred.
This enables in better and effective understanding of audience.
Clustering- it is similar to classification but in this there cluster are made on basis of similar
characteristics or features and then data is extracted. There is change in things of how
Regression- basically it is used in planning and modelling but with this it enables in finding out
likelihood of variables due to existence of other. It shows that how any change in one factor can
impact of other. So, it is used in data set to find out relationship between two variables.
Prediction- this technique is used to predict future data. The use of this technique provide highly
useful and valuable data which can be used in future. Thus, here only by analysis historical data
and pattern, accurate prediction can be made into future. For instance, prediction is to be done of
sales in next year.
Thus, these are all data mining technique which can be used by business. This allows in
extraction of data and providing useful info Alongside, the info can be used to make strategy and
plans and also in taking decision. Moreover, technique make it easier to gain useful info there is
need to use proper and effective technique so that data is analysed precise and accurate. The
change in technique within real business is simple to do.
2.3 Discuss advantage and disadvantage of weka over excel
There are many types of technique available which is used in data mining. Apart from those, it is
required a software to run and implement those technique. Each technique works on different
platforms and contains many features. Along with it, connection between technique depends on
12
Document Page
what machine learning language is used by them. Here, a comparison is done between weka and
excel. There pros and cons are as follows
Weka it is an open source software which uses Java as data mining tool. The software can be
run on Windows, IOS and Linux.
Advantage
Weka is known for its extensive machine learning capabilities as it supports all data mining
technique that are cluster, classification, etc
The access is granted with help of GUI
It can be connected with SQL
The technique uses decision tree, ID3, etc
It uses simple command line to handle large data set
Disadvantages
Only cluster analysis can be used in this
There may occur error in processing when there is large amount of data to be processed.
The memory of tool is less.
No connectivity with excel and other non Java based data base
Does not have the facility to save parameters for scaling to apply to future datasets.
Excel It is a spreadsheet software offered by Microsoft Windows. The excel is highly flexible
and easy to use.
Advantage
It can be used for analysis as well as decision making.
The data can be viewed in easier way and on by sending to e mail or printing it.
There is no need to purchase or install excel in Windows.
The files in it are password protected which helps in maintaining of data
Th comparison can be easily done and pattern and trends is analysed properly
The data can be sorted and pivots table can be made
Disadvantage
It does not work on IOS, Linux and on other platforms.
Only creating one file may result in slow working of data.
The graph offered by excel is not of high quality.
It can not be connected with any other databases.
13
Document Page
So, it can be evaluated that there are many new and advanced data mining technique which is
available. It has enabled in providing many features and running on various machine language.
Besides, weka can be connected with other databases but excel can not. Usually, it is necessary
to find out proper software which consists of various features. There are to be change in things.
Other than this, in terms of flexibility excel is easier to use as compared to weka.
CONCLUSION
From the above assignment it has been summarized that data mining is one of the most
important process which is required by the organizations to be focused on for prediction of
accurate results. Data mining helps in summarizing, analysing data so that on the basis of which
future strategies can be prepared. It can be used by retail sector organizations for prediction of
sales, profit so that organization can enhance their future sales and profit as well as their
relationship with their customers.
14

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
REREFENCES
Books and Journals
Bhatia, P., 2019. Data Mining and Data Warehousing: Principles and Practical Techniques.
Cambridge University Press.
Eldén, L., 2019. Matrix methods in data mining and pattern recognition (Vol. 15). Siam.
Kaler, S. and Gupta, R., 2017, August. The design of intelligent washing machine controller
based on FIS & ANFIS. In 2017 International Conference on Information,
Communication, Instrumentation and Control (ICICIC) (pp. 1-6). IEEE.
ki, L., Jaworski, M. and Duda, P., 2020. Basic Concepts of Data Stream Mining. In Stream Data
Mining: Algorithms and Their Probabilistic Properties (pp. 13-33). Springer, Cham.
Koçak, Y., Özyer, T. and Alhajj, R., 2016. Utilizing maximal frequent itemsets and social
network analysis for HIV data analysis. Journal of cheminformatics. 8(1). p.71.
Kumar, A.V.R. and Britto, A., 2019. Analytical Models for the Estimation of Effect of the Spray
Angle on a Subcooled Target. International Journal of Psychosocial
Rehabilitation. 23(4).
Roiger, R.J., 2017. Data mining: a tutorial-based primer. CRC press.
Sajitha, S., Minimol, B. and Mini, M.G., 2019, September. A Computational Model of
Autonomic Nervous System for Heart Rate Variability. In 2019 Computing in
Cardiology (CinC) (pp. Page-1). IEEE.
Unat, D., and et. al., 2016, June. Tida: High-level programming abstractions for data locality
management. In International Conference on High Performance Computing (pp. 116-
135). Springer, Cham.
Zelinka, I., and et. al., 2018. Soft Computing Systems: Second International Conference, ICSCS
2018, Kollam, India, April 19–20, 2018, Revised Selected Papers (Vol. 837). Springer.
15
1 out of 17
circle_padding
hide_on_mobile
zoom_out_icon
[object Object]

Your All-in-One AI-Powered Toolkit for Academic Success.

Available 24*7 on WhatsApp / Email

[object Object]