Data Handling and Business Intelligence

Verified

Added on 2023/01/11

AI Summary

This report discusses the use of Excel for data analysis and visualization, focusing on the decline in sales and profits over the years. It also explores the workings of WEKA using audidealership.csv and analyzes various data mining methods used in business. The advantages and disadvantages of WEKA over Excel are also discussed.

Contribute Materials

Your contribution can guide someone’s learning journey. Share your documents today.

Data handling and business
intelligence

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.

Contents
INTRODUCTION...........................................................................................................................1
PART 1............................................................................................................................................1
Evaluating the use of Excel for pre-processing, analysing and visualising the data using
Superstore data and determining the decline in sales/profits over the years...............................1
PART 2............................................................................................................................................8
2.1 Workings of WEKA using audidealership.csv......................................................................8
2.2 Explaining data mining methods that can be used in business with real world examples. .11
2.3 Discussing the advantages and disadvantages of WEKA over Excel.................................13
CONCLUSION..............................................................................................................................14
REFERENCES..............................................................................................................................15

INTRODUCTION
Data handling is a procedure of recording, analysing, interpreting and evaluating the data
set. The concept of data handling is related with business intelligence which helps in utilising
the tools of data handling in operations of a business organisation (Bordeleau, Mosconi and
Santa-Eulalia, 2018). The main aim of this report is to build an understanding about how data
systems work and how their benefits can be optimised. This report is divided into two sections.
In first section of this report, current trends in data warehousing, business intelligence and data
mining are discussed. Along with which, a data set of superstore is used in order to practically
use the Microsoft Excel functions. These functions will be used to determine the decline in sales
and profit of superstore.
In the second section of this report, data of audidealership is used for clustering the data
using an analytic application of WEKA. In this section, various data mining methods are also
analysed along with their associated real life examples. Finally advantages and disadvantages of
WEKA application are discussed over Microsoft Excel.
PART 1
Evaluating the use of Excel for pre-processing, analysing and visualising the data using
Superstore data and determining the decline in sales/profits over the years
Data warehousing is the practice of storing the data so that it can be used whenever it is
required. This concept is used by various business organisations to store and organise their
business information so that it can protected. The current trends for data warehousing in market
are optimisation and performance by creating a balance between disk storage and memory.
Another trend in this field is In-memory database management system. This trend is result of
bridging the issue of slow query response system (Mitrovic, 2020).
Business intelligence is a system which allows its users to use tools and techniques by
which simple data can provide meaningful insights and information which can be used for
business operations. Current trends for business intelligence in market are open source BI and
software as a service. By using open source BI, small scale organisations can use tools of BI
without any license acquisition and free of costs. On the other hand, SaaS is the result of
increasing demand of software for forecasting and predictions.
1

Data mining is the procedure of mining the pre stored data by running algorithms and other
functions to mine the additional information from the data which is not visible by only looking at
the data set. The current trends which are observed in the field of data mining are Visualisation
and service oriented architecture (Fuchs, Höpken and Lexhagen, 2014). Visualisation is the trend
which is being highly used by business organisation as it allows presenting the mined
information by using graphs, charts and tables. On the other hand, service oriented architecture
helps in mining only that data which can result in providing useful insights to the company.
There are various software applications which are used to conduct above procedures of data
warehousing, business intelligence and data mining. One of those applications is Microsoft
Excel, using this application; its usage for pre processing, analysing and virtualising the data is
evaluated.
Pre processing the data
The data in superstore dataset is raw and is required to be pre processed before analysing.
Excel is analytic software which has various options to pre process the data. Among all those
tools and functions, the function of identification of missing values, PIVOT table, FILTER and
SORT are used. By using the shortcut key of Shift + F4, all the missing values (product base
margin) in dataset are identified and then filled by their mean values. Once the dataset is
cleansed, it is then transformed by using sorting and filters option so that all values can be shown
according to their order date. Lastly, the dataset is reduced to only few values using PIVOT
table. The selection of these variables is done on the discretion that only those variables will be
selected which can impact organisation’s profit and sales.
Analysing the visualisation
Once the dataset is processed, it is then essential to analyse and visualise it so that
meaningful insights can be gained. This process of analysis is done using the Excel functions of
SUM(), LOOKUP(), COUNTIF() and CORRELATION. And the process of visualisation is done
by Bar chart and line graph (Ataman, Kulick and Sim, 2011).
First all the numerical values are selected from the PIVOT table and using the function of
sum, total of all numerical values are divided according to four years (2009, 2010, 2011 and
2012). A visual representation of these total values is given in below table. Using the data of the
table, graphs for each variable is also prepared.
2

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.

Sum of Profit Sum of Sales Sum of Discount
Sum of
Shipping Cost
Sum of Unit
Price
Sum of Order
Quantity
Sum of
product
base price
2009 434096.02 4209896.85 105.39 28481.76 232830.98 54508 1113.69
2010 364917.33 3560087.04 105.81 27354.26 162467.59 54379 1099.9
2011 380310.5 3429944.98 101.67 24939.85 159653.11 51413 1019.93
2012 342444.13 3715671.95 104.32 27055.17 195467.55 54480 1071.08
3

From the above analysis and visuaisation, it has been interpreted that each variable which
is analysed above is the resaon behind the decline of sales as all the variabes are showing the
similar trend that is minimum value in 2011 year. Now the most impactful variable upon sales is
determine using the Excel function of CORRELATION an dresults gained from this function are
given below:
4

Sum of Sales
Sum of
Discount
Sum of Shipping
Cost
Sum of Unit
Price
Sum of
Order
Quantity
Sum of
product base
price
Sum of Sales 1
Sum of Discount 0.554020074 1
Sum of Shipping
Cost 0.837509053 0.91691319 1
Sum of Unit Price 0.976545347 0.4445884 0.764093544 1
Sum of Order
Quantity 0.607858343 0.93397643 0.916714438 0.57228736 1
Sum of product
base price 0.728272194 0.96997286 0.976006533 0.61764576 0.90322642 1
This is clear from above results that unit price is the main reason behind the decline in
sales. Superstore has reduced the unit price of thei products due to which sales revenue of this
organisation has been reduced.
The above analysis has only provided the reason behind decline in sales. In order to
identify the cause behind decline of profit, non numeric variables are used by using LOOKUP
and COUNTIF and the results are attached below:
Delivery Truck Regular Air Express Air
2009 307 1582 269
2010 298 1597 246
2011 263 1460 275
2012 291 1609 202
Shipment mode
North Carolina Ontario Northwest Territories Atlantic West Prarie Quebec Yukon
2009 16 551 16 283 507 464 188 133
2010 18 575 16 295 483 412 200 142
2011 26 517 19 240 460 412 191 133
2012 19 518 8 262 541 418 202 134
Region
5

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

Small Business Consumer Corporate Home Office
2009 416 443 790 509
2010 434 433 764 510
2011 389 396 718 495
2012 403 377 804 518
Customer segment
2009 2010 2011 2012
0
100
200
300
400
500
600
700
800
900
Small Business
Consumer
Corporate
Home Office
Office Supplies Technology Furniture
2009 1169 541 448
2010 1170 531 440
2011 1112 468 418
2012 1159 525 418
Product category
6

2009 2010 2011 2012
0
200
400
600
800
1000
1200
1400
Office Supplies
Technology
Furniture
Sum of
Profit Express Air
Northwest
Territories Consumer Furniture
Sum of Profit 1
Express Air 0.74897056 1
Northwest
Territories 0.55005312 0.95812629 1
Consumer 0.73593026 0.580260641 0.53173167 1
Furniture 0.70546329 0.383957397 0.29899993 0.96525964 1
From the above analysis, four reasons behind the decline of profit are identified which are
decline in the shipments using Express air, reduction of sales in Northwest territory, reduction in
sales to consumer and reduction of sales of furniture. The most influential variable among these
four is reduction in shipment through Express air and this result is gained from the correlation co
efficient.
Determining decline in sales and profit
The reason behind the reduction of sales is declining unit price of the superstore’s products
and the reason behind low profit is the reduction in usage of “Express air” as a shipment mode.
PART 2
2.1 Workings of WEKA using audidealership.csv
WEKA is analytic application which helps its users in the process of data mining. There
are various functions which can be conducted using WEKA and one of them is clustering
(Yuchen, 2016). For the given data set of audidealership, the method of clustering which is used
is K means clustering which requires to pre determine the numbers of clusters and as the data has
two data points such as 0 and 1, two clusters are determined. The results of WEKA are shown
below:
7

=== Run information ===
Scheme:weka.clusterers.SimpleKMeans -N 2 -A "weka.core.EuclideanDistance -R first-last" -I
500 -S 10
Relation: audidealership2
Instances: 100
Attributes: 8
Dealership
Showroom
InternetSearch
RS7
A4
TT
Financing
Purchase
Test mode:evaluate on training data
=== Model and evaluation on training set ===
kMeans
======
Number of iterations: 6
Within cluster sum of squared errors: 160.2980769230769
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster#
Attribute Full Data 0 1
8

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.

(100) (48) (52)
=================================================
Dealership 0.54 0.3333 0.7308
Showroom 0.64 0.6667 0.6154
InternetSearch 0.39 0.4375 0.3462
RS7 0.53 0.2917 0.75
A4 0.55 0.8125 0.3077
TT 0.5 0.5833 0.4231
Financing 0.6 0.3333 0.8462
Purchase 0.38 0.0417 0.6923
Time taken to build model (full training data) : 0.02 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 48 ( 48%)
1 52 ( 52%)
9

From the above results, it is clear that number of instances of 0 cluster is 48% and
number of 1 cluster is 52%; this implies that more than half person completes the steps of
walking inside the showrrom, looking over three versions of cars, undersatnding financing
options and buy a car. From the above results, few patterns are identified on of which is the
model A4 is most looked over car of the showroom which must be considered as perefernce of
consumers. Another identfied trend is that out of 100 people 60 understand their finnacing
options but only 38 really buy the car. This trend provides the opportunity to effectively enahnce
the financing options so that sales of this delearship can grow.
2.2 Explaining data mining methods that can be used in business with real world examples
As analysed in above sections data mining is the procedure by which a user can mine the
data to get more useful insights from the information. There are various methods by which this
method is conducted in business organisation. An analysis of few of these methods along with
their application with real life examples are explained as follows:
10

Tracking sequential patterns – This method is a technique which is used by focusing on
identifying the series of events which happen in data set at regular intervals. These regular
intervals are the sequences which help in mining the transactional data. This type of method is
considered as ideal for large organisation or for organisation’s having big data. A real instance in
which this method can be used is retail organisations. These organisations track sales of their
every product to identify that in which month they are sold at maximum level and then they
develop their strategies accordingly (Llave, 2018).
Decision trees – This method is a machine learning technique which helps its users to mine
data by identifying how data inputs affects data outputs. Decision tree acts as a model in which a
tree like structure is developed with set of braches. This model is structured in a such a way that
each input unit presents its affected output units. There are various issues with this model due to
prediction and forecasting of variables using this technique is complex and there is high rate of
inaccuracy. A real life practical example of this model is research department of a travel
company in which they predict their demand by analysing the impact of their offers upon their
sales.
Clustering – This is an analytics technique in which entire data set is divided into groups
which are considered as clusters. These clusters are represented using visual graphs in which
distribution of data is represented. The main of this technique is to identify the trends in each
cluster. A real life example of this technique is automobile showrooms in which activities of all
their customers are tracked and then divided into clusters so that their preference and buying
patterns can be considered (Shollo and Galliers, 2016).
Correlation – This is one of the most used techniques of data mining in which relationship
between two variables are analysed along with the nature of their relationship. A practical
example of this technique is investigator who analyse the impact of various factors on the gross
domestic product of a nation. In this example, all the factors are correlated with GDP and then
most impacted factor with high correlation is identified (Vagh, 2012).
All the techniques which are explained above are used in real life practices. Some of these
techniques such as clustering and tracking patterns are even used in audidealership data.
11

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

2.3 Discussing the advantages and disadvantages of WEKA over Excel
WEKA or Waikato Environment for Knowledge Analysis is analytical software used for
data mining. This is free software which does not require any license and is a tool of practical
machine learning. Pros and cons of WEKA over Microsoft Excel are discussed as follows:
Pros:
Compatibility – WEKA is software which runs on Java due to which it is compatible with
all the operating systems which supports Java such as Mac, Windows, Linus and more. On the
other hand, Microsoft Excel is only compatible with Windows operating system. This implies
that the dataset of audidealership could be analysed using any platform but for superstore data,
only Windows usage is required.
Open source – Another positive point of WEKA is that it is open source software and uses
General Public License. This means this application can be used free of cost for any non
specified purpose. On the contrary, in order to use Microsoft Excel, it is important for user to
acquire license of entire Microsoft package first (Zubi and Mahmmud, 2013).
Easier to use – As the functions and tools which WEKA application offer are limited, it is
considered easier to use. On the other hand, to use advance functions of Excel requires specific
skills and abilities. This point is even evident with the practical usage of Excel and WEKA
which has been done to analyse two datasets. The usage of WEKA for the purpose of clustering
was easier from usage of Excel functions such as IF, LOOKUP and PIVOT table.
Cons:
Like every software, WEKA also has few cons which make Excel more suitable data mining
software.
Memory sensitive – As it has been discussed above, WEKA runs on Java. This feature of
this application makes it memory sensitive due to which it can store information to an extend
only. On the other hand, Microsoft Excel can store big data of large companies due to which it is
considered ideal for large data analytics.
Lack of documentation – WEKA is open source software and lacks in proper documentation
due to which large organisations do not trust upon this software and prefer to use applications
with a trustworthy documentation. On the other hand, every software application of Excel comes
with a license and documents which gains the trust of business organisations which wants to
keep their data secure (Phanse and Deorah, 2011).
12

From the above discussion, it has been seen that Microsoft Excel is a software application
which is appropriate for big data due to which it was used for superstore data having 8399
queries. On the other hand, WEKA is a quicker and easier application to use which is appropriate
for small data due to which this application was used for audidealership data having just 100
queries.
CONCLUSION
From the above analysis, it has been concluded that data handling and business intelligence
are the platforms using which business organisations use their data for the purposes of
predictions and decision making. It has also analysed that every trend in data warehousing, BI
and data mining is the result of current issues which organisations are faced and these trends tend
to resolve those issues. From the dataset of superstore it has been concluded that decline in sales
was the result of decline in “unit price” and decline in profit was the result of “shipment mode”.
The dataset of audidealership summarises that this organisation must focus upon promoting their
A4 version of car.
13

REFERENCES
Books and Journals
Shollo, A. and Galliers, R.D., 2016. Towards an understanding of the role of business
intelligence systems in organisational knowing. Information Systems Journal. 26(4).
pp.339-367.
Bordeleau, F. E., Mosconi, E. and Santa-Eulalia, L. A., 2018, January. Business Intelligence in
Industry 4.0: State of the art and research opportunities. In Proceedings of the 51st
Hawaii International Conference on System Sciences.
Mitrovic, S., 2020. Adapting of international practices of using business-intelligence to the
economic analysis in Russia. In Digital Transformation of the Economy: Challenges,
Trends and New Opportunities (pp. 129-139). Springer, Cham.
Fuchs, M., Höpken, W. and Lexhagen, M., 2014. Big data analytics for knowledge generation in
tourism destinations–A case from Sweden. Journal of Destination Marketing &
Management. 3(4). pp.198-209.
Llave, M. R., 2018. Data lakes in business intelligence: reporting from the trenches. Procedia
computer science. 138. pp.516-524.
Yuchen, W.U., 2016. The Data Mining Application Based on WEKA: Geographical Original of
Music. Management Science and Engineering. 10(4). pp.47-57.
Vagh, Y., 2012. The application of a visual data mining framework to determine soil, climate
and land use relationships. Procedia Engineering. 32. pp.299-306.
Zubi, Z.S. and Mahmmud, A.A., 2013. Using data mining techniques to analyze crime patterns
in the libyan national crime data. In Proceedings of the 1st WSEAS Interna-tional
Conference on Image Processing and Pattern Recognition (pp. 79-85).
Phanse, V. and Deorah, S., 2011, December. Evaluation and Extension to the Duckworth Lewis
Method: A Dual Application of Data Mining Techniques. In 2011 IEEE 11th
International Conference on Data Mining Workshops (pp. 763-770). IEEE.
Ataman, K., Kulick, G. and Sim, T., 2011. Teaching decision tree classification using Microsoft
Excel. INFORMS Transactions on Education. 11(3). pp.123-131.
14