Business Intelligence Report: Excel, Weka, and Data Mining Analysis

Verified

Added on 2023/01/11

AI Summary

This report explores data handling and business intelligence, focusing on the analysis of sales and profit decline using Microsoft Excel and Weka. The first part utilizes Excel to preprocess, analyze, and visualize Superstore data, identifying factors contributing to sales and profit trends. The analysis reveals declining sales until 2011, with a subsequent rise in 2012, attributed to changes in discounts, shipping costs, unit prices, and order quantities. Profit decline is linked to the reduced use of Express Air shipment mode. The second part employs Weka for clustering the "audidealership" dataset using the k-means method, providing insights into customer behavior. The report also discusses common data mining methods like tracking patterns and classification that can be used in business, along with the advantages and disadvantages of Weka over Excel, offering a comprehensive overview of data analysis techniques.

DATA HANDLING AND
BUSINESS INTELLIGENCE

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

Contents
INTRODUCTION...........................................................................................................................1
Determining the decline in sales/profits over the years, and evaluating the use of Excel for pre-
processing the data, analysing the data and visualising the data.................................................1
PART 2............................................................................................................................................7
2.1 Weka workings......................................................................................................................7
2.2 Explaining the most common data mining methods that can be used in business..............10
2.3 Discussing the advantages/disadvantages of Weka over Excel...........................................12
CONCLUSION..............................................................................................................................13
REFERENCES..............................................................................................................................14

INTRODUCTION
Data handling is the procedure of storing and securing the data which is collected through
research (Beyer, 2019). This process is based upon the concept of business intelligence for which
data acts as an asset. The term business intelligence refers to the technologies which helps a
business to effectively operate and attain competitive advantage. The main aim of this report is to
build an understanding regarding the data warehousing and the tools by which data can be
handled and mined.
This report is divided into two parts. In the first part, the software application of Microsoft
Excel is used to pre process, analyse and visualise the data using Superstore data along with
current trends in data warehousing, business intelligence and data mining are also analysed. In
the second part of this report, the software application of Weka is used to present the clustering
using “audidealership” data. In this part, most common data mining methods are also analysed
which a business organisation can use in their operations. Along with which, benefits and
limitations of Weka over Excel are also discussed.
PART 1
Determining the decline in sales/profits over the years, and evaluating the use of Excel for pre-
processing the data, analysing the data and visualising the data
Microsoft excel is a software programmed which allows users to develop spread sheets and
then analyse it with various tools such as data analysis and graphical tools (Cao, Ewing and
Thompson, 2012). This software application is used for the data of Superstore. This data set has
21 variables including Row ID, Order Date, Order Priority, Order Quantity, Sales, Discount
, Ship Mode and many more. This data has information of total 8399 orders. By using
Excel formulas and tools, this data of Superstore is pre processed, analysed and visualised.
Pre processing the data:
The procedure of pre processing the data is quite complex and has various techniques to do
it. A standard procedure of pre processing of data involves five stages which are data cleaning,
data integration, data transformation, data reduction and lastly data discretisation (Jolliffe and
Stephenson, 2012). For this process, the tool of Microsoft Excel which is used is Pivot table.
Pivot table is a tool of summarising, classifying and processing the data so that it can be
further used for evaluation and analysis. Using this tool, the data set of Superstore is pre
1

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

processed. First the data is being checked to identify any missing values from the data set. Once,
it has been identified that the data set has no missing values. The data set is reduced to 7
variables which are year, sum of profit, sum of sales, sum of discount, sum of shipping cost, sum
of unit price, Order Priority, Ship Mode and sum of order quantity. The reason behind reducing
the data was to transform the data according to the years. The pivot table is developed in a new
worksheet of superstore dataset.
Analysing and visualising the data:
Once the data is pre processed, then it can be analysed in order to fulfil the aim (Landtblom,
2018). The main aim of conducting this Excel analysis was to determine the reason behind
decline of sales and profit. In order to analyse the data, yearly information regarding 6 variables
is gathered together and presented in a table. This table is presented in 3rd worksheet titled as
“ANALYSIS and VISUALISATION”. A similar table is presented as below:
Sum of
Profit Sum of Sales
Sum of
Discount
Sum of
Shipping
Cost
Sum of
Unit Price
Sum of
Order
Quantity
2009 434096.02 4209896.846 105.39 28481.76 232830.98 54508
2010 364917.33 3560087.045 105.81 27354.26 162467.59 54379
2011 380310.5 3429944.981 101.67 24939.85 159653.11 51413
2012 342444.13 3715671.953 104.32 27055.17 195467.55 54480
Using this data, 5 graphs are also developed which are represented below:
2

From the above data which is visualised using tables and graphs, it has been seen that
sales of the superstore are continuously declining till 2011 but in 2012, the sales of this
organisation rises. In order to analyse the reason behind this pattern, graphs for other four
4

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

variables are also developed. A similar pattern was identified in all these graphs that along with
sales, sum of discount, sum of shipping cost, sum of unit price and sum of order quantity are also
declining till 2011 but in 2012 the values of this variables rises. In order to evident this pattern, a
trend line for each graph is also developed. So, by this analysis it was clear that the selected four
numeric variables were the reason behind declining and rise of sales in 2010, 2011 and 2012
respectively.
Unlike sales, profit shown a separate pattern in which profit was declining in year 2010
but for a year of 2011 it hiked and then again started to decline in 2012. This unusual pattern was
not observed from any of the four numeric variables. In order to analyse the reason of profit
decline, another variable of order priority was considered. The data for this variable is presented
below in a table and graph.
Order Priority
2009 2010 2011 2012
Low 141 34 22 32
Medium 164 42 37 46
High 37 43 8 23
Critical 23 43 59 68
Not Specified 18 21 102 1
5

The above table is prepared using excel function LOOKUP. Using this function, it has
been identified that how many orders has which order priortiy (Sarkar and Rashid, 2016). For
eaxmple, for low priority in 20019, the formula which as used is “=LOOKUP('PIVOT TABLE'!
A9,'PIVOT TABLE'!A4:G1324)”. From the above analysis of variable “Order prioroty”, no
similar pattern as profit was identified. This process was repeated once again with the variable of
“Ship mode” and that is presented in a table and graph below:
Ship Mode
2009 2010 2011 2012
Regular
Air 38 124 50 42
Express
Air 29 24 73 42
Delivery
Truck 46 73 62 62
The above table is also developed using LOOKUP function. This function is used to
determine, that in a speciic year, a specific shipment mode was used how many times (Leech,
Barrett and Morgan, 2013). For example, for year 2009, the regular air, the formula which was
used is “=LOOKUP('PIVOT TABLE'!A6,'PIVOT TABLE'!A4:G1095)”. From the above
6

analysis of variable “ship mode”, a similar pattern to profit was identified in Express Air
shipment mode. This implies like profit the usgae of Express air shipment mode was reduced in
2010, hicked in 2011 and then continous to decline in 2012. This leads to the analysis that due to
variation in shipment of mode of Express Air, the profit of superstore was declining.
Reasons behind decline in sales/profits over the years:
After the pre processing, analysing and visualising the data, certain conclusions are
developed which has helped in determining the reasons behind the sales and profit of superstore
over the year. For decline in sales, for reasons are identified which are decline in allowed
discount to customers, decline in shipping cost due to reduced number of shipped products,
decline in unit price at which each product is sold and decline in overall order quantity of
product over the years.
For decline in profit, only one reason has identified which is decline in the usage of
shipment mode of Express Air. This type of shipment mode is most effective and people rely
upon this mode and due to reduction in the usage of this mode by super store the profit of this
company started to decline.
PART 2
2.1 Weka workings
Weka (Waikato Environment for Knowledge Analysis) is statistical software used for
running algorithms. This software is majorly used for the process of data mining. This software
includes certain tools by which a user can pre process the data, classy, cluster and visualise the
data (Gulia, 2020). Using the Weka application, the data set of audidealership is used to perform
data mining using clustering. Data mining is a procedure of examining a data so that new and
advanced information can be generated from that data. Clustering is a method by which an
investigator can divide the large data set into groups according to their similar attributes.
Weka provides various options of clustering from which k means clustering is used. This
method allows divide the data into non overlapping sub groups. This method involves integration
and classifying. The data set of “audidealership” is clustered and the result which is gained is
presented below along with visualization graphs.
=== Run information ===
7

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

Scheme:weka.clusterers.SimpleKMeans -N 2 -A "weka.core.EuclideanDistance -R first-last" -I
500 -S 10
Relation: audidealership2
Instances: 100
Attributes: 8
Dealership
Showroom
InternetSearch
RS7
A4
TT
Financing
Purchase
Test mode:evaluate on training data
=== Model and evaluation on training set ===
kMeans
======
Number of iterations: 6
Within cluster sum of squared errors: 160.2980769230769
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster#
Attribute Full Data 0 1
(100) (48) (52)
8

=================================================
Dealership 0.54 0.3333 0.7308
Showroom 0.64 0.6667 0.6154
InternetSearch 0.39 0.4375 0.3462
RS7 0.53 0.2917 0.75
A4 0.55 0.8125 0.3077
TT 0.5 0.5833 0.4231
Financing 0.6 0.3333 0.8462
Purchase 0.38 0.0417 0.6923
Time taken to build model (full training data) : 0.02 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 48 ( 48%)
1 52 ( 52%)
9