Data Handling and Business Intelligence Analysis Report - Semester 1

Verified

Added on 2023/01/12

AI Summary

This report delves into the interconnected concepts of data handling, business intelligence, and data mining, crucial for modern organizations in collecting, storing, and analyzing data to derive actionable insights. It examines current trends in data warehousing, including self-service data access and the adoption of Big Data analytics. The report evaluates the use of Microsoft Excel for data pre-processing, analysis, and visualization, specifically focusing on a case study of a Superstore experiencing declining sales and profits. The analysis includes identifying missing values, formatting numerical data, and using pivot tables to extract relevant variables. Furthermore, the report explores the workings of WEKA, a data mining software, using a case study of an Audi dealership and employing k-means clustering to analyze customer behavior. It also explains common data mining methods such as tracking patterns and classification, providing real-world examples to illustrate their applications. Finally, it discusses the advantages and disadvantages of the WEKA tool.

Data Handling And Business
Intelligence

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

Contents
Contents...........................................................................................................................................2
INTRODUCTION...........................................................................................................................1
MAIN BODY..................................................................................................................................1
PART 1............................................................................................................................................1
Current trends in data warehousing, business intelligence and data mining...............................1
Evaluating the use of Excel for pre-processing the data, analysing the data and visualising the
data...............................................................................................................................................2
PART 2............................................................................................................................................8
2.1 Workings of Weka.................................................................................................................8
2.2 Explain the most common methods of data mining which can be used by organization....11
2.3 Discusses the advantage or disadvantage of Weka tool......................................................12
CONCLUSION..............................................................................................................................13
REFERENCES..............................................................................................................................15

INTRODUCTION
Data handling involves the gathering, processing data, evaluating data and eventually
displaying data with graphs or diagrams. The creation of information comes to us quite naturally.
They send a number to anyone who asks them and use accordingly (Ahsan and Bais,
2018). Business Intelligence is a collection of methods, systems, and innovations that turn raw
data into usable information that motivate efficient business activities. It's a scripting language
that transforms data in knowledge and intelligence that is admissible. They cover the broadest
possible fields such as marketing, accounting, economics, technology, computer science,
anthropology, art history, medical science and biology where they can use data handling or
business intelligence techniques. This report based of data analysis of given information and
some common data mining tools which are used by organizations. In addition, discuss the
advantage or disadvantage of WEKA tool with support of database argument.
MAIN BODY
PART 1
Current trends in data warehousing, business intelligence and data mining
The concepts of data warehousing, business intelligence and data mining are related to
each other which helps modern business organisations to collect, store and classify data so that it
can be analysed in such a way that beneficial insights for business can be gained and suitable
decisions can be made. Data warehousing is a process which involves storing the large data in a
secured manner so that it can be used in future (Mohammed, Naugler and Far, 2015). For various
organisations, their data is their asset due to which it becomes even more important to protect
their data from theft. There are various current trend which are enhances the practice of data
warehousing. These trends are increased enablement of self service data access using cloud
services. This current trend provides the suitability to the data owner to access the data whenever
it is required from any digital electronic device. Other current trends for data warehousing are
growth of NoSQL Adoption and Big Data Analytics in the Cloud.
Business intelligence is the combinations of all the technologies which are used by
business organisations to collect, integrate, analyse and present their business information.
Among the various current trends in BI, the most influential is self service BI and analytics.
According to this technology, organisations which are of small scale can use business
1

intelligence tools and techniques by themselves and do not require external assistance. This trend
is focused to reduce the BI expenses of small scale organisation which does not own large data.
Other current trends in this field are big data innovation through social media which develop
customer analytics and text analytics which enables business to interpret Social Media Sentiment
(Mitrovic, 2020).
Data mining is the procedure in which an individual can mine the pre existing data by
using BI tools to generate hidden and analytic information from that data. Current trends in this
field are visual data mining, enhancement of a standardised language for data mining and
integration of data mining with data warehousing (Imhoff and White, 2011).
Evaluating the use of Excel for pre-processing the data, analysing the data and visualising the
data
Microsoft Excel is a software application which helps in recording and analysing
numerical information so that numerical data can provide information in a much more
understandable way. This software application is used for Superstore which is facing decline in
their sales and profits over the years.
Data pre processing
The data set of superstore is first pre processed using Excel. Data pre processing is a
technique which helps in cleaning the data so that it can be analysed without any error. The first
step which is taken to pre process the data is to find the missing values. By the shortcut key of
Shift + F4, all the missing values are first identified and then they are filled as the average of
their column. Secondly, in order to transform the data, all the numeric amount values are
formatted to 2 decimal points in order to bring familiarity. Then in last step of data pre
processing, the data set of superstore is reduced to a certain level using Pivot tables. In the pivot
table only those variables are selected which can impact the profit or sales of superstore.
Data analysis and visualisation
Once the data is pre processed, it is important to analyse it in such a way that it can fulfil
the aim of identifying the reasons due to which sales and profit of this company are decreasing
(Macaulay, Sekharan and Wang, 2017). For analysing, first all the numerical variables are
summed together according to their year using the Excel function of “=SUM()”. By using this
function sales, profit, discount, shipping cost, unit price and order quantity are summed on the
basis of 4 year 2009, 2010, 2011 and 2012. And then respective graphs are also developed.
2

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

Sum of Profit
Sum of
Sales
Sum of
Discount
Sum of
Shipping
Cost
Sum of
Unit
Price
Sum of
Order
Quantity
2009 434096.02 4209896.85 105.39 28481.76 232831 54508
2010 364917.33 3560087.04 105.81 27354.26 162467.6 54379
2011 380310.5 3429944.98 101.67 24939.85 159653.1 51413
2012 342444.13 3715671.95 104.32 27055.17 195467.6 54480
3

From all the graphs above, it has been seen that sales of superstore has declined the most
in year 2011 and profit was at it worst in 2012. The reduction of sales revenue in 2011 occurred
due to due to low disocunt allowed in 2011, low shipping cost paid and less unit price charged
by ths superstore in 2011. All the four numerical variables analysed above has direct impact
upon the sales of superstore. In order maintain the sales revenue, Superstore must ensure that
they control their product unit price and avail sutaible disocunt so that their sales can be
increased (Bordeleau, Mosconi and Santa-Eulalia, 2018).
None of the above numeric variables has the direct impact upon profit due to which in
order to identify the variable which impacts profit, all non numerical variables are also analysed
below. Using the Excel function of countif, total number of categorical variables are identified.
For example, in order to identify how many times “Deelivery truck” shipment mode is used in
2009, the excel formula which used is “=COUNTIF('Superstore Sales1588154780137'!
H2:H2159,'Analysis and presentation'!C9)”.
Shipment mode
Delivery Truck Regular Air Express Air
2009 307 1582 269
2010 298 1597 246
2011 263 1460 275
2012 291 1609 202
5

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

From the above table and graph, it has been seen that in 2012, the shipment mode of
“Express Air” is minimum which is a valid resaon for decline in profit in 2012.
Region
North
Carolina Ontario
Northwest
Territories Atlantic West Prarie Quebec Yukon
2009 16 551 16 283 507 464 188 133
2010 18 575 16 295 483 412 200 142
2011 26 517 19 240 460 412 191 133
2012 19 518 8 262 541 418 202 134
6

From the analysis of “Region” variable, it has been seen that decline of sales in
Northwest Territories in 2012 can also be the reason for decline of profit in 2012.
Customer segment
Small Business Consumer Corporate Home Office
2009 416 443 790 509
2010 434 433 764 510
2011 389 396 718 495
2012 403 377 804 518
7

The variable of “Customer segment” is analysed above using a line graph and table, from
which it has been analysed that in year 2012, the sales percetnage to consumers reduced at its
minimum which is also a rsaon of reduced profit.
Product category
Office Supplies Technology Furniture
2009 1169 541 448
2010 1170 531 440
2011 1112 468 418
2012 1159 525 418
From the above tables and graphs, it has been analysed that the decline of profit of
superstore in year 2012 has occurred due to decline in usage of shipment mode “Express Air”,
reduction in sales to “Northwest Territories”, decline in sales to “consumer” Customer segment
and reduction in “furniture” sales in 2012.
PART 2
2.1 Workings of Weka
Weka is analytic software which helps in conducting statistical functions such as clustering
and descriptive analysis (Vera-Baquero, Colomo-Palacios and Molloy, 2013). For Audi
dealership, their data is analysed using Weka in which k means clustering is used with default 2
8

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

clusters. Clustering is the method of grouping the entire data in classes according to their
common features (Fuchs, Höpken and Lexhagen, 2014).
=== Run information ===
Scheme:weka.clusterers.SimpleKMeans -N 2 -A "weka.core.EuclideanDistance -R first-last" -I
500 -S 10
Relation: audidealership2
Instances: 100
Attributes: 8
Dealership
Showroom
InternetSearch
RS7
A4
TT
Financing
Purchase
Test mode:evaluate on training data
=== Model and evaluation on training set ===
kMeans
======
Number of iterations: 6
Within cluster sum of squared errors: 160.2980769230769
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster#
9

Attribute Full Data 0 1
(100) (48) (52)
=================================================
Dealership 0.54 0.3333 0.7308
Showroom 0.64 0.6667 0.6154
InternetSearch 0.39 0.4375 0.3462
RS7 0.53 0.2917 0.75
A4 0.55 0.8125 0.3077
TT 0.5 0.5833 0.4231
Financing 0.6 0.3333 0.8462
Purchase 0.38 0.0417 0.6923
Time taken to build model (full training data) : 0.02 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 48 ( 48%)
1 52 ( 52%)
10