Data Handling and Business Intelligence: Superstore and WEKA Analysis

Added on 2023/01/11

AI Summary

This report analyses data handling and business intelligence, focusing on the application of data mining techniques to a superstore dataset. It begins with an introduction to data mining and business intelligence and their importance in modern organizations. Part 1 analyses the superstore data in Excel, using functions such as Lookup and Pivot tables to examine profit and sales trends from 2009 to 2012, and visualizes the results with graphs and charts. Part 2 turns to WEKA and demonstrates clustering analysis on an audidealership.csv file. The report explains commonly used data mining methods such as association, classification and clustering analysis with real-world examples, and concludes with a discussion of the advantages and disadvantages of WEKA.

DATA HANDLING AND
BUSINESS INTELLIGENCE

TABLE OF CONTENTS
INTRODUCTION
PART 1
By using data set of superstores analyse profit and sales over years and analyse it by using
Excel for pre-processing of data, also analyse and visualize the data
Demonstration of ways in which data can be practically analysed using Excel functions such
as Lookup, Pivot table, graphs and charts
PART 2
By using audidealership.csv file show conjunction with Weka with the example of clustering
Explanation of commonly used data mining methods that can be used in business. Explain
them with real time example
Advantages and disadvantages of Weka
CONCLUSION
REFERENCES

INTRODUCTION
Data mining can be defined as the process of discovering patterns within large data sets. It
draws on methods at the intersection of machine learning, database systems and statistics
(Homocianu and Airinei, 2017). It can also be defined as the practice of examining large
pre-existing databases so that new information can be generated. Today many organizations rely
on data mining for data handling and for extracting new and important information. To do this,
businesses use business intelligence (BI) to identify specific data that can then support
effective decisions. Business intelligence can be defined as a set of processes, technologies
and architectures that convert raw data into meaningful information that helps drive the overall
profitability of the business. Various software packages can be used for BI to transform data
into actionable knowledge and intelligence. BI is especially important for retail organizations,
as it helps them analyse large volumes of data and take appropriate decisions so that they can
strengthen their relationships with customers and increase their overall profitability. This
assignment analyses the superstore and audidealership data sets to identify the overall sales
and profit of the organization. It also explains different data mining methods and discusses the
advantages and disadvantages of the Weka software.
PART 1
By using data set of superstores analyse profit and sales over years and analyse it by using Excel
for pre-processing of data, also analyse and visualize the data
Excel offers many techniques, methods and formulas for analysing and calculating profit and
sales over several years. They help evaluate the data so that an organization can get a brief
idea of its average profit and overall sales over the last 4 to 5 years, and this information
can then be used for further important decision making (Moro et al., 2020). Excel is one of the
most common software packages used by organizations for financial calculations and analysis,
forecasting and various other purposes. It has many built-in formulas that can be used to
analyse the data and reach a conclusion. The primary function of Excel is to organize all the
information or data of the organization in an appropriate manner. The data must be organized
before formulas, techniques or methods are applied for further analysis. Excel also provides an
option for generating graphs or charts so that it becomes much easier for organizations to
visualize their data and take appropriate decisions accordingly. To analyse the sales and profit
of the organization over the years, a pivot table and graphs can be used.
Row Labels         Sum of Sales    Average of Profit
Furniture          5178590.542     68.11660673
  2009             1472671.724     137.9565402
  2010             1252518.416     21.35772727
  2011             1268656.078     120.6278708
  2012             1184744.324     -10.02715311
Office Supplies    3752762.1       112.3690738
  2009             1035399.64      151.9643028
  2010             910359.95       100.9771282
  2011             796383.79       78.20144784
  2012             1010618.72      116.7143313
Technology         5984248.182     429.2075157
  2009             1701825.482     359.7878373
  2010             1397208.679     447.037081
  2011             1364905.113     519.0770085
  2012             1520308.909     402.5972762
Grand Total        14915600.82     181.1844243
Figure 1 Average profit and sum of sales of last 4 years.
Data interpretation: The above graph was built with the pivot table feature of Excel, which was
used to calculate the average profit and the overall sales for the past 4 years (2009 to 2012).
From the graph it is clear that Technology provides the maximum profit to the company, and that
sales were highest in Technology in 2009.
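The reading of Figure 1 can be cross-checked outside Excel. The following is a minimal sketch in Python that uses only the figures from the pivot table above; the variable names are illustrative and not part of the original workbook.

```python
# Sum of Sales per category and year, copied from the pivot table above.
sum_of_sales = {
    "Furniture": {2009: 1472671.724, 2010: 1252518.416, 2011: 1268656.078, 2012: 1184744.324},
    "Office Supplies": {2009: 1035399.64, 2010: 910359.95, 2011: 796383.79, 2012: 1010618.72},
    "Technology": {2009: 1701825.482, 2010: 1397208.679, 2011: 1364905.113, 2012: 1520308.909},
}

# Average of Profit per category (the category subtotal rows of the same table).
average_profit = {
    "Furniture": 68.11660673,
    "Office Supplies": 112.3690738,
    "Technology": 429.2075157,
}

# Category with the highest average profit.
best_profit_category = max(average_profit, key=average_profit.get)

# (category, year) pair with the highest sum of sales.
best_sales_category, best_sales_year = max(
    ((category, year) for category, years in sum_of_sales.items() for year in years),
    key=lambda pair: sum_of_sales[pair[0]][pair[1]],
)

print(best_profit_category)                  # Technology
print(best_sales_category, best_sales_year)  # Technology 2009
```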
Figure 2 Average profit in last 4 years
Data Interpretation: The above graph shows the average profit of the company over the past four
years. It clearly shows that the highest average profit of the company was seen in the
Technology department in 2011.
Technology: average sales and average profit from year 2009 to year 2012
From the above data interpretation, it is clear that the sales and profit of the Technology
department over the last four years are the highest of the three categories.
Row Labels     Average of Sales    Average of Profit
Technology     2897.941008         429.2075157
  2009         3145.703293         359.7878373
  2010         2631.278114         447.037081
  2011         2916.463917         519.0770085
  2012         2895.826493         402.5972762
Grand Total    2897.941008         429.2075157
Figure 3 Average profit and sales of Technology
Data Interpretation: From the above graph it is clear that the average sales of Technology were
highest in 2009, whereas the average profit was highest in 2011.
Demonstration of ways in which data can be practically analysed using Excel functions such as
Lookup, Pivot table, graphs and charts
Lookup Function: Excel provides many functions and formulas that can be used for the analysis of
data. LOOKUP is one of them; it is mostly used to select values from a range (Fylstra, 2017), or
to pick out a particular item from a large data set. In the form used here, the lookup range must
be sorted in ascending order, otherwise the function may return a wrong result. It returns a
value from a defined range of data or from an array. It searches for a value in one column and,
when a match is found, returns the value at the same position in another column (Wang, Luo and
Liu, 2016). So it can be said that it is used for searching for a particular value in defined
rows or columns. By default LOOKUP performs an approximate match: if the exact value is not
found, it matches the largest value that is less than or equal to the lookup value.
Syntax of the lookup function is:
LOOKUP (value, lookup_range, [result_range])
To identify how much sales the order with ID 483 generated, the lookup function can be used
as follows:
=LOOKUP(483,B2:B8400,F2:F8400)
=4965.76
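The behaviour of this formula can be sketched outside Excel. The Python function below is an illustrative approximation of the three-argument form, not Excel's implementation: it assumes the lookup range is sorted in ascending order and returns the result for the largest entry that does not exceed the searched value. Only the pair 483 and 4965.76 comes from the report; the other order IDs and sales values are placeholders.

```python
from bisect import bisect_right


def lookup(value, lookup_range, result_range):
    """Approximate-match lookup over an ascending lookup_range.

    Returns the entry of result_range aligned with the largest element of
    lookup_range that is <= value, or None when every element is larger
    (Excel would show #N/A in that case).
    """
    position = bisect_right(lookup_range, value)
    if position == 0:
        return None
    return result_range[position - 1]


# Placeholder data, except for order 483 -> 4965.76, which is taken from the report.
order_ids = [100, 250, 483, 600]           # must be in ascending order
sales = [120.00, 75.50, 4965.76, 310.25]

print(lookup(483, order_ids, sales))  # 4965.76 (exact match)
print(lookup(500, order_ids, sales))  # 4965.76 (falls back to order 483)
print(lookup(50, order_ids, sales))   # None (smaller than every order ID)
```

The second call shows why the sort order matters: an approximate match silently returns the neighbouring row, so an unsorted or incomplete ID column can produce a plausible but wrong figure.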
Pivot Table: This is one of the most powerful tools in MS Excel, as it can be used for analysing,
calculating and summarizing data, and it also allows users to compare the results with other
data. It helps users extract data in an organized manner. It is mostly used for calculating
averages and sums, extracting data for a particular region, and many other purposes.
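What a pivot table computes can be illustrated with a small grouping routine. The sketch below is written in plain Python under stated assumptions: the order rows are hypothetical and the field names (category, year, sales, profit) are chosen for illustration, not read from the superstore workbook.

```python
from collections import defaultdict

# Hypothetical order rows: (category, year, sales, profit).
orders = [
    ("Technology", 2009, 1000.0, 200.0),
    ("Technology", 2009, 3000.0, 400.0),
    ("Technology", 2010, 2000.0, 100.0),
    ("Furniture", 2009, 500.0, -50.0),
    ("Furniture", 2010, 1500.0, 150.0),
]


def pivot(rows):
    """Group rows by (category, year); return sum of sales and average profit."""
    groups = defaultdict(lambda: {"sales": 0.0, "profit": 0.0, "count": 0})
    for category, year, sales, profit in rows:
        group = groups[(category, year)]
        group["sales"] += sales
        group["profit"] += profit
        group["count"] += 1
    return {
        key: {
            "sum_of_sales": group["sales"],
            "average_of_profit": group["profit"] / group["count"],
        }
        for key, group in groups.items()
    }


summary = pivot(orders)
for (category, year), values in sorted(summary.items()):
    print(category, year, values["sum_of_sales"], values["average_of_profit"])
```

Each key plays the role of a row label and each aggregated field corresponds to a value column such as "Sum of Sales" or "Average of Profit".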
Calculation of shipping cost, product base margin and overall sales of furniture, office
supplies and technology
Row Labels                          Sum of Product Base Margin    Average of Sales    Sum of Shipping Cost
Furniture                           1006.77                       3003.82282          53243.69
  Bookcases                         122.09                        4352.656296         8646.07
  Chairs & Chairmats                228.46                        4564.343394         15512.69
  Office Furnishings                414.31                        885.9058503         8402.72
  Tables                            241.91                        5252.100116         20682.21
Office Supplies                     2116.77                       814.0481779         36095.51
  Appliances                        240.64                        1698.137189         6854.11
  Binders and Binder Accessories    342.42                        1117.986437         6633.52
  Envelopes                         91.97                         707.6658537         1682.77
  Labels                            108.61                        135.3526042         288.66
  Paper                             458.85                        364.4513143         7914.41
  Pens & Art Supplies               337.76                        263.9924487         2041.81
  Rubber Bands                      95.14                         83.83592179         225.21
  Scissors, Rulers and Trimmers     92.23                         562.474375          670.51
  Storage & Organization            349.15                        1960.041392         9784.51
Technology                          1148.77                       2897.941008         18491.84
  Computer Peripherals              449.67                        1049.968259         4067.34
  Copiers and Fax                   36.37                         12992.65862         2446.88
  Office Machines                   149.75                        6435.303086         7135.91
  Telephones and Communication      512.98                        2139.65323          4841.71
Grand Total                         4272.31                       1775.878179         107831.04
Figure 4 product base margin and shipping cost
Calculation of profit region wise
Row Labels               Sum of Profit
Atlantic                 238960.66
North Carolina           2841.11
Northwest Territories    8307.05
Ontario                  439214.57
Prarie                   321160.12
Quebec                   140426.65
West                     297008.61
Yukon                    73849.21
Grand Total              1521767.98
Figure 5 profit region wise
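The regional figures can be checked against the grand total and ranked with a few lines of code. This is a minimal sketch in Python using only the values from the table above; the region labels are kept exactly as they appear in the data set.

```python
# Sum of Profit per region, copied from the pivot table above.
region_profit = {
    "Atlantic": 238960.66,
    "North Carolina": 2841.11,
    "Northwest Territories": 8307.05,
    "Ontario": 439214.57,
    "Prarie": 321160.12,
    "Quebec": 140426.65,
    "West": 297008.61,
    "Yukon": 73849.21,
}
grand_total = 1521767.98  # Grand Total row of the same table

# The regional sums should add up to the grand total (allowing for rounding).
total = sum(region_profit.values())
print(abs(total - grand_total) < 0.005)

# Regions ordered from most to least profitable, with their share of total profit.
ranked = sorted(region_profit, key=region_profit.get, reverse=True)
for region in ranked:
    share = region_profit[region] / grand_total
    print(f"{region}: {share:.1%}")
```

Ontario comes first in this ranking and North Carolina last.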
Total order and average profit region wise
Row Labels                 Count of Order ID    Average of Profit
Atlantic                   1080                 221.2598704
  New Brunswick            323                  357.1267492
  Newfoundland             82                   83.96512195
  Nova Scotia              464                  183.9695474
  Prince Edward Island     211                  148.6336967
North Carolina             79                   35.96341772
  Elon                     79                   35.96341772
Northwest Territories      59                   140.7974576
  Northwest Territories    59                   140.7974576
Ontario                    2161                 203.2459833
  Georgina                 129                  209.2537984
  Hanover                  624                  193.4693109
  Ontario                  739                  168.9889851
  Orangeville              334                  222.3565569
  Waterloo                 335                  275.659791
Prarie                     1706                 188.2532943
  Manitoba                 793                  172.0392938
  Saskachewan              913                  202.3362103
Quebec                     781                  179.8036492
  Quebec                   781                  179.8036492
West                       1991                 149.1755952
  Alberta                  865                  175.6606705
  British Columbia         1126                 128.8296004
Yukon                      542                  136.253155
  Dawson                   291                  201.9967354
  Whitehorse               251                  60.03250996
Grand Total                8399                 181.1844243
Figure 6 Total order and average profit region wise
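One point worth noting when reading this table is that the Grand Total average is not the simple mean of the regional averages; it is the average weighted by the number of orders in each region. The sketch below, in Python and using only the regional subtotal rows above, illustrates the difference; the variable names are illustrative.

```python
# (count of orders, average profit) per region, from the subtotal rows above.
regions = {
    "Atlantic": (1080, 221.2598704),
    "North Carolina": (79, 35.96341772),
    "Northwest Territories": (59, 140.7974576),
    "Ontario": (2161, 203.2459833),
    "Prarie": (1706, 188.2532943),
    "Quebec": (781, 179.8036492),
    "West": (1991, 149.1755952),
    "Yukon": (542, 136.253155),
}

total_orders = sum(count for count, _ in regions.values())

# Weighting each regional average by its order count reproduces the grand total average.
weighted_average = sum(count * avg for count, avg in regions.values()) / total_orders

# The unweighted mean treats a 59-order region the same as a 2161-order one.
simple_mean = sum(avg for _, avg in regions.values()) / len(regions)

print(total_orders)                # 8399
print(round(weighted_average, 4))
print(round(simple_mean, 4))
```

The weighted figure agrees with the Grand Total row (181.1844243), while the simple mean is noticeably lower because small, low-profit regions such as North Carolina count as much as Ontario.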
PART 2
By using audidealership.csv file show conjunction with Weka with the example of clustering
Figure 7 audidealership data relationship