Comprehensive Analysis and Visualization of Superstore Data - Report

Verified

Added on  2023/06/05

|25
|3983
|222
Report
AI Summary
This report provides a comprehensive analysis and visualization of Superstore data. It begins with data pre-processing, detailing the use of filter options, SUMIFS function, and pivot tables in Excel to analyze sales and profit trends from 2009 to 2012. The analysis reveals sales and profit declines, regional performance, and product-based sales and profit contributions. The report also includes data visualization techniques using pie charts and bar diagrams to represent sales and profit data, including a pivot table analysis by product and region. The report further explores statistical analysis of student ice cream preferences, and discusses data mining techniques. Finally, it examines advantages and disadvantages of Excel and SPSS.
Document Page
Analysis & Visualisation
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
Table of Contents
Part 1................................................................................................................................................3
Data Pre-Processing, Analysing & Visualisation........................................................................3
PART 2..........................................................................................................................................11
2.1 Students which like vanilla flavoured ice-cream.................................................................11
2.2 Students that are male and female.......................................................................................12
2.3 Mean and Median of participants who like chocolate and strawberry flavour ice cream.. .13
2.4 Cluster analysis example- K means clustering....................................................................15
2.5 Text mining and data mining methods that are used in businesses.....................................22
2.6 Advantages and disadvantages of using Excel and SPSS...................................................23
REFERENCES................................................................................................................................1
Document Page
Part 1
Data Pre-Processing, Analysing & Visualisation
Data Pre – processing
The term data pre – processing is the term used for referring to the process by which data
is manipulated before it is to be used to work upon. The purpose for which data processing is
done is to ensure that the processing of the data will give results that are enhanced. In data
mining data pre – processing is considered as one of the most important step. Data pre-
processing is done through following a systematic step wise procedure. The steps involved are
namely data cleaning, data transformation and data reduction (Tang, Yuan and Zhu, 2020).
Every data can have various parts that are missing and not related to the information that is
required to be assessed. To handle such data, the process of data cleaning is undertaken. The
situation of missing data is where when there are some parts of the data that are missing from the
complete data. There are two main ways that are followed for the purpose of dealing with such
missing data known as ignoring the tuples and second is filling the data values that are missing.
After cleaning the data in data processing the next step is noisy data. There is data some
that cannot be understand or processed with the usage of machines such a data is considered as
noisy data. The faults in data collection are the responsible elements for the creation of noisy
data. For handling such a data during the data pre – processing methods like binning method,
regression and clustering are adopted. Further after the data cleaning step the step of data
transformation is followed (Al-Taie, Kadry and Lucas, 2019). For the generation of valuable or
desired results from the present set of data it needs to be transformed into the appropriate form of
data. There are number of ways that are followed for the particular task of data transformation.
Few examples are normalization, attribute selection, discretization and concept of hierarchy
generation. Data reduction is the next step. This step is dedicated especially for easing the
process of data analysation. The various steps followed for this are data cube aggregation,
attribute subset selection, numerosity reduction and dimensionality reduction.
Document Page
The given dataset of superstore has also been pre- processed for the purpose of further
analysing and visualising of such data. The data is pre- processed using the filter feature present
in excels’ sort and filter option. The reason for the selection of this particular feature of excel is
that this feature will help in analysation of data of several years to determine the decline in sales
of the concerned store (Data Preprocessing, Analysis & Visualization, 2022). Using this feature
on the column of order date only the data of a particular year can be viewed at a time and rest of
the data that is irrelevant or better say the data that is not required or necessary at one time can be
hided.
Year Sales Profit
2009 1754061 152252
2010 1318867 132154.9
2011 1473355 161414.1
2012 1601552 130967
By using the filter option, the above table is created in the excel. The table is the
representation of the total sales and profit data for the year 2009, 2010, 2011 & 2011. Using this
option in excel the data can be copied, formatted and also printed without the need of arranging it
in ascending or descending format or moving the data to any other location. The process of data
pre-processing is an essential element as it helps in generation of results that are reliable in a
precise format.
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
The available data has been pre – processed by using SUMIFS function and Pivot Tables
in addition to the filter option. SUMIFS function is a basic function of excel that is use widely
for the data pre – processing. This function is used for getting the sum of specific range of
values. In the current case this function is for getting the sum total of the sales & profits of
Superstore specifically for each of the years starting from 2019 to 2012.
“SUMIFS (Sales rows, date rows, “>=” &DATE (year, month, date), Date rows, “<=”
&DATE (year, month, date)”, this formula is applied in the excel for the application of SUMIFS
function.
After the application of SUMIFS function the next task is to find the decline in profits
over the years. This step is done by applying the formula as dividing the result of deduction of
previous year’s sales from that of the current year by the sales of current year multiplied by 100.
Pivot Table is the next function of excel that is applied over the processed set of data. It
is one of the powerful excel function or tool by the utilization of which the Excel gives
opportunity to its users in form of performing calculations, summarization of data and analyse it
on the basis of establishing comparison of the resultant data or by identification of patterns or
Document Page
trends that the data represents. It is an interactive approach using which data in huge quantities
can be summarized swiftly (Alshdaifat and et.al, 2021). There are three pivot tables that are
created from the given set of data. The results of the SUMIFS function along with the pivot table
function will be analysed in the data analysation.
Data Analysation
SUMIFS Function
Year Sales Profit
Decline in
sales
Decline in
profit
2009 1754061 152253
2010 1318867 132154.9 -24.81% -13.20%
2011 1473355 161414.1 11.71% 22.14%
2012 1601552 130967 8.70% -18.86%
In the data pre – processing the way by which the SUMIFS function is applied is
explained. Above is the tabulated representation of the results that are generated with the help of
using this function in excel. The first column of this table represents the year, the next two
columns are showing the total sales and profit that have been generated by the Superstore for
each of these year.
Decline in sales and decline in profits are the attributes that have calculated by the using
the formula (Sales of the current year – Sales of the previous year) / Sales of the previous year.
This formula represents the results in the number format but for better analysing of the results the
percentage form is considered as more preferable format. Excel provides the option of
representing the data into percentage format. For this on the home tab General display of values
was changed to percent format. From the data it is clear that the sales of the Superstore in year
2010 experienced a fall of nearly 25% which is a huge decline. Further for the years 2011 &
2012 the sales increased around 12% & 9% respectively from the previous year. The previous
year is 2010 for 2011 and 2011 for the year 2012.
It is clear that out of the three years in two years’ super store has experienced declining
profits. The decline in profits for the year 2010 is 13.20% and around 19% for the year 2012.
The year 2011 was good for the company as its profits were increased from the previous year.
The sales of the store for year 2012 is showing an increasing trend whereas for the same period
Document Page
the profits are showing declining pattern it means that the store expenses need to controlled for
the better performance.
Pivot Tables
Order Date (Multiple Items)
Row Labels Sum of Sales
Sum of
Profit
Furniture 1472671.724 61804.53
Office Supplies 1035399.64 177646.27
Technology 1701825.482 194645.22
Grand Total 4209896.846 434096.02
The above pivot table represents the total sales and profits that are generated by the
Superstore for the years 2009, 2010, 2011 and 2012 grouped by the type of product that are
furniture, office supplies and technology. It is clear from the above data that the maximum of
store’s sale is for the product of technology followed by furniture and office supplies. The store
is earning maximum of its profits from the technology, and office supplies and furniture follows
it. these results indicate that the total profit generated by the Superstore by selling office supplies
is more than furniture but the sales of furniture is more than that of office supplies. This can be
interpreted as selling office supplies is more profitable for the store. Hence it can consider to
increase its focus over the sales of office supplies.
Row Labels Sum of Sales
Atlantic 2014248.204
North Carolina 116376.4835
Northwest
Territories 83817.746
Ontario 3780242.063
Prarie 2837304.602
Quebec 1510195.08
West 3597549.276
Yukon 975867.371
Grand Total 14915600.82
This pivot table is showing the total of sales for all the years of Superstore grouped by the
region. This data is extracted as it can be very helpful for the store to get knowledge of the
regions in which it is more popular (Wasserman, 2018). The popularity of the store in specific
regions can be determined by the total of sales for the years. The maximum number of store’s
sales is in the region of Ontario and the least sales there in the region of Northwest Territories.
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
the second most popular region context of store’s sales is West followed by Prarie, Atlantic,
Quebec, Yukon, North Carolina.
Row Labels
Sum of
Profit
Atlantic 238960.66
North Carolina 2841.11
Northwest
Territories 8307.05
Ontario 439214.57
Prarie 321160.12
Quebec 140426.65
West 297008.61
Yukon 73849.21
Grand Total 1521767.98
The above table represents the total amount of three years profits earned by the
Superstore clubbed by the region. The analysation of is resultant data is helpful for the company
in knowing which area generates maximum of the store’s profit. Ontario region gives most of the
profits to the store. This region is followed by Prarie, West, Atlantic, Quebec, Yukon, Northwest
Territories & North Carolina. The store should focus on the profitable regions to further enhance
profits that are earned from such regions (Belotto, 2018). And the worst profit generating regions
can be considered so as to whether continuing the operations in those regions.
Data Visualisation
The pie chart is representing the sales of the Superstore for the four years. Out of the
collective sales of all the years 26% is from the year 2012, 29% from 2009, 21% from the year
Document Page
2010 and 24% for the year 2011. Maximum is from year 2009 and minimum is from the year
2010.
The above bar diagram shows the profit earned by the store for all the years. The
maximum profits are from the year 2011 followed by 2009, 2012 and 2010.
The decline of profits and sales is presented through this chart. The sales declined in the
year 2010 and increased for the years 2011 & 2012 (Weissgerber and et.al, 2019). While the
profits declined for year 2010 & 2012. In the year 2011 profits rise from the previous year.
Pivot Tables
Document Page
The chart shows the total sales and profit generated by Superstore for all the years
collectectively by products. Maximum selling and profit eaning product is technology based.
This graph is representing the sales by different regions in all the years. Maximum sales
are done in the region of West (Peck, Ayuso and El-Etr, 2019). There is not much difference
between Ontario and West region. Worst performing region is of Northwest Territories.
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
The maximum profit earning region for all the years collectively is Ontario. North
Carolina generated least profits.
PART 2
2.1 Students which like vanilla flavoured ice-cream
Ice_cream
Frequency Percent Valid Percent Cumulative
Percent
Valid
Vanilla 95 47.5 47.5 47.5
Chocolate 47 23.5 23.5 71.0
Strawberry 58 29.0 29.0 100.0
Total 200 100.0 100.0
Document Page
Interpretation: From the above table it has been analyzed that 95 out of 200 students like
vanilla flavor of ice cream. The table further shows that 47.5% of the students lie in the category
of the students which prefer vanilla over other flavors of ice cream. The students while like
vanilla are in majority in comparisons with the other students which like chocolate and vanilla.
2.2 Students that are male and female
Gender
Frequency Percent Valid Percent Cumulative
Percent
Valid
Male 91 45.5 45.5 45.5
Female 109 54.5 54.5 100.0
Total 200 100.0 100.0
Document Page
Interpretation: From the above frequency table, it has been found that there are 91 students out
of 200 who are male. However, the percentage of females in students is 54.5% which shows that
are they are in majority in the high school.
2.3 Mean and Median of participants who like chocolate and strawberry flavour ice cream.
Statistics
Ice cream flavour
N Valid 105
Missing 0
Mean 2.55
Median 3.00
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
Ice cream flavour
Frequency Percent Valid
Percent
Cumulative
Percent
Valid
Chocolate 47 44.8 44.8 44.8
Strawberr
y 58 55.2 55.2 100.0
Total 105 100.0 100.0
Interpretation: From the above table it has been analyzed that the average number of students
like strawberry flavour of ice cream. The median value of (3.00) depicts that 50% of students
like strawberry flavoured ice cream in comparison with other flavours.
Document Page
2.4 Cluster analysis example- K means clustering
Below is the process of using K-means cluster for analysing the provided data set in SPSS:
i) Press Analyse Classify K- means Cluster.
Document Page
ii) Open the K-means cluster dialog box and select the variables for shifting them to the Variable
box.
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
iii) Press the Iterate tab and select the maximum iteration value which is 10 by default. Then
click on tab.
Document Page
iv) Click the save button and select cluster membership then press continue.
Document Page
v) Press the options tab and select the three options ‘Annova table’, ‘Initial cluster centers’ and
‘Cluster information for each case’. At last, click on the continue option.
At last, click on .
Results
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
Initial Cluster Centers
Cluster
1 2 3
Gender 1 1 0
Ice cream
flavour 2 1 1
video 29 55 72
puzzle 26 71 31
Iteration Historya
Iteration Change in Cluster Centers
1 2 3
1 16.287 12.810 17.590
2 3.070 1.812 1.038
3 1.793 .968 .329
4 1.400 .798 .376
5 1.097 .683 .472
6 .407 .115 .616
7 .000 .000 .000
a. Convergence achieved due to no or small change in cluster
centers. The maximum absolute coordinate change for any
center is .000. The current iteration is 7. The minimum
distance between initial centers is 43.313.
Interpretation: The above K-Means cluster iteration table has analysed that a significant shift in
the initial cluster centres. However, coming to the fifth cluster the shift has started to reduce and
the situation is stabilised. At last, the final cluster is showing fewer adjustments which signifies
that they have settled down.
Final Cluster Centers
Cluster
1 2 3
Gender 1 1 0
Ice cream
flavour 2 2 2
Document Page
Video 41 58 56
Puzzle 46 61 39
Interpretation: From the above table it has been analysed that majority of the students are
females in the first cluster. However, maximum students like chocolate ice cream flavour in the 3
cluster. Further it has been found that 41 students like to play video games in the first cluster
which is the least in comparison with other clusters. At last, second cluster has the highest
number of students (61) that like to play puzzles rather that other games.
Distances between Final Cluster Centers
Cluster 1 2 3
1 22.077 16.063
2 22.077 21.380
3 16.063 21.380
Interpretation: From the above table, it has been found that the final cluster centres of cluster 1
and 3 have more variation. In comparison with these two, 2 cluster has less variation in its final
cluster centres.
ANOVA
Cluster Error F Sig.
Mean Square df Mean Square df
Gender .403 2 .248 197 1.628 .199
Ice cream
flavour 1.703 2 .725 197 2.351 .098
video 5889.952 2 39.226 197 150.153 .000
puzzle 7430.226 2 40.994 197 181.253 .000
The F tests should be used only for descriptive purposes because the clusters have been chosen
to maximize the differences among cases in different clusters. The observed significance levels
are not corrected for this and thus cannot be interpreted as tests of the hypothesis that the cluster
means are equal.
Document Page
Number of Cases in each
Cluster
Cluster
1 67.000
2 101.000
3 32.000
Valid 200.000
Missing .000
Interpretation: From the above Anova table, it has been found that cluster 2 has the highest
number of cases which is followed by cluster 1 being at the second and cluster 3 has least cases
with it.
2.5 Text mining and data mining methods that are used in businesses
Data mining is a process of analysing large data for determining specific trends which may have
a significant impact on the overall performance of the businesses. Clustering is a commonly
used data mining method which is useful for businesses to group different data points on the
basis of their characteristics (Regin, Rajest and Singh, 2021). For example, businesses in the
retail industry use the clustering models for finding the segment of customers who purchase a
particular product or service. Another data mining method is association which is used for
determining the relationships between different variables that are present in databases. For
example, electronics businesses use this method for recommending accessories which are
specific to the digital devices which are purchased by customers (Gibert and et.al., 2018).
Text mining is an aspect which is beneficial for extracting crucial data from standardised
language text data. Term-based method is effective for inspecting a document and analyse the
key terms which have a unique identity. This is essential for reducing confusion among the users
and enhancing their overall experiences (Gola and et.al., 2018). Phrase-based method helps in
anticipating documents on the basis of phrases which is more effective and less complex.
Businesses are able to reduce confusion with the help of this text mining method and resolve its
issues.
2.6 Advantages and disadvantages of using Excel and SPSS
Basis Excel SPSS
Advantages Excel is beneficial for The results can be
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
organizing large data
into easy charts, tables
and spreadsheets.
Various formulas are available
in Excel which are helpful in
improving calculating
complex data sets (Becker and
Bröcker, 2021).
generated in SPSS
within a matter of
seconds.
Large amount of data
can be handled in this
software which is
convenient for
analysing big data sets.
Disadvantages It requires a lot of
practice and training to
learn the concepts of
Excel.
It is costly to procure a
professional version of
Excel which can
handle large amount of
data.
Representation of
charts and tables is
complex in comparison
with Excel which is
slightly difficult for
interpretation
(Purwanto and et.al.,
2021).
This is an expensive
software which is not
easily accessible for
use.
Document Page
REFERENCES
Books and Journals
Alshdaifat, E. A. And et.al, 2021. The effect of preprocessing techniques, applied to numeric
features, on classification algorithms’ performance. Data. 6(2). p.11.
Al-Taie, M. Z., Kadry, S. and Lucas, J. P., 2019. Online data preprocessing: a case study
approach. International Journal of Electrical and Computer Engineering. 9(4). p.2620.
Assarroudi, A. and et.al, 2018. Directed qualitative content analysis: the description and
elaboration of its underpinning methods and data analysis process. Journal of Research
in Nursing. 23(1). pp.42-55.
Becker, M. and Bröcker, J.O., 2021. Business Model Canvas-Overview of the main advantages
and disadvantages.
Belotto, M. J., 2018. Data analysis methods for qualitative research: Managing the challenges of
coding, interrater reliability, and thematic analysis. The Qualitative Report. 23(11).
pp.2622-2633.
Gibert, K. and et.al., 2018. Which method to use? An assessment of data mining methods in
Environmental Data Science. Environmental modelling & software, 110, pp.3-27.
Gola, J. and et.al., 2018. Advanced microstructure classification by data mining
methods. Computational Materials Science. 148. pp.324-335.
Peck, E. M., Ayuso, S. E. and El-Etr, O., 2019, May. Data is personal: Attitudes and perceptions
of data visualization in rural pennsylvania. In Proceedings of the 2019 CHI Conference
on Human Factors in Computing Systems (pp. 1-12).
Purwanto, A. and et.al., 2021. Education Research Quantitative Analysis for Little Respondents:
Comparing of Lisrel, Tetrad, GSCA, Amos, SmartPLS, WarpPLS, and SPSS. Jurnal
Studi Guru Dan Pembelajaran, 4(2).
Regin, R., Rajest, S.S. and Singh, B., 2021. Spatial data mining methods databases and statistics
point of views. Innovations in Information and Communication Technology Series,
pp.103-109.
Tang, S., Yuan, S. and Zhu, Y., 2020. Data preprocessing techniques in convolutional neural
network based on fault diagnosis towards rotating machinery. IEEE Access. 8.
pp.149487-149496.
Wang, C. and et.al, 2019. Visualization by example. Proceedings of the ACM on Programming
Languages. 4(POPL). pp.1-28.
Wasserman, L., 2018. Topological data analysis. Annual Review of Statistics and Its
Application. 5. pp.501-532.
Weissgerber, T. L. And et.al, 2019. Reveal, don’t conceal: transforming data visualization to
improve transparency. Circulation. 140(18). pp.1506-1518.
Online
1
Document Page
Data Preprocessing, Analysis & Visualization. 2022. [Online]. Available through:
<https://www.tutorialspoint.com/machine_learning_with_python/machine_learning_wit
h_python_data_preprocessing_analysis_visualization.htm>
2
chevron_up_icon
1 out of 25
circle_padding
hide_on_mobile
zoom_out_icon
[object Object]