
Data Analysis & Visualisation for Desklib

   

Added on  2023-06-05


Table of Contents
Part 1
Data Pre-Processing, Analysing & Visualisation
Part 2
2.1 Students which like vanilla flavoured ice-cream
2.2 Students that are male and female
2.3 Mean and Median of participants who like chocolate and strawberry flavour ice cream
2.4 Cluster analysis example- K means clustering
2.5 Text mining and data mining methods that are used in businesses
2.6 Advantages and disadvantages of using Excel and SPSS
References

Part 1
Data Pre-Processing, Analysing & Visualisation
Data Pre-processing
Data pre-processing refers to the process by which raw data is manipulated before it is
analysed. Its purpose is to ensure that the later processing of the data gives more reliable
results. In data mining, data pre-processing is considered one of the most important steps. It
follows a systematic, step-wise procedure whose main steps are data cleaning, data
transformation and data reduction (Tang, Yuan and Zhu, 2020).
A dataset can contain parts that are missing or that are not relevant to the information that
needs to be assessed. Data cleaning is undertaken to handle such data. Missing data arises
when some values are absent from otherwise complete records. There are two main ways of
dealing with missing data: ignoring the affected tuples, or filling in the missing values.
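The two options for missing data can be sketched in Python as follows. This is only an illustration: the records, the field names and the choice of the mean as the fill value are assumptions made for the example and are not taken from the Superstore dataset.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"order_id": 1, "sales": 120.0},
    {"order_id": 2, "sales": None},
    {"order_id": 3, "sales": 80.0},
    {"order_id": 4, "sales": 100.0},
]


def drop_incomplete(records, field):
    """Option 1: ignore every tuple whose value for `field` is missing."""
    return [r for r in records if r[field] is not None]


def fill_with_mean(records, field):
    """Option 2: replace each missing value with the mean of the known values."""
    known = [r[field] for r in records if r[field] is not None]
    fill = mean(known)
    # Build new dicts so the original records are not modified.
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


dropped = drop_incomplete(rows, "sales")
filled = fill_with_mean(rows, "sales")
```

Ignoring tuples is simple but discards information, while filling keeps every record at the cost of introducing estimated values.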
Data cleaning also covers noisy data. Noisy data is data that cannot be understood or
processed by machines, and it is usually caused by faults in data collection. During data
pre-processing, such data is handled with methods such as binning, regression and clustering.
After the data cleaning step comes data transformation (Al-Taie, Kadry and Lucas, 2019). To
generate valuable or desired results from a dataset, the data needs to be transformed into an
appropriate form. There are a number of ways to carry out data transformation; a few examples
are normalization, attribute selection, discretization and concept hierarchy generation. Data
reduction is the next step. This step is dedicated to easing the process of data analysis. The
techniques used for it are data cube aggregation, attribute subset selection, numerosity
reduction and dimensionality reduction.
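Two of the techniques named above, binning for noisy data and normalization for data transformation, can be illustrated with a short Python sketch. The variants shown (smoothing by bin means with equal-size bins, and min-max scaling) are assumptions chosen for the example, as are the sample values; the report does not state which variants it has in mind.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of `bin_size` items,
    and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


def min_max_normalize(values):
    """Rescale the values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample values.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, 3)
normalized = min_max_normalize(prices)
```

Smoothing reduces the effect of individual noisy readings, and min-max scaling puts attributes with different ranges on a common 0 to 1 scale.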

The given Superstore dataset has also been pre-processed for the purpose of further
analysis and visualisation. The data is pre-processed using the Filter feature in Excel's Sort &
Filter option. This feature was selected because it helps in analysing several years of data to
determine the decline in sales of the store (Data Preprocessing, Analysis & Visualization,
2022). Applying the filter to the order date column shows only the data of a particular year at
a time, while the rest of the data, which is not required at that moment, is hidden.
Year    Sales      Profit
2009    1754061    152252
2010    1318867    132154.9
2011    1473355    161414.1
2012    1601552    130967
The above table was created in Excel by using the filter option. It shows the total sales
and profit for the years 2009, 2010, 2011 and 2012. With this option the data can be copied,
formatted and printed without sorting it in ascending or descending order or moving it to
another location. Data pre-processing is an essential step, as it helps to generate reliable
results in a precise format.
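For readers working outside Excel, the effect of filtering on the order date column can be approximated in Python. This is a rough sketch under stated assumptions: the rows and the field names `order_date` and `sales` are hypothetical and do not come from the actual dataset.

```python
from datetime import date

# Hypothetical order rows; the real dataset has many more columns and rows.
orders = [
    {"order_date": date(2009, 3, 14), "sales": 250.0},
    {"order_date": date(2010, 7, 2), "sales": 90.5},
    {"order_date": date(2009, 11, 30), "sales": 410.0},
    {"order_date": date(2012, 1, 9), "sales": 75.0},
]


def filter_by_year(rows, year):
    """Return only the rows ordered in `year`.

    The input list is left untouched, much like an Excel filter
    hides rows instead of deleting them.
    """
    return [r for r in rows if r["order_date"].year == year]


orders_2009 = filter_by_year(orders, 2009)
```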

The available data has been pre-processed by using the SUMIFS function and Pivot Tables
in addition to the filter option. SUMIFS is a basic Excel function that is widely used for data
pre-processing. It returns the sum of the values in a range that meet the specified criteria. In
the current case it is used to get the total sales and profits of the Superstore for each of the
years from 2009 to 2012. The formula applied in Excel has the following form:
=SUMIFS(Sales rows, Date rows, ">=" & DATE(year, month, day), Date rows, "<=" & DATE(year, month, day))
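The logic of this SUMIFS call, summing one column wherever a date column falls between two bounds, can be sketched in Python. The function name `sum_ifs_between` and the sample values are hypothetical; only the inclusive lower and upper date criteria mirror the formula above.

```python
from datetime import date


def sum_ifs_between(values, dates, start, end):
    """Sum values[i] for every i where start <= dates[i] <= end.

    Both bounds are inclusive, matching the ">=" and "<=" criteria
    of the SUMIFS formula.
    """
    if len(values) != len(dates):
        raise ValueError("values and dates must have the same length")
    return sum(v for v, d in zip(values, dates) if start <= d <= end)


# Hypothetical sample data.
order_dates = [date(2009, 1, 5), date(2009, 12, 31), date(2010, 1, 1), date(2010, 6, 15)]
sales = [100.0, 250.0, 75.0, 125.0]

sales_2009 = sum_ifs_between(sales, order_dates, date(2009, 1, 1), date(2009, 12, 31))
sales_2010 = sum_ifs_between(sales, order_dates, date(2010, 1, 1), date(2010, 12, 31))
```

Calling the function once per year, with 1 January and 31 December as bounds, gives one yearly total per call, which is how the yearly figures are obtained with SUMIFS.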
After the application of the SUMIFS function, the next task is to find the decline in sales
and profits over the years. This is done by subtracting the previous year's value from the
current year's value, dividing the result by the previous year's value and multiplying by 100.
The Pivot Table is the next Excel feature applied to the processed data. It is one of
Excel's most powerful tools: it lets users perform calculations, summarise data and analyse it
by comparing the resulting figures or by identifying the patterns or trends that the data
represents. It is an interactive approach with which large quantities of data can be summarised
swiftly (Alshdaifat et al., 2021). Three pivot tables are created from the given dataset. The
results of the SUMIFS function and of the pivot tables are analysed in the data analysis
section.
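The kind of summarisation a pivot table performs can be sketched in a few lines of Python. This is a simplified illustration, not a description of the three pivot tables built in the report: the `category` field, the sample rows and the helper `pivot_sum` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows; "category" is an assumed column used only for illustration.
rows = [
    {"year": 2009, "category": "Furniture", "sales": 200.0},
    {"year": 2009, "category": "Technology", "sales": 300.0},
    {"year": 2009, "category": "Furniture", "sales": 50.0},
    {"year": 2010, "category": "Technology", "sales": 120.0},
]


def pivot_sum(records, row_key, col_key, value_key):
    """Group records by (row_key, col_key) and sum value_key in each cell."""
    table = defaultdict(lambda: defaultdict(float))
    for r in records:
        table[r[row_key]][r[col_key]] += r[value_key]
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {row: dict(cols) for row, cols in table.items()}


sales_by_year_and_category = pivot_sum(rows, "year", "category", "sales")
```

Each outer key corresponds to a pivot table row label and each inner key to a column label, with the summed value in the cell.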
Data Analysis
SUMIFS Function
Year    Sales      Profit      Decline in sales    Decline in profit
2009    1754061    152253
2010    1318867    132154.9    -24.81%             -13.20%
2011    1473355    161414.1    11.71%              22.14%
2012    1601552    130967      8.70%               -18.86%
The data pre-processing section explained how the SUMIFS function is applied. The table
above presents the results generated with this function in Excel. The first column shows the
year, and the next two columns show the total sales and profit generated by the Superstore in
each of these years.
Decline in sales and decline in profit are the attributes calculated using the formula
(Sales of the current year - Sales of the previous year) / Sales of the previous year, with the
same calculation applied to profit. This formula returns a plain number, but a percentage is
preferable for analysing the results. Excel provides the option to display values as
percentages: on the Home tab, the number format was changed from General to Percentage.
From the data it is clear that the sales of the Superstore fell by nearly 25% in 2010, which is
a large decline. In 2011 and 2012 sales increased by around 12% and 9% respectively compared
with the previous year (2010 for 2011, and 2011 for 2012).
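The percentages in the table can be reproduced with a short Python check that applies the same formula to the yearly totals. The figures are copied from the SUMIFS table above; the function names are illustrative only.

```python
def pct_change(current, previous):
    """(current - previous) / previous, expressed as a percentage."""
    return (current - previous) / previous * 100


def yearly_changes(series):
    """Map each year (except the first) to its change from the year before,
    rounded to two decimal places."""
    years = sorted(series)
    return {
        year: round(pct_change(series[year], series[prev]), 2)
        for prev, year in zip(years, years[1:])
    }


# Yearly totals as listed in the SUMIFS table.
sales = {2009: 1754061, 2010: 1318867, 2011: 1473355, 2012: 1601552}
profit = {2009: 152253, 2010: 132154.9, 2011: 161414.1, 2012: 130967}

sales_change = yearly_changes(sales)
profit_change = yearly_changes(profit)
```

A negative result indicates a decline from the previous year and a positive result an increase, so the column headed "Decline in sales" shows growth for 2011 and 2012.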
It is clear that the Superstore experienced declining profits in two of the three years.
Profits declined by 13.20% in 2010 and by around 19% in 2012. The year 2011 was good for
the company, as its profits increased from the previous year. The sales of the store for 2012
show an increasing trend, whereas for the same period