Data Handling: Evaluating Excel and Weka for Data Analysis

   

Added on  2023-01-12

Data handling

Contents
INTRODUCTION
Part 1
Evaluating the use of Excel by using the Superstore data
Part 2
Workings of Weka Clustering
Explaining the most common data mining methods
Discussing the advantages/disadvantages of Weka over Excel
CONCLUSION
REFERENCES

INTRODUCTION
Data handling covers processing, analysing and presenting data so that numerical information
can be understood by everyone. It is a process in which pre-recorded data is analysed with
suitable tools and techniques so that meaningful insights can be gained from it (Shmueli, Patel
and Bruce, 2011). The main aim of this report is to evaluate current trends in data warehousing
and to build an understanding of the principles of predictive analytics software. For this
purpose, two data sets are used in this report: “Superstore” and “audidealership”.
In this report, the first data set is pre-processed, analysed and visualised in Microsoft
Excel, using pivot tables, charts and graphs. The second data set, “audidealership”, is
clustered using the Weka analytics software. Common data mining methods are also examined,
along with the merits and demerits of Weka compared with Excel.
Part 1
Evaluating the use of Excel by using the Superstore data
Current trends in data warehousing, business intelligence and data mining
Data warehousing is the process of storing data in warehouses where its security can be
ensured. The aim is to combine the data collected from different sources in one place so that it
can easily be analysed to gain insights and support decision making (Işık, Jones and Sidorova,
2013). Current trends in data warehousing include database management systems, agile
development methodologies, data streaming and consolidation with business intelligence (BI)
organisations. Many organisations find it difficult to manage and store their own data, so they
hire firms that specialise in BI.
In the field of business intelligence, the current trends shaping the future include
delivery models such as software as a service (SaaS) and service-oriented architecture (SOA),
along with Web 2.0 based visualisation software.
Data mining is different from data warehousing. In data mining, data is extracted from
different sources and then organised for better use (Minelli, Chambers and Dhiraj, 2013). It
relies on complex algorithms, so mining data requires specialised skills. Current trends in data
mining include sequential and time series mining, in which data with cyclical and seasonal
trends can be mined.
Pre-processing the data
Superstore is an organisation that has faced declining sales and profit over the years.
The Superstore data is used to determine the reason behind this decline. The first step of the
evaluation is pre-processing the data.
Data pre-processing cleans and transforms the data so that it can be used for effective
analysis and visualisation. The data provided is first cleaned by identifying missing values. In
Microsoft Excel, all the data is selected and the empty cells are then navigated using the
shortcut “Shift + F4”. This process shows that there are several missing values in the “product
base margin” column. Once the missing values are identified, these cells are filled with the
column average (Talati, McRobbie and Watt, 2012).
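The mean-fill step described above can be sketched outside Excel as follows. This is a minimal Python illustration, not the report's actual workflow: the function name `fill_missing_with_mean` and the sample margin values are hypothetical, and empty cells are represented as `None`.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Return a copy of `values` with None entries replaced by the mean
    of the non-missing entries (the column average)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no observed values to average")
    column_mean = mean(present)
    return [column_mean if v is None else v for v in values]


# Hypothetical "product base margin" column with two empty cells.
margins = [0.5, None, 0.25, 0.75, None]
filled = fill_missing_with_mean(margins)
print(filled)
```

Filling with the column average leaves the mean of the column unchanged, but it reduces the column's variance, which is worth keeping in mind when many cells are missing.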
The next stage of data pre-processing is data transformation, in which the data is put
into a consistent format. All the numerical values in the data set (sales, discount, profit, unit
price, shipping cost and product base margin) are formatted to two decimal places.
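As a rough illustration of the two-decimal formatting, the sketch below rounds a record's numerical fields in Python. The helper name `to_two_decimals` and the sample values are hypothetical. Note one difference from the spreadsheet: Excel's number formatting normally changes only how a value is displayed, whereas this sketch produces rounded values.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_two_decimals(value):
    """Round a number to two decimal places, with ties rounded away from zero."""
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical raw values for one record; not taken from the Superstore file.
record = {"sales": 261.5449, "discount": 0.045, "shipping_cost": 35}
formatted = {name: to_two_decimals(v) for name, v in record.items()}
print(formatted)
```

`Decimal` is used here instead of the built-in `round` so that a tie such as 0.045 is rounded in the way a reader of a spreadsheet would expect, rather than being affected by binary floating-point representation.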
The third and last stage of data pre-processing is data reduction. The data in the
Superstore worksheet is reduced into another worksheet by means of a pivot table. Only those
variables that can affect Superstore’s sales and profit are selected for this table: order date,
sum of sales, sum of discount, sum of profit, sum of unit price, sum of shipping cost, sum of
product base margin, order priority, shipment mode, region, customer segment, product
category and product container.
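The reduction performed by the pivot table amounts to grouping records and summing the chosen measures. A minimal Python sketch of that idea follows. The function `pivot_sum_by_year`, the field names and the sample orders are all assumptions made for illustration, and dates are assumed to be ISO-formatted strings.

```python
from collections import defaultdict


def pivot_sum_by_year(rows, measures):
    """Group rows by the year of 'order_date' and sum each measure."""
    totals = defaultdict(lambda: {m: 0.0 for m in measures})
    for row in rows:
        year = row["order_date"][:4]  # assumes ISO dates such as "2009-03-14"
        for m in measures:
            totals[year][m] += row[m]
    return dict(totals)


# Hypothetical order records; not taken from the Superstore file.
orders = [
    {"order_date": "2009-03-14", "sales": 100.0, "profit": 20.0},
    {"order_date": "2009-11-02", "sales": 50.0, "profit": -5.0},
    {"order_date": "2010-01-20", "sales": 80.0, "profit": 10.0},
]
summary = pivot_sum_by_year(orders, ["sales", "profit"])
for year in sorted(summary):
    print(year, summary[year])
```

Grouping by a different field, such as region or customer segment, would only require changing the key that is extracted from each row.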
Analysing and visualising the data
Year   Sum of Profit   Sum of Sales   Sum of Discount   Sum of Shipping Cost   Sum of Unit Price   Sum of Order Quantity
2009   434096          4209897        105.39            28481.8                232831              54508
2010   364917          3560087        105.81            27354.3                162468              54379
2011   380311          3429945        101.67            24939.9                159653              51413
2012   342444          3715672        104.32            27055.2                195468              54480
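The yearly totals above can be turned into year-over-year percentage changes to make the decline easier to read. The sketch below uses the profit and sales figures from the table. The helper names `pct_change` and `year_over_year` are hypothetical and the calculation is an illustration rather than part of the original analysis.

```python
# Yearly totals copied from the pivot-table output above.
totals = {
    2009: {"profit": 434096, "sales": 4209897},
    2010: {"profit": 364917, "sales": 3560087},
    2011: {"profit": 380311, "sales": 3429945},
    2012: {"profit": 342444, "sales": 3715672},
}


def pct_change(old, new):
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100


def year_over_year(data, measure):
    """Map each year (after the first) to its % change versus the prior year."""
    years = sorted(data)
    return {
        curr: pct_change(data[prev][measure], data[curr][measure])
        for prev, curr in zip(years, years[1:])
    }


yoy_sales = year_over_year(totals, "sales")
yoy_profit = year_over_year(totals, "profit")
for year in sorted(yoy_sales):
    print(f"{year}: sales {yoy_sales[year]:+.1f}%, profit {yoy_profit[year]:+.1f}%")
```

On these figures, sales fall in 2010 and 2011 and recover partly in 2012, while profit in 2012 is the lowest of the four years.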