
Data Analysis and Cleaning with Spotfire

   

Added on  2020-04-01

Introduction
Data Analysis
Data analysis, or data analytics, is a process of inspecting, cleansing, transforming,
and modeling data with the goal of discovering useful information, suggesting
conclusions, and supporting decision-making. Data analysis has multiple facets and
approaches, encompassing diverse techniques under a variety of names, in
different business, science, and social science domains.
Data mining is a particular data analysis technique that focuses on modeling and
knowledge discovery for predictive rather than purely descriptive purposes, while
business intelligence covers data analysis that relies heavily on aggregation,
focusing on business information. Predictive analytics focuses on application of
statistical models for predictive forecasting or classification, while text analytics
applies statistical, linguistic, and structural techniques to extract and classify
information from textual sources, a species of unstructured data. All are varieties of
data analysis.
Best practices for understanding quantitative data include the following:
Check raw data for anomalies prior to performing your analysis;
Re-perform important calculations, such as verifying columns of data that are
formula driven;
Confirm main totals are the sum of subtotals;
Check relationships between numbers that should be related in a predictable
way, such as ratios over time;
Normalize numbers to make comparisons easier, such as analyzing amounts
per person, relative to GDP, or as an index value relative to a base year.
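Two of these checks, confirming that a total equals the sum of its subtotals and normalizing values against a base year, can be sketched in a few lines of Python. The figures, region names, and function names below are hypothetical and serve only to illustrate the idea; they are not taken from the assignment data.

```python
# Hypothetical figures, made up purely for illustration.
subtotals = {"East": 1200.0, "West": 800.0, "North": 500.0}
reported_total = 2500.0


def totals_match(subtotals, total, tolerance=0.01):
    """Return True when the reported total equals the sum of the subtotals."""
    return abs(sum(subtotals.values()) - total) <= tolerance


def per_capita(amount, population):
    """Normalize an amount by population so that groups can be compared."""
    return amount / population


def index_to_base(series, base_year):
    """Express each year's value as an index where the base year equals 100."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}


# Hypothetical yearly sales, indexed to 2017.
sales_by_year = {2017: 2000.0, 2018: 2200.0, 2019: 2500.0}
sales_index = index_to_base(sales_by_year, 2017)

print(totals_match(subtotals, reported_total))
print(sales_index)
```

A small tolerance is used in the totals check because amounts stored as floating-point numbers rarely add up exactly.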
Assignment
The raw data provided consists of 8400 rows and 22 columns. It is an example of sales
data, with variables that include “Row id”, “col ID”, “Sales”, “priority”, “profit”,
“province”, “order date”, and others.
The first task is data cleaning. Data cleaning is the process of detecting and
correcting (or removing) corrupt or inaccurate records from a record set, table,
or database. It involves identifying incomplete, incorrect, inaccurate, or
irrelevant parts of the data and then replacing, modifying, or deleting the dirty
or coarse data.
The first step was to check for missing values in the raw data. The next step was
to look for duplicate rows and delete them. The delivery date should not be earlier
than the order date, so we created a new variable, “lead time” (delivery date minus
order date), and confirmed that all of its values are positive (see the picture
below).
[Figure: Data Analysis and Cleaning with Spotfire_1]
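The assignment performs these cleaning steps in Spotfire. As a rough illustration of the same logic outside that tool, the following Python sketch checks for missing values, removes duplicate rows, and derives a lead time. The sample rows, field names, and function names are hypothetical, and treating a lead time of zero (same-day delivery) as valid is an assumption.

```python
from datetime import date

# Hypothetical sample rows; the real data set has 8400 rows and 22 columns.
rows = [
    {"row_id": 1, "sales": 261.54,
     "order_date": date(2010, 10, 13), "delivery_date": date(2010, 10, 20)},
    {"row_id": 2, "sales": None,
     "order_date": date(2012, 2, 20), "delivery_date": date(2012, 2, 21)},
    {"row_id": 1, "sales": 261.54,
     "order_date": date(2010, 10, 13), "delivery_date": date(2010, 10, 20)},
    {"row_id": 4, "sales": 100.0,
     "order_date": date(2011, 7, 15), "delivery_date": date(2011, 7, 10)},
]


def rows_with_missing(rows):
    """Return the rows that contain at least one missing (None) value."""
    return [row for row in rows if any(value is None for value in row.values())]


def drop_duplicates(rows):
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[name] for name in sorted(row))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def add_lead_time(rows):
    """Return copies of the rows with lead_time = delivery date - order date, in days."""
    return [
        {**row, "lead_time": (row["delivery_date"] - row["order_date"]).days}
        for row in rows
    ]


missing = rows_with_missing(rows)
deduped = drop_duplicates(rows)
with_lead_time = add_lead_time(deduped)

# Rows delivered before they were ordered are flagged for review.
invalid = [row["row_id"] for row in with_lead_time if row["lead_time"] < 0]
print(len(missing), len(deduped), invalid)
```

Flagging suspicious rows rather than deleting them immediately keeps the decision about how to handle each one with the analyst.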

After data cleaning, the next step would normally be data preparation. However,
the data is now consistent and free of anomalies, so no separate preparation step
is needed. Had the data been inconsistent or drawn from multiple sources or sheets,
we would have carried out data preparation. Here, the raw data was already prepared
during data cleaning.
The data does not specify whether the discount applies to the unit price or to
Unit Price * Order Quantity + Shipping Cost. Had the basis of the discount been
specified, we could easily have verified whether the profit column is correct
(see the picture below).
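To show why the basis of the discount matters, the sketch below computes the amount charged under each of the two interpretations. It assumes the discount is stored as a fraction (for example, 0.1 for 10%). The function names and sample values are hypothetical, and the sketch does not reproduce the profit calculation, which the data does not define.

```python
def discount_on_unit_price(unit_price, quantity, shipping_cost, discount):
    """Interpretation A: the discount reduces the unit price only."""
    return unit_price * (1 - discount) * quantity + shipping_cost


def discount_on_order_total(unit_price, quantity, shipping_cost, discount):
    """Interpretation B: the discount applies to unit price * quantity + shipping cost."""
    return (unit_price * quantity + shipping_cost) * (1 - discount)


# Hypothetical order: 5 units at 10.0 each, 4.0 shipping, 10% discount.
amount_a = discount_on_unit_price(10.0, 5, 4.0, 0.1)
amount_b = discount_on_order_total(10.0, 5, 4.0, 0.1)

# The two interpretations differ by shipping_cost * discount.
print(amount_a, amount_b, amount_a - amount_b)
```

Because the two results differ whenever both the shipping cost and the discount are nonzero, the profit column cannot be verified until the basis is known.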
Data Presentation and Analysis
[Figure: Data Analysis and Cleaning with Spotfire_2]
