Data Analysis Report: Anomaly Detection and Handling Missing Values

Added on 2020/10/05

AI Summary

This report examines anomaly detection and the handling of missing values in a dataset. It begins by identifying anomalies, described as unusual data points or missing values, and notes the absence of any clear pattern or trend. It then explores strategies for handling missing values. These include the `is.na()` function for identifying missing data and two kinds of imputation technique. Popular averaging methods (mean, mode, median) provide a quick estimate of missing values. Predictive techniques assume a relationship between the missing observations and selected variables. The report notes the impact of these techniques on statistical analysis. It concludes with statistical and machine learning methods for imputing missing values, such as regression and SVM or data mining methods. The document was contributed by a student for publication on the website Desklib.

Presence of Anomalies in the Dataset
Anomalies are unusual data points or missing values in the data. The given dataset contains various anomalies; put simply, it shows no apparent pattern or trend.

The first anomaly concerns the variables: in this dataset we cannot identify which variables are independent and which are dependent. The data records various grocery purchases. It is intended for analyzing associations between items purchased at the point of sale, with three aims:
• to guide product display;
• to guide sales personnel in promoting cross-sales;
• to guide the eventual piloting of an electronic recommender system for boosting cross-sales.

The reported values include the quantity columns. They also indicate which rows have missing values, together with the sum and percentage of those rows. The columns in this dataset contain distinct values, which suggests that not all of the data could be converted into information.
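A minimal R sketch of how such a per-column and per-row missing-value summary could be produced is shown below. It assumes the data is held in a data frame named dt, matching the functions quoted in the next section. The three grocery columns and their values are invented for illustration and are not the report's dataset.

```r
# Hypothetical toy data standing in for the grocery dataset; the column
# names and values are invented for illustration only
dt <- data.frame(
  milk  = c(1, NA, 2, 1, 3, NA),
  bread = c(NA, 1, 1, 2, 3, 1),
  eggs  = c(12, NA, 6, 6, 12, 6)
)

# Per-column summary: number and percentage of missing values
missing_by_column <- data.frame(
  column    = names(dt),
  missing   = colSums(is.na(dt)),
  percent   = round(100 * colMeans(is.na(dt)), 1),
  row.names = NULL
)
print(missing_by_column)

# Per-row summary: which rows contain at least one missing value
rows_with_na <- which(!complete.cases(dt))
cat("Rows with missing values:", rows_with_na, "\n")
cat("Share of rows affected:", round(100 * length(rows_with_na) / nrow(dt), 1), "%\n")
```

Here each column has only one or two gaps, yet half of the six rows are incomplete. That is why it helps to report both views.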
Listing possible strategies for handling cases with missing values
A common task in data analysis is dealing with anomalies or missing values. Missing data can arise at many stages of a statistical analysis. There are various related methods for handling it that give reasonable results. The absence of information is represented as a missing value, which R codes with the symbol NA. The is.na() function can be applied to identify the missing values in a dataset. It can also be used to compute the total number of missing values, with sum(is.na(dt)), and their proportion, with mean(is.na(dt)). Multiplying the proportion by 100 gives the percentage.
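The two expressions above can be checked on a small, hypothetical data frame dt; its column names and values are illustrative only. Note that mean(is.na(dt)) returns a proportion between 0 and 1, so it is multiplied by 100 to report a percentage.

```r
# Hypothetical data frame with a few NA entries (illustrative only)
dt <- data.frame(
  quantity = c(2, NA, 1, 4),
  price    = c(1.5, 2.0, NA, NA)
)

# Logical matrix of the same shape: TRUE marks a missing cell
print(is.na(dt))

n_missing    <- sum(is.na(dt))   # total count of missing cells
prop_missing <- mean(is.na(dt))  # proportion of cells that are missing (0 to 1)

cat("Missing cells:", n_missing, "\n")
cat("Percentage missing:", 100 * prop_missing, "%\n")
```

With 3 of the 8 cells missing, this reports 3 missing cells and 37.5 percent.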
Imputation, the process of replacing missing data with substituted values, should then be applied. Missing values can also be recoded on the basis of specific indicator values that denote a missing entry. To do this, the data is subset to select the elements holding that indicator, and the desired value is assigned to those elements. The imputation strategies are:
• Popular averaging techniques
• Predictive techniques
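The recoding and subset-assignment step described before the list can be sketched in R as follows. The sentinel code -99 and the column name quantity are assumptions for illustration. Real datasets may use other indicators, such as empty strings or a code like 999.

```r
# Hypothetical column in which -99 was used to indicate "no value recorded"
dt <- data.frame(quantity = c(3, -99, 5, -99, 2))

# Recode: select the elements holding the indicator and assign NA to them
dt$quantity[dt$quantity == -99] <- NA
print(dt$quantity)  # values: 3, NA, 5, NA, 2

# The same subset-assignment pattern can assign any desired value,
# e.g. 0 if an unrecorded quantity is known to mean "none bought"
dt$quantity_filled <- dt$quantity
dt$quantity_filled[is.na(dt$quantity_filled)] <- 0
print(dt$quantity_filled)  # values: 3, 0, 5, 0, 2
```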
Popular averaging techniques: The most popular averaging techniques, such as the mean, mode and median, are used to infer missing values. Approaches range from a global average for the variable to averages computed within groups. These techniques give a quick estimate of the missing values. However, every missing observation in a group receives the same value, so they reduce the variability in the dataset. This indirectly affects the statistical analysis of the dataset. Depending on the percentage of observations imputed, metrics such as correlation, mean and median may be affected.
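A minimal R sketch of these averaging techniques follows. It uses an invented numeric vector x, a categorical vector store and a grouping vector grp, all hypothetical. stat_mode() is a helper defined here for illustration, not a base R function.

```r
# Hypothetical numeric variable with two missing values (illustrative only)
x <- c(4, 8, NA, 6, 12, NA, 5)

# Global mean and median imputation
x_mean   <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)
x_median <- ifelse(is.na(x), median(x, na.rm = TRUE), x)

# Mode imputation, usually applied to categorical variables; base R has no
# function for the statistical mode, so this small helper is defined here
stat_mode <- function(v) {
  counts <- table(v[!is.na(v)])
  names(counts)[which.max(counts)]
}
store <- c("A", "B", NA, "A", "C", NA, "A")
store_imputed <- ifelse(is.na(store), stat_mode(store), store)

# Group-based averaging: fill each gap with the mean of its own group
grp <- c("a", "a", "a", "b", "b", "b", "a")
group_means <- ave(x, grp, FUN = function(v) mean(v, na.rm = TRUE))
x_group <- ifelse(is.na(x), group_means, x)

# Mean imputation keeps the mean but shrinks the variance
cat("Mean before/after:", mean(x, na.rm = TRUE), mean(x_mean), "\n")
cat("Variance before/after:", var(x, na.rm = TRUE), var(x_mean), "\n")
```

On this data the mean stays at 7 while the variance drops from 10 to about 6.67, because both imputed values sit exactly on the mean. This is the reduction in variability described above.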
Predictive techniques: Imputing missing values with predictive techniques rests on a strong assumption. The observations must not be missing completely at random, and the variables selected for imputation must be related to the missing observations. If that relationship is weak, the resulting estimates will be imprecise. Such a predictive model could, for example, be used to impute missing values for revenue, device and OS. Different methods are available for imputing these missing values. These include statistical methods such as regression, and machine learning or data mining methods such as SVM.
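A minimal sketch of regression-based imputation in R follows, using base lm() and predict(). The columns sessions, device and revenue, and all values, are hypothetical. The sketch also assumes, as the paragraph above requires, that the missing revenues are related to the observed predictors. An SVM or other machine learning model could be substituted for lm(), provided it offers a predict() method.

```r
# Hypothetical data: revenue is missing in two rows; sessions and device are observed.
# Values are constructed so that revenue = 10 * sessions + 5 for desktop rows
# and 10 * sessions for mobile rows, which makes the imputed values easy to check.
dt <- data.frame(
  sessions = c(10, 20, 30, 40, 50, 25, 35),
  device   = factor(c("mobile", "desktop", "mobile", "desktop",
                      "mobile", "desktop", "mobile")),
  revenue  = c(100, 205, 300, 405, NA, 255, NA)
)

is_missing <- is.na(dt$revenue)

# Fit a linear regression on the rows where revenue is observed
fit <- lm(revenue ~ sessions + device, data = dt[!is_missing, ])

# Predict revenue for the rows where it is missing and fill only those cells
dt$revenue_imputed <- dt$revenue
dt$revenue_imputed[is_missing] <- predict(fit, newdata = dt[is_missing, ])
print(dt)
```

Because the toy data follows an exact linear rule, the two gaps are filled with 500 and 350. With real data the predictions carry error, which grows as the relationship between the predictors and the missing variable weakens.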