Programming with R

For Question 1, the test function produced the result shown in Table 1 below, indicating that the datasets returned by the two loading functions were identical in their numbers of rows and columns.

Table 1: Test function Code and Output

    test.load.data(Data1, Data2)
    [1] "Identical Datasets"

For Question 2, the review of the data had two parts. The first involved creating a function that checks for null entries and removes them from the dataset to form a new dataset. The second involved checking for data points that are too large, which was achieved by creating a function that checks for and omits data points that are upper outliers. The review established the presence of outliers but not the presence of values that were too large. The plots in Figure 1 below show the dataset before and after the omission of null entries. From the plots, we note that the number of null entries was 0 after cleaning. The plots were generated using the code in Table 2 below.

Table 2: Data Cleaning (Before and After) Code

    # Plotting for Data Cleaning Process
    par(mfrow = c(1, 2))
    plot(table(is.na(Data1)), main = "Before Cleaning (Presence of null entries)")
    plot(table(is.na(NewData)), main = "After Cleaning (Presence of null entries)")
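The two cleaning functions described for Question 2 are not reproduced in the report, so the following is only a minimal sketch of how they might look. The names remove.null.entries and omit.upper.outliers are hypothetical, the 1.5 * IQR rule is an assumed definition of an upper outlier, and the built-in iris data with a few injected NAs stands in for Data1.

```r
# Hypothetical helper: drop every row that contains at least one NA.
remove.null.entries <- function(data) {
  data[complete.cases(data), , drop = FALSE]
}

# Hypothetical helper: drop rows where any numeric column exceeds
# Q3 + 1.5 * IQR. This threshold is an assumption; the report does not
# state which rule was used.
omit.upper.outliers <- function(data) {
  numeric.cols <- names(data)[sapply(data, is.numeric)]
  keep <- rep(TRUE, nrow(data))
  for (col in numeric.cols) {
    values <- data[[col]]
    upper <- unname(quantile(values, 0.75, na.rm = TRUE)) +
      1.5 * IQR(values, na.rm = TRUE)
    # Rows with NA in this column are left for the null-entry step.
    keep <- keep & (is.na(values) | values <= upper)
  }
  data[keep, , drop = FALSE]
}

# Illustration on a copy of iris with three NAs injected.
demo <- iris
demo[c(3, 50, 120), "Sepal.Width"] <- NA
cleaned <- remove.null.entries(demo)
trimmed <- omit.upper.outliers(cleaned)
sum(is.na(cleaned))  # 0
```

Keeping the two steps as separate functions means the null-entry check can be plotted before and after on its own, independently of the outlier rule chosen.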
Figure 1: Data Cleaning (Before and After)

For Question 3, the function for repairing the original dataset replaces each missing value with the median of that variable for the same species. Because the median is computed from the complete entries in the same dataset, replacing the null entries with it keeps the data whole without losing any of its characteristics. Table 3 below shows the matrix that resulted from the repair function and the code.

Table 3: Repair Matrix

RM
             Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
setosa                5.0          3.4           1.5          0.2
versicolor            5.9          2.8           4.3          1.3
virginica             6.4          3.0           5.5          2.0

For Question 4, the summary output for the entire iris data is given in Table 4 below. The boxplots of the numeric variables for the entire dataset are shown in Figure 2: Measurements using all Species.
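The median-based repair described for Question 3 can be sketched as follows. This is an illustration under stated assumptions rather than the report's own function: repair.with.group.median and median.matrix are hypothetical names, the grouping column is assumed to be Species, and the built-in iris data with injected NAs stands in for the original dataset.

```r
# Hypothetical helper: replace each NA with the median of that variable
# within the same group (assumed here to be the Species column).
repair.with.group.median <- function(data, group.col = "Species") {
  numeric.cols <- names(data)[sapply(data, is.numeric)]
  for (col in numeric.cols) {
    # ave() applies FUN within each group and returns a vector aligned
    # with the original rows.
    group.median <- ave(data[[col]], data[[group.col]],
                        FUN = function(x) median(x, na.rm = TRUE))
    missing <- is.na(data[[col]])
    data[[col]][missing] <- group.median[missing]
  }
  data
}

# Hypothetical helper: the per-group medians as a group-by-variable matrix,
# comparable in shape to the repair matrix RM.
median.matrix <- function(data, group.col = "Species") {
  numeric.cols <- names(data)[sapply(data, is.numeric)]
  sapply(data[numeric.cols], function(x) {
    tapply(x, data[[group.col]], median, na.rm = TRUE)
  })
}

# Illustration on a copy of iris with three NAs injected.
demo <- iris
demo[c(1, 60, 110), "Petal.Length"] <- NA
repaired <- repair.with.group.median(demo)
rm.sketch <- median.matrix(demo)
```

The median is used rather than the mean because it is less sensitive to the outliers that the earlier review found in the data.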
Table 4: Entire Summary

  Sepal.Length     Sepal.Width      Petal.Length     Petal.Width
 Min.   :4.300    Min.   :2.000    Min.   :1.000    Min.   :0.100
 1st Qu.:5.100    1st Qu.:2.800    1st Qu.:1.600    1st Qu.:0.300
 Median :5.800    Median :3.000    Median :4.350    Median :1.300
 Mean   :5.843    Mean   :3.057    Mean   :3.758    Mean   :1.199
 3rd Qu.:6.400    3rd Qu.:3.300    3rd Qu.:5.100    3rd Qu.:1.800
 Max.   :7.900    Max.   :4.400    Max.   :6.900    Max.   :2.500

Figure 2: Measurements using all Species

For Question 4, the summary outputs for the iris data per species are given in Table 5 below. The boxplots of the numeric variables, grouped by species, are shown in Figure 3: Measurements per Species.
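The summaries and boxplots for Question 4 could be produced along the following lines. This is a sketch that assumes the built-in iris data and base R graphics; the variable names are illustrative and the report's own plotting code may differ.

```r
# Select the four numeric measurement columns.
numeric.cols <- names(iris)[sapply(iris, is.numeric)]

# Summary over the entire dataset.
overall <- summary(iris[numeric.cols])

# One summary table per species.
per.species <- lapply(split(iris[numeric.cols], iris$Species), summary)

# Boxplots of all measurements, ignoring species.
boxplot(iris[numeric.cols], main = "Measurements using all Species")

# One panel per measurement, with a box for each species.
old.par <- par(mfrow = c(2, 2))
for (col in numeric.cols) {
  boxplot(iris[[col]] ~ iris$Species,
          main = col, xlab = "Species", ylab = col)
}
par(old.par)  # restore the previous plotting layout
```

Splitting the data by species before calling summary() keeps each per-species table in the same layout as the overall one, which makes the two easy to compare.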