
Introduction to Data Science: Analysis of Health and Development Conditions in New Zealand

   

Added on  2023-06-10

Introduction to Data Science
Name of the Student
Name of the University
Author Note

INTRODUCTION TO DATA SCIENCE
Executive Summary
The aim of this research is to understand the health and development conditions of
New Zealand. Data on world health conditions were extracted from a secondary source,
the World Bank. The collected dataset contained missing values, which were removed from
the study. The data were also restricted to a single country, New Zealand. In eliminating
the missing data, certain attributes were removed as well, as was the data for the year
2015. The extracted data were analysed with the statistical software R Studio. The
following sections present the results and the code used for the study.
Table of Contents
1. Introduction
2. Data Setup
3. Exploratory Data Analysis
4. Advanced Analysis
4.1 Cluster Analysis
4.2 Regression Analysis
5. Conclusion
6. Reflections
References
1. Introduction
This research examines the health and development conditions of New Zealand. A subset
of the original dataset collected from the World Bank has been used to keep the analysis
simple. New Zealand was chosen because it is a moderately populated country, and it was
of interest to understand the health conditions of such a country. The factors considered
in the analysis are total unemployment, gross national income and life expectancy at
birth. Unemployment is a factor known to directly affect the gross national income of a
country. It may also affect life expectancy at birth. The relationships between these
variables are examined in this research.
2. Data Setup
The data were extracted from the World Bank in comma-delimited (.csv) format. As the
whole analysis was performed in R Studio, the CSV file was imported into R. The dataset
contains 26 attributes over 15 years, from 2001 to 2015, for the countries of East Asia
and the Pacific. Only the data needed for the analysis were extracted. The year 2015 was
eliminated because a large amount of information was missing for that year.
Table 2.1 shows the R code for the data extraction.
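The decision to drop a year can be checked by counting the missing values in each year column. The following is a minimal sketch on a small made-up data frame: the values, the indicator codes and the name `toy` are assumptions for illustration, not the World Bank data. Only the column naming pattern (for example `X2015..YR2015.`) follows the code in Table 2.1.

```r
# Toy stand-in for the imported data: all values are invented for illustration.
toy <- data.frame(
  Series.Code    = c("A", "B", "C", "D"),
  X2013..YR2013. = c(1.2, 3.4, NA, 5.6),
  X2014..YR2014. = c(1.3, 3.5, 4.1, 5.7),
  X2015..YR2015. = c(NA, NA, NA, 5.8)
)

# Pick out the year columns and count missing values in each of them
year_cols <- grep("^X[0-9]{4}", names(toy), value = TRUE)
na_per_year <- colSums(is.na(toy[year_cols]))
print(na_per_year)

# Year columns in which more than half of the indicators are missing
mostly_missing <- names(na_per_year)[na_per_year > nrow(toy) / 2]
print(mostly_missing)
```

The "more than half" threshold is an arbitrary choice for this sketch; the report does not state the criterion that was applied.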
Several R libraries were used to run the analysis:
dplyr: used for filtering the data
ggplot2: used for plotting the data
reshape2: used for reshaping the data
cluster: used for the k-means clustering analysis
Table 2.1: R Code for the Libraries Used and the Extraction of the Data
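# Libraries necessary for analysis
library(dplyr)
library(ggplot2)
library(reshape2)
library(cluster)
# Extracting the data; ".." marks missing values in the file.
# fileEncoding = "UTF-8-BOM" strips the byte-order mark that otherwise
# turns the first column name into "ï..Series.Name".
data <- read.csv(file.choose(), sep = ",", header = TRUE, na.strings = "..",
                 fileEncoding = "UTF-8-BOM")
# Keep only the rows for New Zealand
data <- filter(data, Country.Code == "NZL")
# Drop the descriptive columns and the 2015 column
data <- subset(data, select = -c(Country.Name, Country.Code, Series.Name,
                                 X2015..YR2015.))
# Remove indicators that still contain missing values
data <- na.omit(data)
# Reshape to long format: one row per indicator (Series.Code) and year
data <- melt(data, id.vars = "Series.Code")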
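To make the effect of the extraction steps in Table 2.1 easier to see, the sketch below runs the same sequence (filter to one country, drop descriptive columns, remove incomplete rows, reshape to long format) on a hypothetical three-row data frame. The indicator names, codes and values are invented, and `toy`, `nz` and `long` are illustrative names, so this is an assumption-laden illustration rather than the report's actual data.

```r
library(dplyr)
library(reshape2)

# Hypothetical miniature of the World Bank export (all values invented)
toy <- data.frame(
  Series.Name    = c("Indicator one", "Indicator two", "Indicator one"),
  Series.Code    = c("IND.ONE", "IND.TWO", "IND.ONE"),
  Country.Name   = c("New Zealand", "New Zealand", "Australia"),
  Country.Code   = c("NZL", "NZL", "AUS"),
  X2013..YR2013. = c(10, NA, 30),
  X2014..YR2014. = c(11, 21, 31),
  stringsAsFactors = FALSE
)

# Keep New Zealand, drop descriptive columns, remove rows with missing values
nz <- toy %>%
  filter(Country.Code == "NZL") %>%
  select(-Series.Name, -Country.Name, -Country.Code) %>%
  na.omit()

# Long format: one row per indicator and year
long <- melt(nz, id.vars = "Series.Code",
             variable.name = "Year", value.name = "Value")
print(long)
```

Note that `na.omit()` drops a whole indicator as soon as one year is missing, which is why "IND.TWO" disappears in this sketch. This mirrors the report's remark that some attributes were lost when missing data were removed.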
3. Exploratory Data Analysis
