Analysis of East Asia and Pacific Health and Population Data

Assignment 2 for the ICT110 Introduction to Data Science course, which involves studying the health development of the world in the past 15 years.

Added on  2023-06-11

TABLE OF CONTENTS
1. INTRODUCTION
2. DATA SETUP
3. EXPLORATORY DATA ANALYSIS
   ONE VARIABLE ANALYSIS
   TWO VARIABLE ANALYSIS
4. ADVANCED ANALYSIS
   HIERARCHICAL AND K MEANS CLUSTERING
   LINEAR REGRESSION
5. CONCLUSION
6. REFLECTION
REFERENCES
1. INTRODUCTION
This report presents the findings of an analysis of data provided by the World Bank on
health and population in the East Asia and Pacific region. The analysis focuses on two main
subsets of the data: Immunization Data and Alcohol Consumption Data.
Immunization is an important aspect of health care in any country. It strongly influences
the survival rate of infants and children below five years of age. Polio, for instance, can develop
into a condition that affects an individual's entire life. Investing in immunization drives is
therefore a very important step in ensuring both the survival of infants and children aged
five and below and a healthy life afterwards. This report compares immunization across countries
in East Asia and Pacific to obtain a view of how immunization is distributed among the populations
of the region. This report also examines the relationship that exists between immunization and
the amount of money allocated for health in each country. The aim of examining this relationship
is to establish whether higher allocations for health mean a higher allocation for immunization
and whether lower allocations for health mean a lower allocation for immunization. The findings
will provide experts in the health sectors of the various countries with information on the
impact of investment in health on immunization, information that forms a reliable basis for
health planning and funding.
The report also relates alcohol consumption in East Asia and Pacific to individual
income in each country. This comparison is intended to establish whether the level of income has
any influence on the amount of alcohol consumed. The analysis first compares the alcohol
consumption and the Gross National Income (GNI) of the individual countries in East Asia and
Pacific. The comparisons will provide an understanding of the nature of each of the two
variables in the region. The outcome of this analysis will be relevant to researchers and
professionals interested in social data analysis.
2. DATA SETUP
The R code below imported and loaded the Health and Population data, HealthAndPopulation.csv,
into RStudio for analysis.
#Importing the data for East Asia and Pacific as EAP
EAP <- read.csv("C:/Users/user/Documents/IDS/HealthAndPopulation.csv", header = TRUE)
#Data Dimensions
dim(EAP)
#Data Structures
str(EAP)
summary(EAP)
The structure and dimensions of the Health and Population Data were obtained using the str()
and dim() functions in R, as shown in the code above.
According to the structure and dimensions output, the Health and Population Data has 967 entries
and 19 variables. The entries, however, include five extra rows at the end of the
data file that give information about the data file. Therefore, the correct number of entries is
962, with the number of variables correctly recorded as 19. The 19 variables in the dataset are
factors, with the Country Name variable having 37 levels (excluding the level representing the
column name). The 37 levels represent the 37 countries whose data is in the dataset.
The preprocessing of the Health and Population Data involved the creation of two subsets from
the HealthAndPopulation.csv dataset. The resulting subsets were the Immunization Data.csv
and Alcohol Consumption Data.csv. The Alcohol Consumption Data subset contains the Country
Name, Alcohol Consumption and GNI variables for the year 2015. The Immunization Data
subset contains the Country Name, Immunization BCG and Health Expenditure Total variables
for the year 2014.
The preprocessing also involved omitting entries with missing values from the subsets, using
the code below for importing and preprocessing the subsets in R.
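The row-count correction described in this section (967 rows read, five of them trailing notes about the file) is not shown in the report's code. A minimal sketch of one way to drop such rows follows, assuming the non-data rows are always the last five. The data frame `raw`, its column names and its values are made up for illustration and are not taken from HealthAndPopulation.csv.

```r
# Toy stand-in for an imported file: 4 data rows followed by 5 trailing
# metadata rows. All names and values are hypothetical.
raw <- data.frame(
  Country.Name = c("A", "A", "B", "B", "", "", "", "note 1", "note 2"),
  Value        = c("1.5", "2.0", "..", "3.1", "", "", "", "", ""),
  stringsAsFactors = TRUE
)

n_meta <- 5                    # assumed number of trailing non-data rows
clean  <- head(raw, -n_meta)   # keep every row except the last n_meta
clean  <- droplevels(clean)    # discard factor levels used only by dropped rows

dim(clean)                     # rows and columns after trimming
nlevels(clean$Country.Name)    # number of distinct countries remaining
```

Calling `droplevels()` after trimming matters when counting countries, because a factor keeps the levels of rows that have been removed until they are dropped explicitly.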
#Data Preparation
#Specifying The Datasets for Analysis
#Alcohol Consumption Data (TacD)
TacD <- read.csv("C:/Users/user/Documents/IDS/Alcohol Consumption Data 2015.csv", header = TRUE)
#Recode the ".." missing-value marker as NA, then drop incomplete rows
TacD[TacD == ".."] <- NA
TacD <- na.omit(TacD)
#Convert via as.character so the recorded values, not the factor level codes, are returned
TacD$Alcohol.Consumption <- as.numeric(as.character(TacD$Alcohol.Consumption))
TacD$GNI <- as.numeric(as.character(TacD$GNI))
#Rebuild the factor so that it keeps only the countries still present
TacD$Country.Name <- factor(TacD$Country.Name)
#Immunization Data (ImD)
ImD <- read.csv("C:/Users/user/Documents/IDS/Immunization Data 2014.csv", header = TRUE)
ImD[ImD == ".."] <- NA
ImD <- na.omit(ImD)
ImD$Immunization <- as.numeric(as.character(ImD$Immunization))
ImD$Health.Expenditure <- as.numeric(as.character(ImD$Health.Expenditure))
ImD$Country.Name <- factor(ImD$Country.Name)
The datasets were stored in the TacD and ImD data frames. The Alcohol Consumption, GNI,
Immunization and Health Expenditure variables were converted from factors to numeric values
with the as.numeric function, applied to the as.character form of each column so that the
recorded values rather than the internal factor level codes are returned.
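One caveat about converting factors to numbers is worth illustrating. When a column has been read as a factor, calling `as.numeric()` directly on it returns the integer level codes, not the values that were recorded. The sketch below uses made-up values, with ".." standing for a missing entry as in the preprocessing code; the vector `x` is hypothetical.

```r
# Made-up values stored as a factor; ".." marks a missing entry.
x <- factor(c("10.5", "2.3", "..", "7.0"))

codes <- as.numeric(x)                # integer level codes, not the values
codes

x[x == ".."] <- NA                    # recode the missing-value marker
vals <- as.numeric(as.character(x))   # convert the labels themselves
vals
```

An alternative that avoids the factor step altogether is to pass `na.strings = ".."` to `read.csv()`, so that columns containing only numbers and the ".." marker should be read as numeric from the start.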
Analysis of East Asia and Pacific Health and Population Data_5
3. EXPLORATORY DATA ANALYSIS
A. ONE VARIABLE ANALYSIS
I. ALCOHOL CONSUMPTION ANALYSIS FOR 2015
This analysis investigated the rates of alcohol consumption across the East Asia and Pacific
region. The R code below plotted the results for this analysis.
#Total Alcohol Consumption Analysis
#Country.Name is a factor, so plot() draws one box per country
plot(TacD$Country.Name, TacD$Alcohol.Consumption,
     ylab = "Alcohol Consumption",
     main = "Alcohol Consumption in East Asia & Pacific (2015)")
The analysis produced the boxplot in Figure 1.
Figure 1
The plot shows that Vietnam has the highest alcohol consumption in East Asia and Pacific,
followed by Thailand, Mongolia and China. Indonesia has the lowest alcohol consumption in
the region.
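A ranking such as the one read off Figure 1 can also be obtained directly from the data. The sketch below is an alternative view, not the method used in the report: it sorts a data frame by consumption and draws a bar chart in that order, which can be easier to read than a boxplot when each country contributes a single value. The data frame `toy` borrows the column names of TacD, but its countries and values are invented.

```r
# Hypothetical data with the same column names as TacD; values are invented.
toy <- data.frame(
  Country.Name        = factor(c("Country A", "Country B", "Country C")),
  Alcohol.Consumption = c(4.2, 8.7, 0.6)
)

# Sort from highest to lowest consumption
ranked <- toy[order(toy$Alcohol.Consumption, decreasing = TRUE), ]
ranked

# Bar chart in ranked order; las = 2 turns the country labels vertical
barplot(ranked$Alcohol.Consumption,
        names.arg = as.character(ranked$Country.Name),
        las = 2,
        ylab = "Alcohol Consumption")
```

The first and last rows of `ranked` then give the highest and lowest consumers without reading them from the plot.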
