
Data Analysis Report of Health and Population Statistics of East Asian and Pacific Countries


Added on  2023-06-11

About This Document

This report analyses the vital statistics of the East Asia and Pacific region from 2001 to 2015. It covers one-variable and two-variable analysis, k-means clustering and linear regression, with the aim of informing efforts to improve health in the region. The report concludes that GNI per capita is highest for the country with country code MAC and that the country-wise distribution of gross national income contains outliers.

Table of Contents
1 Introduction
1.1 Authorisation and Purpose
1.2 Limitations
1.3 Scope
2 Data Setup
3 Exploratory Data Analysis
3.1. One variable analysis
3.1.1 One Variable Analysis – 1
3.1.2 One Variable Analysis – 2
3.1.3 One Variable Analysis – 3
3.2 Two-variable analysis
3.2.1 Two-variable analysis 1
3.2.2 Two-variable analysis 2
4 Advanced analysis
4.1 Clustering
4.1.1 Brief explanation of k-means and clustering
4.1.2 Clustering Analysis
4.2 Linear regression
4.2.1 Brief definition of linear regression
4.2.2 Linear Regression 1
4.2.3 Linear Regression 2
5 Conclusion
6 Reflection
References
1 Introduction
1.1 Authorisation and Purpose
The study analyses the vital statistics of the East Asia and Pacific region from 2001 to 2015,
using a dataset collected from the World Bank. The findings are intended for government
planners working to improve health in the region.
1.2 Limitations
The primary constraint of this study is that the research and analysis are limited to the East
Asia and Pacific region. A further limitation is that the data is collected from the World
Bank and is therefore secondary in nature.
1.3 Scope
The dataset consists of 26 attributes describing health in the above-mentioned region and
covers a period of 15 years. The analysis relies on statistical summaries and on the
interpretation of graphs. However, the data has many missing observations.
The analysis proceeds through one-variable and two-variable analyses. In the next step, the
data is clustered using the k-means clustering technique and, finally, linear regression
lines are fitted between pairs of attributes.
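The last two steps can be illustrated with a minimal sketch in base R. The data frame `toy`, its column names and its values are hypothetical and are not taken from the World Bank dataset. Standardising the columns before clustering is an assumption made here, not a step confirmed by the report.

```r
set.seed(1)  # k-means starts from random centres, so fix the seed for repeatability

# Hypothetical indicators; names and values are illustrative only.
toy <- data.frame(
  gni_per_capita  = c(500, 800, 1200, 1500, 20000, 25000, 30000, 42000),
  life_expectancy = c(61, 63, 66, 67, 78, 80, 81, 83)
)

# k-means on standardised columns so that neither indicator dominates the distance.
km <- kmeans(scale(toy), centers = 2, nstart = 10)
cluster_labels <- km$cluster

# Simple linear regression of one attribute on another.
fit <- lm(life_expectancy ~ gni_per_capita, data = toy)
print(coef(fit))
```

Here `cluster_labels` holds one cluster number per row, and `coef(fit)` returns the intercept and slope of the fitted line.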
1.4 Methodology
The data was obtained from the World Bank. The dataset is quantitative in nature and
contains health information for the period 2001 to 2015.
2 Data Setup
The data is loaded into R before the analysis. Running the first line of the code opens a
file-selection window, in which the data file (in CSV format) is chosen. The same line
treats the ".." entries in the file as missing values (NA).
In the second step, the necessary libraries are loaded into R to perform the required
statistical analyses and to produce the graphical presentations.
# Choose the CSV file interactively; cells containing ".." are read as NA
Health0 <- read.csv(file.choose(), header = TRUE, sep = ",", na.strings = "..")
# Load the required libraries
library(data.table)
library(reshape2)
library(psych)
library(ggplot2)
library(lattice)
library(dplyr)
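The effect of `na.strings = ".."` can be checked on a small inline example. This is a minimal sketch: the column names and values in `csv_text` are hypothetical and only mimic a file in which ".." marks a missing cell.

```r
# Hypothetical two-row extract; ".." marks a missing cell.
csv_text <- "Country.Code,X2001,X2002
AAA,1200,..
BBB,..,3400"

demo <- read.csv(text = csv_text, header = TRUE, sep = ",", na.strings = "..")

# The year columns are parsed as numbers because ".." became NA
# rather than forcing the whole column to character.
str(demo)

# Number of missing observations in each column.
print(colSums(is.na(demo)))
```

Without the `na.strings` argument, the year columns in this example would be read as character strings, and numeric summaries such as `mean()` would not work on them.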
3 Exploratory Data Analysis
3.1. One variable analysis
3.1.1 One Variable Analysis – 1
The per capita gross national income (GNI) is analysed in this one-variable study. The
average GNI per capita in the region is 11522.45 and the standard deviation is 15406. The
minimum and maximum GNI per capita are 310 and 76300 respectively. The boxplot shows that
the distribution has many outliers.
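# Health1 is not defined in this excerpt; it is assumed to be a data frame
# with a numeric column named value that holds GNI per capita
jpeg("Plot1.jpeg")  # open a JPEG device so the plot is written to a file
fill <- "cyan"
# Single horizontal boxplot of all GNI per capita values
Plot1 <- ggplot(Health1, aes(x = factor(0), y = value)) + geom_boxplot(fill = fill)
Plot1 <- Plot1 + xlab("GNI per Capita") + scale_x_discrete(breaks = NULL) + coord_flip()
Plot1 <- Plot1 + ggtitle("Distribution of GNI per Capita") + theme_bw()
print(Plot1)
dev.off()  # close the device and save the file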
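The summary statistics quoted above, and the points a boxplot draws as outliers, can be computed directly. The sketch below uses a hypothetical vector `gni` rather than the report's data. It applies the conventional rule that flags values lying more than 1.5 times the interquartile range beyond the quartiles.

```r
# Hypothetical GNI per capita values; not taken from the report's dataset.
gni <- c(310, 900, 1500, 2400, 3800, 5200, 9100, 21000, 43000, 76300, NA)

# Summary statistics, dropping missing observations.
gni_stats <- c(mean = mean(gni, na.rm = TRUE),
               sd   = sd(gni, na.rm = TRUE),
               min  = min(gni, na.rm = TRUE),
               max  = max(gni, na.rm = TRUE))
print(gni_stats)

# Fences of the 1.5 x IQR rule, based on the first and third quartiles.
q <- quantile(gni, probs = c(0.25, 0.75), na.rm = TRUE)
iqr <- q[[2]] - q[[1]]
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[2]] + 1.5 * iqr

# Values outside the fences are the ones a boxplot shows as separate points.
outliers <- gni[!is.na(gni) & (gni < lower_fence | gni > upper_fence)]
print(outliers)
```

For a strongly right-skewed variable such as income, the flagged values typically sit on the upper side only, since the lower fence falls below zero.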