Task 1:
Part 1: In this section, we analyzed trends in the proportions of the population affected by various health conditions. There were nine health conditions in total, and the aim was to produce a visualization showing how the conditions compared in 2001 and 2018. The analysis used the ggplot function in R. The code used to generate the plot is presented below.

From the plot, it was observed that in 2001 three health conditions (arthritis, asthma and back problems) had proportions greater than 10%, with back problems having the highest proportion, followed by arthritis and then asthma. In 2018, four health conditions (arthritis, asthma, back problems and mental and behavioural problems) had proportions greater than 10%, with mental and behavioural problems having the highest proportion, followed by back problems, arthritis and asthma, in that order.

#task 1
##part 1
#Import data into R
health_data <- read.csv("C:\\Users\\Downloads\\aushealth.csv")
#Attach the data
attach(health_data)
#Check the structure of the imported data
str(health_data)
#Check the dimensions of the data
dim(health_data)
#View data
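As an illustration of the kind of grouped bar chart described above, here is a minimal sketch using ggplot2. It is not the code used for the report. The column names `year` and `prop_population` follow the data frame used in this task, while the column name `disease` and every value in `toy_health` are assumptions made up for the example.

```r
library(ggplot2)

# Hypothetical toy data: the `disease` column name and all values are invented
toy_health <- data.frame(
  disease = rep(c("Arthritis", "Asthma", "Back problems"), times = 2),
  year = rep(c(2001, 2018), each = 3),
  prop_population = c(13, 11.5, 20, 15, 11, 16)
)

# One bar per condition and year, placed side by side for comparison
p <- ggplot(toy_health,
            aes(x = disease, y = prop_population, fill = factor(year))) +
  geom_col(position = "dodge") +
  labs(x = "Health condition",
       y = "Proportion of population (%)",
       fill = "Year")
```

Calling print(p) draws the chart. Converting `year` to a factor makes ggplot2 treat it as two discrete groups rather than a continuous scale.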
Figure 1: Bar chart of disease proportions in 2001 and 2018

Apart from visualizing the trend in the health conditions, we also compared the conditions in terms of the change in their proportions from 2001 to 2018. A chart showing the percentage change in each proportion was constructed. The code used to generate the plot is provided below.

##part 2
#Call the tidyverse library
library(tidyverse)
#Convert the data to wide format
health_data_wide <- health_data %>% spread(year, prop_population)
#Check the structure of the data frame
str(health_data_wide)
#Change the column names
colnames(health_data_wide) <- c("Disease", "Year2001", "Year2018")
#Compute the new variable holding the percentage change
health_data_wide_new <- health_data_wide %>% mutate(prop_change = ((Year2018 - Year2001) / Year2001) * 100)
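The percentage change computed above is a relative change: the difference between the 2018 and 2001 proportions divided by the 2001 proportion. A small worked sketch in base R makes the arithmetic concrete. The values in `toy_wide` are hypothetical and chosen only so that the numbers are easy to follow; for instance, a rise from 0.2% to 1.0% of the population is a 400% relative increase even though the absolute change is under one percentage point.

```r
# Hypothetical proportions (%), chosen only to illustrate the arithmetic
toy_wide <- data.frame(
  Disease = c("Kidney disease", "Back problems"),
  Year2001 = c(0.2, 20),
  Year2018 = c(1.0, 16)
)

# Relative change from 2001 to 2018, expressed as a percentage
toy_wide$prop_change <-
  (toy_wide$Year2018 - toy_wide$Year2001) / toy_wide$Year2001 * 100

print(toy_wide)
```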
From the graph presented below, it is evident that kidney disease has risen sharply, recording an increase of about 400% in 2018 relative to 2001. Back problems, on the other hand, recorded the largest drop. Overall, only two health conditions, back problems and asthma, recorded a lower proportion in 2018 than in 2001.

Figure 2: Bar chart of the change in proportion of health conditions from 2001 to 2018

Task 2:
Part 1: In this task, we are presented with stock data and asked to visualize the returns for the top 4 stocks (Apple, Amazon, Google and Facebook). The aim is to identify the risky and non-risky stocks. The code used to generate the graph is given below.
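For illustration only (this is a hypothetical sketch, not the code used for the report), returns and a rough risk measure could be derived as follows. It assumes a long-format data frame with `date` and `company` columns plus a closing-price column named `close`; the `close` name and all prices are invented. Using the standard deviation of returns as a proxy for risk is also an assumption, although it is a common one.

```r
# Hypothetical closing prices: the `close` column and all values are invented
toy_stock <- data.frame(
  date = rep(as.Date("2015-01-01") + 0:3, times = 2),
  company = rep(c("Apple", "Amazon"), each = 4),
  close = c(100, 110, 99, 108.9, 300, 303, 306.03, 309.0903)
)

# Simple daily return within each company: (P_t - P_{t-1}) / P_{t-1}
toy_stock$ret <- ave(toy_stock$close, toy_stock$company,
                     FUN = function(p) c(NA, diff(p) / head(p, -1)))

# Standard deviation of returns per company as a rough measure of risk
risk <- tapply(toy_stock$ret, toy_stock$company, sd, na.rm = TRUE)
print(risk)
```

In this toy data the first company's returns swing between gains and losses while the second grows steadily, so the first has the larger standard deviation.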
The graph below presents the stock returns for the big 4 stocks (Apple, Amazon, Google and Facebook). It shows variation in returns across the four stocks, with Facebook having the lowest returns, Amazon the highest, and Apple the most varied.

Figure 3: Time series plot for the big 4 stocks

Next, we compared the market shares of the four stocks based on their trading volumes. The comparison was made for both 2013 and 2015. The R code used to generate the graph is given below.

We observe that Apple had the highest volume in both 2013 and 2015, followed by Facebook. The market shares of both Amazon and Google were very low in both years.

##part 2
#Call the dplyr library
library(dplyr)
#Call the tidyr library
library(tidyr)
#Call the plyr library
library(plyr)
big4stocks_vol <- reshape(stock_data, v.names = "volume", timevar = "company", idvar = c("date"), direction = "wide")
big4stocks_vol$Date= as.Date(big4stocks_vol$da
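The volume-based market share described above is each company's traded volume as a percentage of the total volume in a given year. A minimal base R sketch of that calculation is shown here. It reuses the `date`, `company` and `volume` column names from the code above, but the data in `toy_vol` are hypothetical and this is not necessarily how the report's figures were computed.

```r
# Hypothetical volumes: all values are invented for illustration
toy_vol <- data.frame(
  date = as.Date(c("2013-06-03", "2013-06-03", "2015-06-03", "2015-06-03")),
  company = c("Apple", "Google", "Apple", "Google"),
  volume = c(900, 100, 600, 200)
)

# Extract the calendar year from each date
toy_vol$year <- format(toy_vol$date, "%Y")

# Total volume per company and year (rows: company, columns: year)
totals <- xtabs(volume ~ company + year, data = toy_vol)

# Share of each company within its year, as a percentage
shares <- prop.table(totals, margin = 2) * 100
print(round(shares, 1))
```

Setting margin = 2 normalizes within each year column, so the shares for a given year sum to 100.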
Figure 4: Bar chart of the market shares of the big 4 stocks

Task 3:
The first part of this task sought to visualize the top 10 tennis players. The code used to generate the results is given below.

The graphical representation is given below. From the graph, it can be seen that the leading player was Roger Federer, with about 19 wins. He was closely followed by Rafael Nadal with about 16 wins and Pete Sampras with 14 wins. Fred Perry completed the list of the top 10 players.

##part 1
grandslamdata <- read.csv("C:\\Users\\310187796\\Downloads\\grandslamdata.csv")
str(grandslamdata)
attach(grandslamdata)
library("viridis")
library(tidyverse)
grandslamdata %>% group_by(winner) %>% summarise(n()) %>% top_n(10)
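Ranking players by number of titles amounts to counting rows per winner and keeping the largest counts. The same idea can be sketched in base R as below. The `winner` column name matches the code above, but the player labels and counts in `toy_slams` are hypothetical, and the sketch keeps the top 2 rather than the top 10 to stay short.

```r
# Hypothetical winners: the labels are placeholders, not real results
toy_slams <- data.frame(
  winner = c("Player A", "Player B", "Player A",
             "Player C", "Player A", "Player B")
)

# Count titles per winner and sort from most to fewest
wins <- sort(table(toy_slams$winner), decreasing = TRUE)

# Keep the leading entries (2 here; the report uses 10)
top_players <- head(wins, 2)
print(top_players)
```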
Figure 5: Bar chart of the top 10 tennis players

In the second part, we compared the winners against the tournaments that they won. The idea was to identify which tournament the top 10 players won most often. The code used to generate the graph is given below.

From the graph below, we can see that the tournament most often won by the top 10 players is the Australian Open, closely followed by the French Open.

#Comparative bar chart
Names <- c(rep(c("Roger F.", "Rafael N.", "Pete S.", "Novak D.", "Roy E.", "Bjorn B.", "Rod L.", "William T.", "Andre A.", "Fred P.",
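Comparing winners against tournaments is a two-way count. As a hedged alternative to typing the names out by hand, the counts could be built directly from the data with a contingency table. In this sketch the `tournament` column name is an assumption, and the rows of `toy_results` are hypothetical placeholders rather than real results.

```r
# Hypothetical results: the `tournament` column name and all rows are invented
toy_results <- data.frame(
  winner = c("Player A", "Player A", "Player B", "Player B", "Player A"),
  tournament = c("Australian Open", "French Open", "Australian Open",
                 "Wimbledon", "Australian Open")
)

# Titles per winner (rows) and tournament (columns)
wins_by_event <- table(toy_results$winner, toy_results$tournament)

# Total titles per tournament, and the tournament with the most titles
event_totals <- colSums(wins_by_event)
most_won <- names(which.max(event_totals))
print(wins_by_event)
```

A table like `wins_by_event` can be passed to barplot() with beside = TRUE to draw a comparative bar chart.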
Figure 6: Bar chart of winners versus the tournaments won

Task 4:
In this task, we present spatial plots of crime in Houston. We considered burglary as the crime of interest and analyzed the spatial density plot for this particular crime. The R code for generating this plot is given below.

As can be seen from the plot, burglary cases are unevenly distributed across Houston, with the majority of the crimes reported in the southern part.

## part 1
crime_data <- read.csv("C:\\Users\\310187796\\Downloads\\crimedata.csv")
str(crime_data)
library("shiny")
library("ggplot2")
library("ggmap")
#Keep only the burglary records
burglary <- subset(crime_data, offense == "burglary")
qmplot(lon, lat, data = burglary, colour = I('red'), size = I(2), darken = .3)
Figure 7: Spatial plot of the burglary cases in Houston

In the second part, we considered all crimes rather than one particular crime, and restricted the data to the period from 1 March 2010 to 31 May 2010. The code for generating the graph is given below.

The spatial density plot for all the crimes in Houston for the period 1 March 2010 to 31 May 2010 is presented below.

#Convert the date column to Date (assumes it is stored in a format as.Date can parse)
crime_data$date <- as.Date(crime_data$date)
#Keep records from 1 March 2010 to 31 May 2010, comparing against Date values rather than strings
crime_data_new <- with(crime_data, crime_data[(date >= as.Date("2010-03-01") & date <= as.Date("2010-05-31")), ])
myvars <- c("offense", "lon", "lat")
all_crimes <- crime_data_new[myvars]
str(all_crimes)
qmplot(lon, lat, data = all_crimes, colour = I('green'), size = I(2), darken = .3)
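Restricting records to a date window works reliably only when both sides of the comparison are Date values. Comparing a Date column against a string such as "1/3/2010" relies on implicit conversion, which may not read the string as day/month/year. The sketch below shows the filtering step on its own. The rows in `toy_crime` are hypothetical, and the "%Y-%m-%d" storage format is an assumption about the data.

```r
# Hypothetical records: dates and offenses are invented for illustration
toy_crime <- data.frame(
  date = c("2010-02-28", "2010-03-01", "2010-04-15",
           "2010-05-31", "2010-06-01"),
  offense = c("theft", "burglary", "robbery", "theft", "burglary")
)

# Parse the text dates explicitly (assumed format: year-month-day)
toy_crime$date <- as.Date(toy_crime$date, format = "%Y-%m-%d")

# Window boundaries as Date values; both ends are inclusive
window_start <- as.Date("2010-03-01")
window_end <- as.Date("2010-05-31")

in_window <- subset(toy_crime, date >= window_start & date <= window_end)
print(in_window)
```

If the source file stored dates as day/month/year, the format string would instead be "%d/%m/%Y".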
Figure 8: Spatial density plot of all the crimes in Houston

The spatial density plot above shows that crime is concentrated in the southern part of the region, with few, if any, cases reported in the northern part.