Visual Analytics Assignment 2022


Added on 2022/10/11


Visual Analytics
Student Name:
Instructor Name:
Course Number:
25th September 2019


Task 1:
Part 1:
In this section, we analyzed trends in the proportion of the population affected by various health
conditions. There were nine health conditions in total, and the aim was to present a
visualization showing how the different health conditions compared in 2001 and in 2018. The
analysis was performed with the ggplot function in R. The code used to generate the plot
is presented below:
From the plot, it was observed that in 2001, three health
conditions (arthritis, asthma and back problems) had
proportions greater than 10%. Back problems had the
highest proportion, followed by arthritis, with
asthma in third place. In 2018, four health conditions
(arthritis, asthma, back problems and mental and behavioural
problems) had proportions greater than 10%. Mental and
behavioural problems had the highest proportion, followed by back
problems, arthritis and asthma, in that order.
#task 1
##part 1
#Import data into R
health_data <- read.csv("C:\\Users\\Downloads\\aushealth.csv")
#Attach the data
attach(health_data)
#Check the structure of the imported data
str(health_data)
#Check the dimensions of the data
dim(health_data)
#View data

Figure 1: Bar chart of disease proportions in 2001 and 2018
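The listing above stops after importing and inspecting the data, so the plotting call itself is not shown. As a minimal sketch of how a grouped bar chart of this kind could be drawn with ggplot2, the example below uses a small made-up data frame. The column names `condition` and `year` are assumptions (only `year` and `prop_population` appear later in the report), and the values are placeholders, not the figures from aushealth.csv.

```r
library(ggplot2)

# Made-up placeholder values, NOT the figures from aushealth.csv.
# `condition` is an assumed column name for the health condition.
toy_health <- data.frame(
  condition = rep(c("Condition A", "Condition B", "Condition C"), times = 2),
  year = rep(c(2001, 2018), each = 3),
  prop_population = c(13, 11, 20, 15, 9, 16)
)

# One bar per condition and year, placed side by side for comparison.
p_health <- ggplot(toy_health,
                   aes(x = condition, y = prop_population, fill = factor(year))) +
  geom_col(position = "dodge") +
  labs(x = "Health condition", y = "Proportion of population (%)", fill = "Year")

print(p_health)
```

Converting `year` to a factor gives two discrete fill colours; leaving it numeric would produce a continuous colour scale instead.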
Apart from visualizing the trend in the disease conditions, we also sought to check how the
diseases compare in terms of the change in proportions from 2001 to 2018. A chart showing
the changes in the disease proportions was constructed. The code used to generate the plot is
provided below:
##part 2
#call the tidyverse library
library(tidyverse)
#converting the data to wide format
health_data_wide = health_data %>%
  spread(year, prop_population)
#checking the structure of the dataframe
str(health_data_wide)
#changing the column names
colnames(health_data_wide) <- c("Disease", "Year2001", "Year2018")
#Computing the new variable on the percentage change
health_data_wide_new <- health_data_wide %>%
  mutate(prop_change = ((Year2018 - Year2001) / Year2001) * 100)

From the graph presented below, it is evident that kidney disease rose sharply, recording an
increase of about 400% in 2018 compared to 2001. On the other hand, back problems recorded the
largest drop. In general, only two health conditions recorded a drop in their proportions in 2018
when compared to 2001. These health conditions are back problems and asthma.
Figure 2: Bar chart on change in proportion of health conditions from 2001 to 2018
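To make the percentage-change calculation concrete, the sketch below applies the same formula, (Year2018 - Year2001) / Year2001 * 100, to made-up values and draws a simple bar chart of the result. Only the column names `Disease`, `Year2001`, `Year2018` and `prop_change` come from the report; the numbers and the plotting choices are assumptions for illustration.

```r
library(ggplot2)

# Made-up placeholder values; only the column names follow the report.
toy_wide <- data.frame(
  Disease  = c("Condition A", "Condition B", "Condition C"),
  Year2001 = c(10, 20, 0.5),
  Year2018 = c(12, 15, 2.5)
)

# Relative change in percent: (new - old) / old * 100
toy_wide$prop_change <-
  (toy_wide$Year2018 - toy_wide$Year2001) / toy_wide$Year2001 * 100

# Order the bars by size of change and colour them by direction.
p_change <- ggplot(toy_wide,
                   aes(x = reorder(Disease, prop_change), y = prop_change,
                       fill = prop_change > 0)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  labs(x = "Health condition", y = "Change from 2001 to 2018 (%)")

print(p_change)
```

In this toy data, Condition C moves from 0.5 to 2.5, which is a 400% increase: the 2018 value is five times the 2001 value, not four times.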
Task 2:
Part 1:
In this task, we are presented with stock data and asked to visualize the returns for the
top 4 stocks (Apple, Amazon, Google and Facebook). The aim is to identify the risky and non-risky
stocks. The code used to generate the graph is given below:


The graph below presents the stock returns for the big 4 stocks (Apple, Amazon, Google and
Facebook). The graph shows variation in the returns across the four stocks, with Facebook
having the lowest returns, Amazon the highest, and Apple
the most varied.
Figure 3: Time series plot for the big 4 stocks
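As a rough illustration of how returns could be plotted and compared for risk, the sketch below computes simple daily returns per company and uses their standard deviation as a risk measure. This is an assumption: the report does not state how returns or risk were defined. The tickers, prices and the column names `close` and `ret` are hypothetical.

```r
library(ggplot2)

# Made-up prices for two hypothetical tickers; `close` is an assumed column.
toy_prices <- data.frame(
  date = rep(as.Date("2015-01-01") + 0:4, times = 2),
  company = rep(c("AAA", "BBB"), each = 5),
  close = c(100, 102, 101, 105, 104, 50, 50.5, 50.2, 50.8, 51)
)

# Simple daily return within each company: (p_t - p_{t-1}) / p_{t-1}.
# The first day of each company has no previous price, hence NA.
toy_prices$ret <- ave(toy_prices$close, toy_prices$company,
                      FUN = function(p) c(NA, diff(p) / head(p, -1)))

# Standard deviation of returns as a rough per-company risk measure
# (rows with NA returns are dropped by the formula interface).
risk <- aggregate(ret ~ company, data = toy_prices, FUN = sd)

p_returns <- ggplot(na.omit(toy_prices),
                    aes(x = date, y = ret, colour = company)) +
  geom_line() +
  labs(x = "Date", y = "Daily return", colour = "Company")

print(risk)
print(p_returns)
```

A larger standard deviation indicates more variable returns, which is one common, though not the only, way of labelling a stock as risky.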
Next, we sought to compare the market shares for the four stocks based on their volumes. The
comparison was made for both 2013 and 2015. The R code used to generate the graph is given
below;
We observe that Apple had the highest volume in both 2013
and 2015 and it was followed by Facebook. The market share
for both Amazon and Google was very low in both 2013 and
2015.
##part 2
#calling the library dplyr
library(dplyr)
#calling the library tidyr
library(tidyr)
#calling the library plyr
library(plyr)
big4stocks_vol <- reshape(stock_data,
                          v.names = "volume",
                          timevar = "company",
                          idvar = c("date"),
                          direction = "wide")
big4stocks_vol$Date = as.Date(big4stocks_vol$da

Figure 4: Bar chart on market shares of the big 4 stocks
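The market share comparison amounts to dividing each company's volume by the total volume of all companies in the same year. A minimal sketch of that calculation follows, using made-up volumes. The `company` and `volume` names appear in the report's listing, while the `year` and `share` columns and the tickers are assumptions.

```r
library(ggplot2)

# Made-up yearly volumes for three hypothetical tickers.
toy_vol <- data.frame(
  company = rep(c("AAA", "BBB", "CCC"), times = 2),
  year = rep(c(2013, 2015), each = 3),
  volume = c(600, 300, 100, 500, 400, 100)
)

# Share of each company within its year, in percent.
toy_vol$share <-
  toy_vol$volume / ave(toy_vol$volume, toy_vol$year, FUN = sum) * 100

p_share <- ggplot(toy_vol, aes(x = company, y = share, fill = factor(year))) +
  geom_col(position = "dodge") +
  labs(x = "Company", y = "Share of traded volume (%)", fill = "Year")

print(p_share)
```

Because the shares are computed within each year, they sum to 100 for 2013 and for 2015 separately, which keeps the two years comparable even if total volume differs.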
Task 3:
The first part of this task sought to visualize the top 10 tennis players. The code used to generate
the results is given below:
The graphical representation is given below. From the graph, it
can be seen that the best player was Roger Federer,
who had about 19 wins. He was closely followed by Rafael
Nadal with about 16 wins and Pete Sampras with 14 wins. Fred
Perry closed the list of the top 10 players.
##part 1
grandslamdata <- read.csv("C:\\Users\\310187796\\Downloads\\grandslamdata.csv")
str(grandslamdata)
attach(grandslamdata)
library("viridis")
library(tidyverse)
grandslamdata %>%
  group_by(winner) %>%
  summarise(n()) %>%
  top_n(10)

Figure 5: Bar chart of the top 10 tennis players
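The listing above produces the table of top winners but not the chart. One way the counting and plotting could be combined is sketched below on made-up data. Only the `winner` column name comes from the report; the player labels and the `wins` column are hypothetical, and `slice_max()` is used here in place of the report's `top_n()`.

```r
library(dplyr)
library(ggplot2)

# Made-up tournament results with hypothetical player labels.
toy_slams <- data.frame(
  winner = c("Player A", "Player B", "Player A",
             "Player C", "Player A", "Player B")
)

# Count titles per player and keep the two players with the most titles.
top_players <- toy_slams %>%
  count(winner, name = "wins") %>%
  slice_max(wins, n = 2, with_ties = FALSE)

# Horizontal bars, ordered by number of titles.
p_top <- ggplot(top_players, aes(x = reorder(winner, wins), y = wins)) +
  geom_col() +
  coord_flip() +
  labs(x = "Player", y = "Titles won")

print(p_top)
```

Naming the count column explicitly avoids relying on `top_n()` picking the last column as its ranking variable; for the real data, `n = 10` would give the top 10.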
In the second part, we sought to compare the winners against the tournaments that they won. The
idea was to identify which of the tournaments was won the most.
The code used to generate the graph is given below:
From the graph below, we can see that the tournament most often won
by the top 10 players is the Australian Open, and it
was closely followed by the French Open.
#Comparative bar chart
Names <- c(rep(c("Roger F.", "Rafael N.",
"Pete S.",
"Novak D.",
"Roy E.",
"Bjorn B.",
"Rod L.",
"William T.",
"Andre A.",
"Fred P.",


Figure 6: Bar chart of winners versus the tournament won
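Rather than typing the player and tournament labels by hand, the same comparison can be built from a cross-tabulation of the raw records. The sketch below uses base R on made-up data. The `tournament` column name, the player labels and the tournament labels are assumptions, not taken from grandslamdata.csv.

```r
# Made-up records: one row per title won.
toy_wins <- data.frame(
  winner = c("Player A", "Player A", "Player B", "Player B", "Player C"),
  tournament = c("Open X", "Open Y", "Open X", "Open X", "Open Y")
)

# Rows are players, columns are tournaments, cells are titles won.
wins_table <- table(toy_wins$winner, toy_wins$tournament)

# Total titles per tournament, and the tournament won most often.
titles_per_tournament <- colSums(wins_table)
most_won <- names(which.max(titles_per_tournament))

# Grouped bars: one group per player, one bar per tournament.
barplot(t(wins_table), beside = TRUE, legend.text = TRUE,
        xlab = "Player", ylab = "Titles won")

print(most_won)
```

The column totals answer the question of which tournament was won most often, while the grouped bars show how those wins are spread across players.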
Task 4:
In this task, we present spatial plots of crime in Houston. We considered burglary as
the crime of interest and analyzed the spatial distribution of this particular crime.
The R code for generating this plot is given below:
As can be seen from the plot, burglary cases are unevenly distributed
across Houston, with the majority of the crimes being
reported in the southern part.
## part 1
crime_data <- read.csv("C:\\Users\\310187796\\Downloads\\crimedata.csv")
str(crime_data)
library("shiny")
library("ggplot2")
library("ggmap")
burglary <- subset(crime_data, offense == "burglary")
qmplot(lon, lat, data = burglary, colour = I('red'), size = I(2), darken = .3)

Figure 7: Spatial plot of the burglary cases in Houston
In the second part, we considered all crimes rather than one particular offense. We
restricted the data to the period from 1st March 2010 to 31st May 2010. The code for generating
the graph is given below:
The spatial density plot for all the crimes in Houston for the
period 1st March 2010 to 31st May 2010 is presented below.
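#Convert to Date; pass format = ... if the dates are not stored as YYYY-MM-DD
crime_data$date <- as.Date(crime_data$date)
#Keep records from 1st March 2010 to 31st May 2010
crime_data_new <- crime_data[crime_data$date >= as.Date("2010-03-01") &
                               crime_data$date <= as.Date("2010-05-31"), ]
myvars <- c("offense", "lon", "lat")
all_crimes <- crime_data_new[myvars]
str(all_crimes)
qmplot(lon, lat, data = all_crimes, colour = I('green'), size = I(2), darken = .3)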

Figure 8: Spatial density plot of all the crimes in Houston
The spatial density plot above shows that crimes are more concentrated in the southern
part of the region, with few, if any, cases reported in the northern part.
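The `qmplot` calls in this task draw one point per incident. If a smoothed density surface is wanted, one option is ggplot2's `stat_density_2d`, sketched below without a map background so that it runs offline. The coordinates are randomly generated around Houston's approximate location and are not real crime records; only the column names `date`, `lon` and `lat` follow the report.

```r
library(ggplot2)

# Randomly generated points, NOT real crime records.
set.seed(1)
n <- 200
toy_crimes <- data.frame(
  date = seq(as.Date("2010-01-01"), by = "day", length.out = n),
  lon = rnorm(n, mean = -95.37, sd = 0.1),
  lat = rnorm(n, mean = 29.76, sd = 0.1)
)

# Keep 1 March 2010 to 31 May 2010, comparing Date values with Date values.
in_window <- subset(toy_crimes,
                    date >= as.Date("2010-03-01") &
                      date <= as.Date("2010-05-31"))

# Points plus filled contours of a 2D kernel density estimate.
p_density <- ggplot(in_window, aes(x = lon, y = lat)) +
  geom_point(colour = "grey40", size = 1) +
  stat_density_2d(aes(fill = after_stat(level)),
                  geom = "polygon", alpha = 0.3) +
  labs(x = "Longitude", y = "Latitude", fill = "Density level")

print(p_density)
```

Building both date bounds with `as.Date()` in ISO format avoids comparing dates against day-first strings such as "1/3/2010", which R does not parse as 1st March by default.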
