Descriptive Analytics And Visualisation Assignment 2022

   

Added on  2022-09-21

Data Science and Big Data
DESCRIPTIVE ANALYTICS AND VISUALISATION
ASSIGNMENT ONE
STUDENT ID:

Introduction
The objective of the given report is to present the findings of the data analysis of the provided
sample data of 200 people considering the questions raised in the email sent by Edmond
Kendrick. The analysis of the given sample data has been carried out using the appropriate
descriptive and inferential statistical tools so as to derive useful conclusions about the
population. It has been assumed that the data provided is random and representative of the
underlying target population.
Discussion
Based on the results obtained from the statistical analysis, the various claims highlighted in
the email have been addressed in the order in which they were raised.
Question 1
a) The objective is to determine whether the proportion of “MILD” or “MEDIUM” claims
differs significantly by gender. The suitable hypothesis test for this claim is a
two-sample proportion test, and a two tail z test has been conducted. Table 1 in
Appendix 1 indicates that the null hypothesis would be rejected at 5% level of
significance. This indicates that the proportion of “MILD” or “MEDIUM” claims differs
significantly across the two genders (male and female).
b) The objective is to determine if the given sample data lends support to the claim that
representation of a claimant by a private attorney leads to a higher claim amount on average.
The appropriate hypothesis test here is a two-sample independent t test.
A t test has been preferred over a z test as the population standard deviations
of the two groups are unknown. A right tail t test has been conducted assuming unequal
variances for the two data sets. Table 2 in Appendix 1 indicates that the null hypothesis
would be rejected. This implies that the sample data supports the claim of a higher average
claim amount for claimants who are represented by a private attorney.
c) The key objective is to determine the validity of the claim that private attorney
representation is higher for severe cases as compared to medium cases. A two sample
proportion test has been conducted. A right tail z test has been done here. Table 3 in
Appendix 1 indicates the failure to reject the null hypothesis at 5% level of significance.
Hence, the sample data does not provide sufficient evidence to support the industry claim
regarding higher representation of severe claims by private attorneys.
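The two-sample proportion tests used in parts a) and c) can be sketched as below. This is a minimal illustration assuming a pooled standard error under the null hypothesis; the function name, its `alternative` argument and the example counts are hypothetical and are not taken from the report's data or appendices.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled two-sample z test of H0: p1 == p2.

    x1, x2 are the counts of interest and n1, n2 the group sizes.
    alternative: "two-sided", "greater" (p1 > p2) or "less" (p1 < p2).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    cdf = NormalDist().cdf(z)
    if alternative == "two-sided":
        p_value = 2 * min(cdf, 1 - cdf)
    elif alternative == "greater":  # right tail
        p_value = 1 - cdf
    elif alternative == "less":  # left tail
        p_value = cdf
    else:
        raise ValueError("unknown alternative: " + alternative)
    return z, p_value


# Hypothetical counts for illustration only (not the report's sample).
z, p = two_proportion_z_test(45, 100, 30, 100)
print(f"z = {z:.3f}, p = {p:.4f}, reject H0 at 5%: {p < 0.05}")
```

A right tail test such as the one in part c) would pass `alternative="greater"`, with the null hypothesis rejected only when the resulting p value falls below 0.05.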
Question 2
a) The claim to be tested is that the percentage of “SEVERE” claims for orthopaedic
surgeons is lower in comparison to all other specialists. The two-sample proportion test
has been used to test the above claim. A left tail z test has been performed on the sample
data. Table 1 in Appendix 2 indicates the failure to reject the null hypothesis at 5%
level of significance. Hence, your claim regarding a lower percentage of “SEVERE” claims
for orthopaedic surgeons is not supported by the sample data.
b) The claim to be tested is that the average claim amount for “SEVERE” claims is higher for
orthopaedic surgeons in comparison with other specialists. The appropriate hypothesis test
here is a two-sample independent t test. A t test has been preferred over a z
test as the population standard deviations of the two groups are unknown. A
right tail t test has been conducted assuming unequal variances for the two data sets. Table
2 in Appendix 2 indicates that the null hypothesis would not be rejected. This implies that
your assertion does not find support from the given sample data.
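The unequal-variance t test described in part b) can be sketched as follows. This is an illustrative sketch of Welch's t statistic with the Welch-Satterthwaite degrees of freedom; the function name and the sample values are hypothetical, and the report's own figures are in its appendix tables.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's t test of mean(sample_a) - mean(sample_b)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    v_a = variance(sample_a) / n_a
    v_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(v_a + v_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical claim amounts for illustration only.
group_a = [12.0, 15.0, 14.0, 18.0, 16.0]
group_b = [10.0, 11.0, 9.0, 12.0, 13.0]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

For a right tail test, the null hypothesis is rejected when `t_stat` exceeds the upper 5% critical value of the t distribution with `dof` degrees of freedom. That critical value would come from a t table or a statistics library, since the Python standard library does not provide the t distribution.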
Question 3
a) The objective is to determine if the claim amount tends to differ across the different marital
statuses of the claimants. Since there are four categories of marital status for the claimants,
running a t test for every pair of categories would be laborious. Hence, for each of these four
categories, a 95% confidence interval for the mean claim amount has been constructed. If there
were no difference in the average claim amount by marital status of claimants, the confidence
intervals of the categories would be expected to overlap. However, Table 1 of Appendix 3
indicates that there is no overlap in the confidence intervals of divorced and widowed
claimants. This implies that the average claim amount differs by the marital status of claimants.
b) The objective is to determine if the claim amount tends to differ across the different surgeon
specialities. Since there are four categories of surgeon specialists, running a t test for every
pair of categories would be laborious. Hence, for each of these four categories, a 95%
confidence interval for the mean claim amount has been constructed. If there were no difference
in the average claim amount by surgeon speciality, the confidence intervals of the categories
would be expected to overlap. Table 2 of Appendix 3 indicates that there is overlap in the
confidence intervals for all four surgeon specialities. This implies that the average claim
amount does not differ significantly by the underlying surgeon specialities.
c) The objective is to determine if the proportion of claimants represented by a private
attorney tends to differ across the different marital statuses of the claimants. Since there are
four categories of marital status for the claimants, running a test for every pair of categories
would be laborious. Hence, for each of these four categories, a 95% confidence interval for the
proportion of claimants represented by a private attorney has been constructed. If there were
no difference in the proportion by marital status of claimants, the confidence intervals of the
categories would be expected to overlap. Table 3 of Appendix 3 indicates that there is overlap
in the confidence intervals of all four categories. This implies that the proportion of claimants
represented by a private attorney does not differ significantly by marital status.
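The confidence interval comparison used in this question can be sketched as below. This is a minimal illustration: the helper names, the category data and the critical value are assumptions for the example, with `t_crit` supplied by the caller from a t table (2.776 is the two-sided 95% value for 4 degrees of freedom).

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def mean_ci(values, t_crit):
    """t-based confidence interval for the mean of values."""
    half_width = t_crit * stdev(values) / sqrt(len(values))
    centre = mean(values)
    return centre - half_width, centre + half_width


def non_overlapping_pairs(intervals):
    """Return the pairs of category names whose intervals do not overlap."""
    pairs = []
    for (name_a, (lo_a, hi_a)), (name_b, (lo_b, hi_b)) in combinations(
        intervals.items(), 2
    ):
        if hi_a < lo_b or hi_b < lo_a:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical claim amounts per category, five observations each.
samples = {
    "Divorced": [10.0, 12.0, 11.0, 13.0, 14.0],
    "Widowed": [20.0, 22.0, 21.0, 23.0, 24.0],
    "Single": [12.0, 14.0, 13.0, 15.0, 16.0],
}
T_CRIT_DF4 = 2.776  # two-sided 95% critical value for n - 1 = 4
intervals = {name: mean_ci(vals, T_CRIT_DF4) for name, vals in samples.items()}
print(non_overlapping_pairs(intervals))
```

For part c), the same helper could be applied to 0/1 indicators of private attorney representation, since the mean of such indicators is the sample proportion.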
Question 4
A pivot table has been drawn which considers the impact of private attorney representation
and insurance on the amount claimed. Based on the pivot table obtained from the data in the
experiment tab, a graph has been produced, which is shown as Graph 1 in Appendix 4.
Based on this graph, it is apparent that in the given sample data, for every category of
insurance, the average claim amount is higher when a private attorney represents the
claimant. Also, the claim amount tends to vary across insurance types if the representation
by a private attorney is not considered. Based on Table 1 in Appendix 4, claimants having
Medicare/Medicaid have the highest claim on average, while those availing private insurance
have the lowest claim on average.
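The pivot table summary described above can be sketched as follows. This is an illustrative sketch only: the field names (`insurance`, `private_attorney`, `amount`) and the records are hypothetical and do not come from the experiment tab.

```python
from collections import defaultdict
from statistics import mean


def pivot_mean(records, row_key, col_key, value_key):
    """Average value_key for each (row_key, col_key) combination."""
    cells = defaultdict(list)
    for rec in records:
        cells[(rec[row_key], rec[col_key])].append(rec[value_key])
    return {cell: mean(values) for cell, values in cells.items()}


# Hypothetical records for illustration only.
records = [
    {"insurance": "Private", "private_attorney": "Yes", "amount": 120.0},
    {"insurance": "Private", "private_attorney": "Yes", "amount": 80.0},
    {"insurance": "Private", "private_attorney": "No", "amount": 50.0},
    {"insurance": "Medicare/Medicaid", "private_attorney": "Yes", "amount": 200.0},
    {"insurance": "Medicare/Medicaid", "private_attorney": "No", "amount": 90.0},
]
pivot = pivot_mean(records, "insurance", "private_attorney", "amount")
for (insurance, attorney), avg in sorted(pivot.items()):
    print(f"{insurance:18s} attorney={attorney:3s} mean amount={avg:.1f}")
```

Each cell of the result corresponds to one bar of a grouped chart such as Graph 1, with insurance type on one axis and attorney representation as the grouping variable.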
Question 5
As a data analyst, it is important that deadlines are met so that the clients can get their
analysis in a timely manner. If this is not done, then the analysis may lose relevance
considering that information has assumed immense significance in the current business
environment. In order to ensure timely delivery, I plan each task in an organised manner by