Data Analysis and Mixed Type Variables


Added on  2020/05/28

AI Summary
This assignment focuses on understanding different data types – nominal, ordinal, symmetric binary, and asymmetric binary. Students are tasked with identifying five attributes from a workplace dataset, classifying them by type, and providing example values. They then calculate the dissimilarity between two chosen instances using a mixed-type distance calculation method.

Data Analysis & Decision Making
Student Name:
University

Question-1(a)
My workplace scenario concerns minimizing cost while maintaining the quality of the services
offered. As an operations manager in a hospital, I must choose between two drugs, drug A and
drug B. Drug B costs twice as much as drug A: drug A costs $2,000 and drug B costs $4,000.
Either drug carries a chance of adverse effects, but the likelihood of adverse effects with
drug A is twice that with drug B (assume the probability of adverse effects is 0.10 for drug A
and 0.05 for drug B). With drug A, the probability that the patient responds is 0.90, whether
or not adverse effects occur; with drug B it is 0.95.
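| Drug type | Drug A | Drug B |
|---|---|---|
| Cost | $2,000 | $4,000 |
| Probability of adverse effects | 0.10 | 0.05 |
| Probability of no adverse effects | 0.90 | 0.95 |
| Probability of outcome: respond | 0.90 | 0.95 |
| Probability of outcome: fail | 0.10 | 0.05 |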
Question-1(b)
Decision Tree for the problem:
The tree diagram represents the scenario in Question 1(a). For drug A, the probability of
responding after adverse effects is 0.90 and the probability of failing is 0.10; the same
probabilities apply when there are no adverse effects. By contrast, for drug B the probability
of responding after adverse effects is 0.95 and the probability of failing is 0.05, and the
same probabilities apply when there are no adverse effects.
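The branches described above can be multiplied out to give the probability of each leaf of the tree. The following is a minimal sketch using only the costs and probabilities stated in Question 1(a); the dictionary layout and the function name `leaf_probabilities` are illustrative assumptions, not part of the assignment.

```python
# Costs and branch probabilities taken from Question 1(a).
drugs = {
    "A": {"cost": 2000, "p_adverse": 0.10, "p_respond": 0.90},
    "B": {"cost": 4000, "p_adverse": 0.05, "p_respond": 0.95},
}


def leaf_probabilities(p_adverse, p_respond):
    """Return the joint probability of each (adverse effects, outcome) leaf."""
    leaves = {}
    for adverse, p_a in (("adverse", p_adverse), ("no adverse", 1 - p_adverse)):
        for outcome, p_o in (("respond", p_respond), ("fail", 1 - p_respond)):
            leaves[(adverse, outcome)] = p_a * p_o
    return leaves


# One set of four leaves per drug.
tree = {
    name: leaf_probabilities(d["p_adverse"], d["p_respond"])
    for name, d in drugs.items()
}

for name, leaves in tree.items():
    for (adverse, outcome), p in leaves.items():
        print(f"Drug {name}: {adverse} -> {outcome}: {p:.4f}")
```

For each drug the four leaf probabilities sum to 1, which is a quick check that the tree has been specified consistently.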
Question-2 (10 marks): Consider Table 1 consisting of survey data (in percentage) in a country
from drivers on mobile phone usage while driving. Is there any association between gender and
whether a driver uses a mobile phone? Show your solution step-by-step with the required details
of each step. Also, clearly state the type of the test (parametric or non-parametric), and the
name of the test performed in your solution.
Note: Based on the last step of your analysis, please state in appropriate words, the finding of
your test inside the box provided on the next page.
Table 1: Survey data from drivers on mobile phone usage while driving

| Driver gender | Type of phone used: hand-held (%) | Hands-free (%) | Neither (%) | Total (count) |
|---|---|---|---|---|
| Male | 1.3 | 0.4 | 98.3 | 10068 |
| Female | 0.5 | 0.4 | 99.1 | 6976 |
Solution
Since the data are given as percentages rather than counts, the percentages must first be
converted to counts (rounded to the nearest whole number), as follows:

| Driver gender | Hand-held | Hands-free | Neither | Total |
|---|---|---|---|---|
| Male | 131 | 40 | 9897 | 10068 |
| Female | 35 | 28 | 6913 | 6976 |
| Total | 166 | 68 | 16810 | 17044 |
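The conversion from percentages to counts can be reproduced in a few lines. This is a sketch using the figures from Table 1; the variable names are illustrative, and it assumes rounding to the nearest whole number.

```python
# Row totals and percentages from Table 1.
survey = {
    "Male": {"total": 10068,
             "pct": {"Hand-held": 1.3, "Hands-free": 0.4, "Neither": 98.3}},
    "Female": {"total": 6976,
               "pct": {"Hand-held": 0.5, "Hands-free": 0.4, "Neither": 99.1}},
}

# Convert each percentage to a count, rounding to the nearest whole number.
counts = {
    gender: {phone: round(row["total"] * pct / 100)
             for phone, pct in row["pct"].items()}
    for gender, row in survey.items()
}

for gender, row in counts.items():
    print(gender, row, "row total:", sum(row.values()))
```

The rounded counts in each row add back up to the reported row totals (10068 and 6976), so no adjustment for rounding error is needed here.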
Step 1: State the hypotheses and select an α level.
H0 : There is no association between gender and whether a driver uses a mobile phone.
H1 : There is an association between gender and whether a driver uses a mobile phone.
α = 0.05 (assumed).
A non-parametric test will be used to test the hypothesis: the chi-square test of
independence.
Step 2: Locate the critical region(s).
The chi-square test is non-parametric, i.e. it is not based on population parameters such as
means or proportions, so it is suitable for categorical count data such as these.

Note: A one-tailed test will be used, since the critical region lies in the upper (right) tail of the chi-square distribution.
Step 3: Compute the statistic, $\chi^2 = \sum \frac{(O - E)^2}{E}$.
i) Calculating row and column totals.

| Driver gender | Hand-held | Hands-free | Neither | Total |
|---|---|---|---|---|
| Male | 131 | 40 | 9897 | 10068 |
| Female | 35 | 28 | 6913 | 6976 |
| Total | 166 | 68 | 16810 | 17044 |
ii) Calculating the chi-square test statistic.

| O | E | O − E | (O − E)² | (O − E)² / E |
|---|---|---|---|---|
| 131 | 98.057 | 32.943 | 1085.224 | 11.067 |
| 40 | 40.168 | -0.168 | 0.028 | 0.001 |
| 9897 | 9929.775 | -32.775 | 1074.181 | 0.108 |
| 35 | 67.943 | -32.943 | 1085.224 | 15.973 |
| 28 | 27.832 | 0.168 | 0.028 | 0.001 |
| 6913 | 6880.225 | 32.775 | 1074.181 | 0.156 |
| Sum | | | | 27.306 |
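The expected counts and the test statistic in the table above can be checked programmatically. The sketch below uses the observed counts from step i) and computes each expected count as (row total × column total) / grand total; the variable names are illustrative.

```python
# Observed counts: columns are hand-held, hands-free, neither.
observed = [
    [131, 40, 9897],  # Male
    [35, 28, 6913],   # Female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all six cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(col_totals) - 1)

print(f"chi-square = {chi_square:.3f}, degrees of freedom = {dof}")
```

Because the code keeps full floating-point precision rather than rounding each cell to three decimals, the final digit may differ slightly from a hand calculation.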
Step 4: Decide whether to reject H0.
The sum of the chi-square values is 27.306. The degrees of freedom are:
(number of columns – 1) × (number of rows – 1) = 2 × 1 = 2
According to the chi-square distribution table, the critical value for 2 degrees of freedom at
the 5% significance level is 5.99.
Since the test statistic (chi-square value) of 27.306 exceeds 5.99, it lies inside the critical
region; H0 is therefore rejected at the 5% significance level.
Conclusion: At the 5% level of significance, there is a statistically significant association
between gender and whether a driver uses a mobile phone.
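The decision in Step 4 can also be checked without a table. For 2 degrees of freedom specifically, the upper-tail probability of the chi-square distribution is exp(−x/2), so the critical value is −2 ln(α). The sketch below relies on that special case and would not apply to other degrees of freedom; the variable names are illustrative.

```python
import math

alpha = 0.05
chi_square = 27.306  # test statistic from Step 3

# Valid for 2 degrees of freedom only: P(X > x) = exp(-x / 2).
critical_value = -2 * math.log(alpha)
p_value = math.exp(-chi_square / 2)

# Reject H0 when the statistic falls in the upper-tail critical region.
reject_h0 = chi_square > critical_value

print(f"critical value = {critical_value:.3f}")
print(f"p-value = {p_value:.2e}")
print(f"reject H0: {reject_h0}")
```

The computed critical value agrees with the tabulated 5.99, and the p-value is far below 0.05, which is consistent with rejecting H0.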
Question-3: Considering the data used in the processes at your work place, identify at
least five attributes with mixed types from ordinal, nominal, symmetric binary,
and asymmetric binary. You may choose any combination of them, but there must be at least
three types in your data chosen from the data types given above.
(a) Identify the five attributes that you have chosen, their types and the possible values that
these attributes may hold. In case of an ordinal variable (if you’ve chosen this data
type), also provide the sequence of the possible values. For asymmetric binary
variable (if you’ve chosen this data type), identify the preferred values. (4 marks)
Solution
Some of the data used, or rather generated, at work are as follows:
Nominal: common examples include eye color, race, or ethnicity.
Ordinal: ranking scales and, in some cases, age presented in groups.
Symmetric binary: gender is an example of a symmetric binary variable.
(b) Show your data inside a table with attribute names as columns, and the data instances as
rows. Show only 5 rows. (2 marks)

| ID | Nominal (Race) | Ordinal (Rate the drug) | Symmetric binary (Gender) |
|---|---|---|---|
| 1 | White | Strongly effective | Male |
| 2 | White | Effective | Male |
| 3 | White | Neutral | Female |
| 4 | Black | Strongly ineffective | Female |
| 5 | Black | Ineffective | Female |

(c) Calculate the dissimilarity between any two instances, using the mixed type distance
calculation method. Choose any two instances from your data table from part b of this
question. (4 marks)
Solution
$$d(x_i, x_j) = \frac{\sum_{n=1}^{p} \delta_{ij}^{(n)} d_{ij}^{(n)}}{\sum_{n=1}^{p} \delta_{ij}^{(n)}}$$
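| | Object j = 1 | Object j = 0 | Sum |
|---|---|---|---|
| Object i = 1 | a | b | a + b |
| Object i = 0 | c | d | c + d |
| Sum | a + c | b + d | p |

$$d(i, j) = \frac{b + c}{a + b + c + d}$$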
$$d(1, 2) = \frac{1 + 0 + 0}{4} = 0.25$$

$$d(1, 3) = \frac{2 + 1 + 0}{4} = 0.75$$
| ID | Nominal (Race) | Ordinal (Rate the drug) | Symmetric binary (Gender) |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 1 |
| 3 | 1 | 3 | 0 |
| 4 | 0 | 5 | 0 |
| 5 | 0 | 4 | 0 |
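As a cross-check, the general mixed-type formula given earlier in this solution can be applied directly to the coded table above. The sketch below is one common reading of that formula, not necessarily the method used in the hand calculation. It assumes that every indicator δ is 1 (no missing values), that nominal and symmetric binary attributes contribute 0 for a match and 1 for a mismatch, and that each ordinal rank r is mapped to (r − 1)/(M − 1) with M = 5 levels before differencing. The function name `mixed_dissimilarity` is illustrative.

```python
# Coded instances from the table above: (race, drug rating rank, gender).
data = {
    1: (1, 1, 1),
    2: (1, 2, 1),
    3: (1, 3, 0),
    4: (0, 5, 0),
    5: (0, 4, 0),
}
NUM_RANKS = 5  # the rating scale has five ordered levels


def mixed_dissimilarity(i, j):
    """Average the per-attribute distances between instances i and j."""
    race_i, rank_i, gender_i = data[i]
    race_j, rank_j, gender_j = data[j]

    # Nominal attribute: 0 if the categories match, 1 otherwise.
    d_nominal = 0.0 if race_i == race_j else 1.0

    # Ordinal attribute (assumed handling): scale ranks to [0, 1], then difference.
    z_i = (rank_i - 1) / (NUM_RANKS - 1)
    z_j = (rank_j - 1) / (NUM_RANKS - 1)
    d_ordinal = abs(z_i - z_j)

    # Symmetric binary attribute: 0 if the values match, 1 otherwise.
    d_binary = 0.0 if gender_i == gender_j else 1.0

    distances = [d_nominal, d_ordinal, d_binary]
    deltas = [1, 1, 1]  # assumption: no missing values, so every indicator is 1
    return sum(w * d for w, d in zip(deltas, distances)) / sum(deltas)


print(mixed_dissimilarity(1, 2))
print(mixed_dissimilarity(1, 3))
```

Under these assumptions the values differ from the hand calculation, because that calculation sums raw differences and divides by 4, whereas this sketch averages normalised per-attribute distances over the three attributes.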