ISYS3375 Business Analytics: Assignment Solution and Analysis
Added on 2019/09/24
Homework Assignment
AI Summary
This document provides a comprehensive solution to a Business Analytics assignment (ISYS3375) focusing on various data analysis techniques. The assignment covers topics including handling imbalanced data in classification, overfitting, and applications of logistic regression. It delves into quantitative questions involving k-means clustering, scatter plots, simple and polynomial regression models, and multiple regression models to assess factors influencing diabetes risk. Furthermore, the solution addresses logistic regression models for customer behavior analysis and builds a model to calculate the profit of a European put option using data tables. The assignment requires the use of Excel for data analysis and the presentation of results in a Word document.

ISYS3375 Business Analytics
Note: You need to submit your answers in a Word document. Transfer the results from the
Excel file into the Word document. You must also submit your Excel files, but note that only the
Word document will be marked.
The results of running the analytics platform solver have been provided in the Excel file, so you do not
need to run the solver yourself. Answer the questions based on the results provided in the
various worksheets of the Excel files.
SECTION A: DISCUSSION QUESTIONS
1- Explain the concept of imbalanced data in classification techniques and how it should be
treated when developing classification models.
2- Explain the concept of overfitting. How can overfitting be avoided?
3- Give two examples of how logistic regression can be used. You only need to explain the
problem. One example is a bank that uses logistic regression to classify its new
customers for loan approval: the bank wants to identify customers who are more likely to
default on their loans. Explain why you cannot use linear regression in your examples.
(5+4+6 = 15 marks)
SECTION B: QUANTITATIVE QUESTIONS
1- The first sheet of the file Toy-Info contains 500 records of clients who have purchased
special toys from an e-Business website. Each record includes data on the types of product
purchased (between 1 and 5), purchase amount ($), age, gender, marital status, whether the client
has a membership and whether the client has a discount card.
A business analyst has applied the k-means clustering method to all seven variables. The
analyst increased the number of clusters to recommend a proper value of k. The test results
for k=5 and k=6, shown in the following sheets of the file, revealed that the best value is k=6.
a) Explain how the analyst found that k=6 is a proper number of clusters. Refer to the
relevant sheet name, table name and the values you compared.
b) Describe all 6 clusters by their average characteristics.
(5+5=10 marks)
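One statistic often used to compare candidate values of k is the total within-cluster sum of squared distances to the cluster centroids. The sketch below shows how that number is computed for two alternative groupings. It is only an illustration: the source does not say which statistic the worksheets report, and the points and cluster labels here are made up, not taken from Toy-Info.

```python
from collections import defaultdict


def within_cluster_ss(points, labels):
    """Total squared distance from each point to the centroid of its cluster."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


# Hypothetical points and groupings, for illustration only.
toy_points = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (5.0, 6.0), (9.0, 1.0), (9.0, 2.0)]
labels_k2 = [0, 0, 1, 1, 1, 1]  # two clusters
labels_k3 = [0, 0, 1, 1, 2, 2]  # three clusters

wcss_k2 = within_cluster_ss(toy_points, labels_k2)
wcss_k3 = within_cluster_ss(toy_points, labels_k3)
print(wcss_k2, wcss_k3)
```

With the best grouping at each k, this total does not increase as k grows, so the comparison usually looks at how much it drops from one k to the next rather than at which value is smaller.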
2- In order to improve the overall quality of a new material, a chemist experiments with the effect
of two indices (A & B) on each other in her laboratory. In the following table, the values of these
indices have been captured for each experiment:
No. of experiment   Index A   Index B
1                   248       29915
2                   247       29915
3                   247       29991
4                   253       29807
5                   251       29965
6                   230       29620
7                   232       29526
8                   237       29383
9                   233       29345
10                  242       29711
11                  242       29570
12                  245       29822
a) Plot a scatter chart for this data where index A is the independent variable. What does the
scatter chart indicate about the relationship between indices A and B? How strong is the
relationship? Create an estimated simple regression model and write its equation.
b) Apply and investigate a polynomial regression model that includes an intercept and the terms
x, x^2 and x^3. Is this new model statistically significant? Is it better than the linear
model? Explain.
(9+8 = 17 marks)
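The simple regression in part a) can be cross-checked outside Excel. The following is a minimal sketch that fits B = b0 + b1 * A by ordinary least squares to the twelve experiments listed in the table and reports the correlation coefficient. The function and variable names are illustrative, and the cubic model of part b) is not covered here.

```python
import math

# Index A and Index B values copied from the experiment table.
index_a = [248, 247, 247, 253, 251, 230, 232, 237, 233, 242, 242, 245]
index_b = [29915, 29915, 29991, 29807, 29965, 29620,
           29526, 29383, 29345, 29711, 29570, 29822]


def fit_simple_regression(x, y):
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


slope, intercept, r = fit_simple_regression(index_a, index_b)
print(f"B = {intercept:.2f} + {slope:.4f} * A, r = {r:.3f}, R^2 = {r * r:.3f}")
```

For a simple regression, R^2 is the square of r, so the printed values speak to both the strength of the relationship and the fit of the line.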
3- The following data are the results of a 4-year study conducted to assess how age, weight, and
gender influence the risk of diabetes. Risk is interpreted as the probability (times 100) that the
patient will have diabetes over the next 4-year period.
a) Develop a multiple regression model that relates the risk of diabetes to the person’s age, weight
and gender. Present the regression formula as a mathematical equation. Interpret the
coefficients of the regression and comment on the strength of the regression.
b) Develop an estimated multiple regression model that relates the risk of diabetes to the person’s
age, weight, gender and lifestyle. Present the regression formula as a mathematical equation.
Interpret the coefficients of the regression and comment on the strength of the regression.
c) What is the risk percentage of diabetes over the next 4 years for a 55-year-old man who weighs
70 kg and lives in a big city?
Age   Weight (kg)   Gender   Lifestyle    Risk (%)
53    78            Female   Small town   40
24    77            Male     Big city     23
77    83            Female   Country      67
88    89            Female   Small town   71
56    65            Male     Big city     45
71    82            Female   Country      54
53    79            Female   Small town   48
70    66            Male     Small town   49
80    80            Female   Big city     65
78    67            Male     Big city     59
71    69            Male     Big city     56
70    78            Female   Small town   59
67    75            Male     Country      46
77    95            Female   Big city     64
60    57            Male     Country      39
82    100           Female   Big city     73
66    85            Male     Small town   63
80    96            Male     Big city     87
62    83            Female   Country      52
59    93            Male     Big city     61
(6+6+6 = 18 marks)
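Gender and lifestyle are categorical, so they have to be turned into 0/1 dummy variables before they can enter a regression. The sketch below shows one possible encoding and a least-squares fit on the table above. The choice of baselines (Female, Country) is an assumption, and the small normal-equations solver stands in for Excel's regression tool; all function names are illustrative.

```python
# (age, weight_kg, gender, lifestyle, risk_percent) copied from the table above.
rows = [
    (53, 78, "Female", "Small town", 40), (24, 77, "Male", "Big city", 23),
    (77, 83, "Female", "Country", 67), (88, 89, "Female", "Small town", 71),
    (56, 65, "Male", "Big city", 45), (71, 82, "Female", "Country", 54),
    (53, 79, "Female", "Small town", 48), (70, 66, "Male", "Small town", 49),
    (80, 80, "Female", "Big city", 65), (78, 67, "Male", "Big city", 59),
    (71, 69, "Male", "Big city", 56), (70, 78, "Female", "Small town", 59),
    (67, 75, "Male", "Country", 46), (77, 95, "Female", "Big city", 64),
    (60, 57, "Male", "Country", 39), (82, 100, "Female", "Big city", 73),
    (66, 85, "Male", "Small town", 63), (80, 96, "Male", "Big city", 87),
    (62, 83, "Female", "Country", 52), (59, 93, "Male", "Big city", 61),
]


def encode(age, weight, gender, lifestyle):
    """Intercept, age, weight, then 0/1 dummies (assumed baselines: Female, Country)."""
    return [
        1.0,
        float(age),
        float(weight),
        1.0 if gender == "Male" else 0.0,
        1.0 if lifestyle == "Small town" else 0.0,
        1.0 if lifestyle == "Big city" else 0.0,
    ]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda idx: abs(m[idx][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row_idx in range(col + 1, n):
            factor = m[row_idx][col] / m[col][col]
            for c in range(col, n + 1):
                m[row_idx][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_least_squares(design, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


design = [encode(*row[:4]) for row in rows]
risk = [float(row[4]) for row in rows]
beta = fit_least_squares(design, risk)


def predict(age, weight, gender, lifestyle):
    """Predicted risk (%) for one person under the fitted model."""
    return sum(b * v for b, v in zip(beta, encode(age, weight, gender, lifestyle)))


fitted = [predict(*row[:4]) for row in rows]
mean_risk = sum(risk) / len(risk)
ss_res = sum((y - f) ** 2 for y, f in zip(risk, fitted))
ss_tot = sum((y - mean_risk) ** 2 for y in risk)
r_squared = 1 - ss_res / ss_tot

print([round(b, 4) for b in beta], round(r_squared, 3))
print(round(predict(55, 70, "Male", "Big city"), 1))
```

Each dummy coefficient is read relative to its baseline category, so a different baseline changes the individual coefficients but not the predictions.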
4- An internet provider company in Australia is interested in identifying why some individuals
are still undecided about buying the company's new NBN service. The first sheet of the file
NBN-service contains a sample of customers with variables that track the decision outcome.
A business analyst has created a standard partition of the data with all tracked variables and
40% of observations in the training set, 35% in the validation set, and 25% in the test set. The
analyst applied two logistic regression models to classify undecided customers of the company.
The resultant output of the Solver software for both models is provided in the following
sheets.
a) Determine the selected input variables in each model and explain why the analyst has changed
one of the input variables.
b) Write the obtained logistic regression equation for the first model shown in worksheet “4-1-1”
and predict whether a customer with a Contract duration of 16 months, Bonus data of 63 GB and
Usage of 237 GB will decide to buy the new service. Explain how you found
the prediction.
c) Find the class 1 and class 0 errors based on the sheet “4-1-2” and compare your results with the
confusion matrix. Explain which of these errors is more undesirable in this model.
d) Compare the accuracy of the second model (shown in worksheet “4-2-1”) with that of
the first model. Which one do you recommend?
(3+6+7+8 = 24 marks)
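For parts b) and c), the mechanics are the same whatever numbers the worksheets contain: a logistic model turns a linear score into a probability, and the class errors come from the off-diagonal cells of the confusion matrix. The sketch below illustrates both steps. The coefficients, the 0.5 cutoff and the confusion-matrix counts are hypothetical placeholders; the real values must be read from worksheets “4-1-1” and “4-1-2”.

```python
import math


def logistic_probability(intercept, coefficients, values):
    """P(class 1) = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    logit = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-logit))


def class_errors(tp, fn, fp, tn):
    """Return (class 1 error rate, class 0 error rate) from confusion-matrix counts."""
    class1_error = fn / (tp + fn)  # actual class 1 cases predicted as class 0
    class0_error = fp / (fp + tn)  # actual class 0 cases predicted as class 1
    return class1_error, class0_error


# Hypothetical coefficients, NOT the values from worksheet "4-1-1".
intercept = -2.0
coefficients = {"contract_months": 0.05, "bonus_gb": 0.01, "usage_gb": 0.004}

# Customer described in part b).
customer = {"contract_months": 16, "bonus_gb": 63, "usage_gb": 237}

p_buy = logistic_probability(intercept, coefficients, customer)
predicted_class = 1 if p_buy >= 0.5 else 0  # the 0.5 cutoff is an assumption
print(round(p_buy, 3), predicted_class)

# Hypothetical confusion-matrix counts, NOT the values from worksheet "4-1-2".
class1_error, class0_error = class_errors(tp=40, fn=10, fp=5, tn=45)
print(class1_error, class0_error)
```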
5- A put option in finance allows you to sell a share of stock in the future at a given price. There
are different types of put options. A European put option allows you to sell a share of stock at a
given price (called the exercise price) at a particular point in time after the purchase of the
option. For example, suppose you purchase an eight-month European put option for a share of
stock with an exercise price of $29. If eight months later the stock price per share is $29 or
more, the option has no value. If in eight months' time the stock price is lower than $29 per share,
then you can purchase the stock and immediately sell it at the higher exercise price of $29. If
the price per share in eight months is $26.4, you can purchase a share of the stock for $26.4
and then use the put option to immediately sell the share for $29. Your profit would be the
difference, $29-$26.4 = $2.6 per share, less the cost of the option. If you paid $1.5 per put
option, then your profit would be $2.6-$1.5 = $1.1 per share.
a) Build a model to calculate the profit of this European put option.
b) Construct a data table that shows the profit per share for a share price in eight months between
$15 and $35 per share in increments of $1.
(8+8 = 16 marks)
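The worked example above reduces to one formula: profit per share = max(exercise price - share price, 0) - option cost. The following is a minimal sketch of that model and of the data table for share prices from $15 to $35, using the $29 exercise price and $1.5 option cost from the example. The function and variable names are illustrative, and the assignment itself expects the model to be built in Excel.

```python
def put_profit(share_price, exercise_price=29.0, option_cost=1.5):
    """Profit per share at expiry: payoff max(K - S, 0) minus the cost of the option."""
    payoff = max(exercise_price - share_price, 0.0)
    return payoff - option_cost


# One row per share price from $15 to $35 in $1 increments.
data_table = [(price, put_profit(price)) for price in range(15, 36)]

for price, profit in data_table:
    print(f"${price:>2}  {profit:>6.2f}")
```

At any share price of $29 or more the payoff is zero, so the loss is capped at the $1.5 paid for the option.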
TOTAL MARKS = 100