Introduction to Statistical Learning: Problem Set 1 Solutions

Added on  2023-06-10

STATS216v Introduction to Statistical Learning
Stanford University, Summer 2018
Problem Set 1
Question 1:
1 (a):
This is a supervised learning problem and an example of regression. We are interested in
prediction: we predict the most promising spot to dig. The number of observations is n = 80
and the number of predictors is p = 24.
1 (b):
This is a supervised learning problem and an example of classification. We are interested in
prediction: we predict whether to display advertisement A or advertisement B to each
customer. The number of observations is n = 300 and the number of predictors is p = 3 (age,
zip code, and gender).
1 (c):
This is a supervised learning problem and an example of regression. We are interested in
inference: we want to discover which factors are associated with the unemployment rate
across different U.S. cities. The number of observations is n = 400. Six variables are recorded
for each city (the population, state, average income, crime rate, percentage of students who
graduate high school, and unemployment level). Unemployment is the response, so the
number of predictors is p = 5.
1 (d):
This is an unsupervised learning problem. We have no response for each student: the
subtypes of students in the application pool are not observed and must be discovered from
the data.
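
As an illustration of learning without a response, here is a minimal k-means sketch in Python. The applicant features (GPA and a scaled test score), the data values, and the choice of two clusters are all hypothetical and are not taken from the problem. K-means is only one common clustering method, not necessarily the one the problem has in mind.

```python
# Minimal k-means sketch (pure Python). The data are made up for illustration:
# each row is a hypothetical applicant described by (GPA, scaled test score).
# There is no response column: any subtypes must be discovered from X alone.

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, centers, n_iter=20):
    """Lloyd's algorithm with user-supplied initial centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers

applicants = [
    (3.9, 0.95), (3.8, 0.90), (3.7, 0.92),  # hypothetical applicants, first group
    (2.9, 0.55), (3.0, 0.60), (2.8, 0.50),  # hypothetical applicants, second group
]
labels, centers = kmeans(applicants, centers=[applicants[0], applicants[3]])
print(labels)
```

The cluster labels are arbitrary identifiers: nothing in the data says what a "subtype" means, which is exactly what distinguishes this setting from classification.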

1 (e):
This is a supervised learning problem and an example of classification. We are interested in
prediction: we predict the type of a cell based on a few measurements. The number of
observations is n = 68 and the number of predictors is p = 3 (the number of branch points,
the number of active processes, and the average process length).
Question 2:
2 (a):
We prefer an inflexible regression model because the number of predictors p (the number of
genes) is extremely large and the number of observations n (the number of patients) is small,
so a flexible model would overfit.
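
A small simulation can make the overfitting risk concrete. The sketch below assumes a hypothetical data set with n = p = 10 (10 patients, 10 genes), a response that is pure noise, and a linear model with no intercept. None of these numbers come from the problem. Even this simple linear model reproduces the training responses exactly once p reaches n, so any further flexibility would only fit more noise.

```python
import random

def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def mse(X, y, beta):
    preds = [sum(xj * bj for xj, bj in zip(row, beta)) for row in X]
    return sum((p - yi) ** 2 for p, yi in zip(preds, y)) / len(y)

rng = random.Random(0)
n = p = 10  # hypothetical sizes: 10 patients, 10 genes
X_train = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y_train = [rng.gauss(0, 1) for _ in range(n)]  # pure noise, unrelated to X

# With p = n, least squares reduces to solving the square system X beta = y.
beta = solve_linear_system(X_train, y_train)

X_new = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(200)]
y_new = [rng.gauss(0, 1) for _ in range(200)]

train_error = mse(X_train, y_train, beta)
new_error = mse(X_new, y_new, beta)
print(f"training MSE: {train_error:.2e}, MSE on new data: {new_error:.2f}")
```

The training error is essentially zero even though the predictors carry no information about the response, while the error on fresh data is not.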
2 (b):
We prefer a flexible regression model because the number of predictors p (math, science, and
history grades in the 7th grade) is small and the number of observations n (the number of
students) is extremely large, so there is enough data to fit a flexible model without overfitting.
2 (c):
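We prefer an inflexible regression model because the variation in the data is large, and a
flexible model would tend to fit that variation rather than the underlying relationship.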
2 (d):
We prefer a flexible regression model because the variation in the data is small. A flexible
model will also be better at capturing non-linear effects.

Question 3:
3 (a):
The flexible model performs better. With a larger sample size, a flexible model fits the data
well and outperforms an inflexible model.
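
The following sketch illustrates this case under stated assumptions: a hypothetical nonlinear truth f(x) = sin(2πx), noise standard deviation 0.3, 500 training points, a straight line as the inflexible model, and 10-nearest-neighbour regression as the flexible one. All of these choices are illustrative and do not come from the problem.

```python
import math
import random

def true_f(x):
    # Assumed nonlinear truth, chosen only for illustration.
    return math.sin(2 * math.pi * x)

def simulate(n, rng, sigma=0.3):
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, sigma) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Simple least-squares line: the inflexible model."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def fit_knn(xs, ys, k):
    """k-nearest-neighbour regression: the flexible model."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda pair: abs(pair[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict

def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(1)
x_train, y_train = simulate(500, rng)  # large training sample
x_new, y_new = simulate(200, rng)      # fresh data for evaluation

linear_mse = mean_squared_error(fit_line(x_train, y_train), x_new, y_new)
knn_mse = mean_squared_error(fit_knn(x_train, y_train, k=10), x_new, y_new)
print(f"linear MSE: {linear_mse:.3f}, 10-NN MSE: {knn_mse:.3f}")
```

With this much data the flexible fit can follow the curve without chasing noise, while the straight line stays biased no matter how many observations are added.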
3 (b):
The flexible model performs worse. A flexible model overfits when the number of observations
is small.
3 (c):
The flexible model performs worse. A flexible method would fit the noise in the error terms,
which increases variance.
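
A minimal sketch of fitting the noise, assuming a hypothetical linear truth y = 2x plus noise with standard deviation 1, a straight line as the inflexible model, and 1-nearest-neighbour regression as the flexible one. These choices are illustrative assumptions, not part of the problem.

```python
import random

def simulate(n, rng, sigma=1.0):
    # Assumed truth: a straight line; sigma sets the size of the error terms.
    xs = [rng.random() for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Simple least-squares line: the inflexible model."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def fit_1nn(xs, ys):
    """1-nearest-neighbour regression: copies the response of the closest point."""
    pairs = list(zip(xs, ys))
    def predict(x):
        return min(pairs, key=lambda pair: abs(pair[0] - x))[1]
    return predict

def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(2)
x_train, y_train = simulate(200, rng)
x_new, y_new = simulate(1000, rng)

line = fit_line(x_train, y_train)
nn = fit_1nn(x_train, y_train)

nn_train = mean_squared_error(nn, x_train, y_train)  # memorises the training noise
nn_new = mean_squared_error(nn, x_new, y_new)
line_new = mean_squared_error(line, x_new, y_new)
print(f"1-NN train: {nn_train:.3f}, 1-NN new: {nn_new:.3f}, line new: {line_new:.3f}")
```

The flexible method has zero training error because it reproduces every noisy response, and that memorised noise is what inflates its error on new data.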
3 (d):
The flexible model performs better. A flexible model has more degrees of freedom and so fits
the data well.
3 (e):
The flexible model performs worse, as it would fit the noise in the error terms, which increases
variance.
