
R Programming Analysis 2022

Perform exploratory data analysis, build scatter plots, fit a linear model, and perform model selection on a dataset.


Added on  2022-09-12

R Programming
(Name of Student)
(Institutional Affiliation)
(Date of Submission)
I. Exploratory Data Analysis (EDA)
Directions:
1. Using the provided dataset, do the following:
a. Choose four (4) predictors you feel are the most important and produce scatter plots of the response variable vs. these predictors.
The following predictors are chosen
b. For each plot, put the response variable on the y-axis and the predictor on the x-axis.
c. Write up one (1) sentence to explain each scatter plot.
#Boxplot of violent crime variables
# Assumes crimedata_study (a pandas DataFrame) and cols (its column list)
# were defined earlier in the analysis; that code is not part of this preview.
import matplotlib.pyplot as plt
import seaborn as sns

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111)
Violent = crimedata_study[cols[7:12]]
xticklabelsV = ['murdPerPop', 'rapesPerPop', 'robbbPerPop', 'assaultPerPop', 'ViolCrimesPerPop']
sns.boxplot(data=Violent, ax=ax)
ax.set(title="Violent crimes")
ax.set_xticklabels(xticklabelsV)
plt.show()

#Boxplot of non-violent crime variables
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111)
nonViolent = crimedata_study[cols[12:17]]
xticklabelsNV = ['burglPerPop', 'larcPerPop', 'autoTheftPerPop', 'arsonsPerPop', 'nonViolPerPop']
sns.boxplot(data=nonViolent, ax=ax)
ax.set(title="Non-violent crimes")
ax.set_xticklabels(xticklabelsNV)
plt.show()
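The directions ask for scatter plots of the response against four chosen predictors, with the response on the y-axis. The boxplots summarize the crime variables but do not show those relationships, and the list of chosen predictors is not given. The following is a minimal sketch, assuming a pandas DataFrame and exactly four predictors. The helper `scatter_response_vs_predictors`, the synthetic `demo` data, and the four predictor names (borrowed from the regression section) are all illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_response_vs_predictors(df, response, predictors):
    """Hypothetical helper: one scatter plot per predictor (assumes four), response on y."""
    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    for ax, name in zip(axes.ravel(), predictors):
        ax.scatter(df[name], df[response], s=10, alpha=0.6)
        ax.set_xlabel(name)       # predictor on the x-axis
        ax.set_ylabel(response)   # response on the y-axis
        ax.set_title(f"{response} vs. {name}")
    fig.tight_layout()
    return fig


# Synthetic stand-in data for illustration only (not the crime dataset).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((100, 4)) * 100,
                    columns=["PctVacantBoarded", "PctUnemployed", "PctHousOccup", "PctEmploy"])
demo["ViolentCrimesPerPop"] = 10 * demo["PctVacantBoarded"] + rng.normal(0, 50, 100)

fig = scatter_response_vs_predictors(demo, "ViolentCrimesPerPop", list(demo.columns[:4]))
```

With the real data, the same call would take `crimedata_reg` (or `crimedata_study`) and the four chosen predictor names, followed by `plt.show()` in an interactive session.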
II. Fit a Linear Model
Directions:
1. Fit a linear model with y as the response and the four (4) variables chosen in Step 1 as the predictors.
2. Write up your interpretation of the output, e.g., does it match your intuition?
Deliverable(s):
1. Section containing the following:
a. the R code and output and
## Multiple Linear Regression of variables under study
# Assumes crimedata_reg is the cleaned DataFrame prepared earlier (not shown in this preview).
from sklearn import linear_model
from sklearn.model_selection import train_test_split

X = crimedata_reg[['HousVacant', 'PctHousOccup', 'PctHousOwnOcc', 'PctVacantBoarded',
                   'PctVacMore6Mos', 'PctUnemployed', 'PctEmploy']]
y = crimedata_reg['ViolentCrimesPerPop']

## Alternative using statsmodels (prints a full regression summary):
##import statsmodels.api as sm
##X = sm.add_constant(X)
##mregmodel = sm.OLS(y, X).fit()
##print(mregmodel.summary())

# Create training and testing sets (75% / 25% split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# Fit an ordinary least squares model on the training set
lm = linear_model.LinearRegression()
mmodel = lm.fit(X_train, y_train)
predictions = lm.predict(X_test)

coefficients = mmodel.coef_
print("The coefficients of our model are {}".format(coefficients))
intercept = mmodel.intercept_
print("The intercept for our model is {}".format(intercept))

# score() returns the coefficient of determination (R^2)
print("Linear model Train dataset score is {}".format(mmodel.score(X_train, y_train)))
print("Linear model Test dataset score is {}".format(mmodel.score(X_test, y_test)))
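For scikit-learn's `LinearRegression`, `score()` returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The "train" and "test" scores are therefore the shares of variance in `ViolentCrimesPerPop` that the model explains on each split. The following is a short sketch of that computation in plain Python. The helper `r_squared` is hypothetical and not part of the original code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (what LinearRegression.score computes)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)              # total sum of squares
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
print(r_squared(y_true, y_true))      # perfect predictions -> 1.0
print(r_squared(y_true, [6.0] * 4))   # always predicting the mean -> 0.0
```

A score near 1 means the predictors account for almost all of the variation. A score near 0 means the model does little better than predicting the mean.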
b. the interpretation of the output.
The coefficients of our model are [7.54745466e-03 -1.41786715e+01 -1.10271206e+01 4.95497305e+01 -5.33677041e+00 1.98332262e+01 -4.35120857e+00]
The intercept for our model is 2768.815670578984
Linear model Train dataset score is 0.35931157284880066
Linear model Test dataset score is 0.3741230738398124
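The train score (about 0.36) and test score (about 0.37) are close, which suggests the model is not badly overfit. However, the seven predictors explain only roughly a third of the variation in the response. To interpret individual coefficients, note that scikit-learn's `coef_` array follows the column order of `X`. The sketch below pairs the printed values with their predictor names. The values are copied from the output, and the name `coef_by_name` is illustrative. With the fitted model available, `pd.Series(mmodel.coef_, index=X.columns)` gives the same pairing.

```python
# Coefficients copied from the printed output; order follows the columns of X.
predictors = ["HousVacant", "PctHousOccup", "PctHousOwnOcc", "PctVacantBoarded",
              "PctVacMore6Mos", "PctUnemployed", "PctEmploy"]
coefficients = [7.54745466e-03, -1.41786715e+01, -1.10271206e+01, 4.95497305e+01,
                -5.33677041e+00, 1.98332262e+01, -4.35120857e+00]

coef_by_name = dict(zip(predictors, coefficients))

# Rank by absolute size of the per-unit effect (other predictors held fixed).
ranked = sorted(coef_by_name.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, value in ranked:
    direction = "raises" if value > 0 else "lowers"
    print(f"{name:>17}: {value:+.4f} ({direction} predicted ViolentCrimesPerPop per unit increase)")
```

These are per-unit effects, and the predictors are on different scales. `HousVacant` appears to be a count, while the others are percentages, so the raw magnitudes are not a ranking of importance. Standardizing the predictors before fitting would make the coefficients comparable.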
Cross Validation Score is [0.36829157 0.2681317 0.26355751 0.45933171 0.31706041 0.47669981 0.34282423 0.32520228 0.47142926 0.28387437]
Cross Validation Score mean is 0.3576402835870115
Cross Validation standard deviation is 0.07924703838521946
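The code that produced the cross-validation scores is not included. Ten scores suggest 10-fold cross-validation, and their mean (about 0.36) is in line with the single train/test split. The following is a minimal sketch, assuming scikit-learn's `cross_val_score` with `cv=10` on a `LinearRegression`. The original may have used different data (for example, only the training split) or different fold settings. Synthetic data from `make_regression` stands in for `X` and `y`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for X and y (7 predictors, like the model above); illustration only.
X, y = make_regression(n_samples=500, n_features=7, noise=40.0, random_state=1)

# 10-fold cross-validation. For a regressor the default scorer is R^2,
# and an integer cv uses unshuffled KFold splits.
scores = cross_val_score(LinearRegression(), X, y, cv=10)

print("Cross Validation Score is {}".format(scores))
print("Cross Validation Score mean is {}".format(scores.mean()))
print("Cross Validation standard deviation is {}".format(scores.std()))
```

A large spread between folds would signal that a single train/test score is unreliable. Here the fold-to-fold standard deviation is the quantity to compare against the train/test gap.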
III. Perform Model Selection
Directions:
1. Perform model selection via automated selection:
a. Apply fastbw() to the data in R.
b. Apply stepAIC() to the data in R.
2. For each procedure, submit your comment on the variables that the procedure removed from or retained in your model. Think about the following questions to guide your comments:
a. Does it match your intuition?
b. How do the automatically selected models compare to your model from Step 2?
c. Which model will you choose to proceed with?
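`fastbw()` (from the rms package) and `stepAIC()` (from the MASS package) are R functions, and the directions ask for them to be run in R. The analysis code in this document is Python, so the following is a hedged sketch of backward elimination by AIC in NumPy. It is similar in spirit to `stepAIC(..., direction = "backward")` on a linear model. It is not a port of either function, and its selections may differ from theirs. The helpers `ols_aic` and `backward_aic` and the synthetic data are hypothetical.

```python
import numpy as np


def ols_aic(X, y):
    """AIC of an OLS fit with intercept, in the form R's extractAIC() uses for lm:
    n * log(RSS / n) + 2 * k, where k counts the intercept and the slopes."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]


def backward_aic(X, y, names):
    """Hypothetical helper: repeatedly drop the predictor whose removal lowers AIC
    the most; stop when no single removal lowers AIC."""
    keep = list(range(X.shape[1]))
    current = ols_aic(X[:, keep], y)
    while keep:
        trials = [(ols_aic(X[:, [c for c in keep if c != j]], y), j) for j in keep]
        best_aic, drop = min(trials)
        if best_aic >= current:
            break
        keep.remove(drop)
        current = best_aic
    return [names[j] for j in keep], current


# Synthetic example: y depends on x0 and x1 only; x2 and x3 are pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)
selected, aic = backward_aic(X, y, ["x0", "x1", "x2", "x3"])
print(selected, round(aic, 2))
```

Applied to the seven regression predictors, the retained list can be compared directly with the hand-picked model from Step 2. Because AIC charges 2 per coefficient, a weak predictor is kept only if it reduces n * log(RSS / n) by more than 2.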