
Analysis of Forest Fires Dataset using Multiple Regression

   

Added on  2023-04-22

1) I) data <- read.table("C:\\Users\\Subhojit\\Desktop\\NERDY TUTLEZ\\898908\\Forest718.txt")
colnames(data) <- c("X1", "X2", "month", "day", "FFMC", "DMC", "DC", "ISI", "temp", "RH", "wind", "rain", "area")
II) sampledata <- data[sample(1:517, 200), 1:13]
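Because `sample()` draws a different set of rows on every run, the 200-row sample above cannot be reproduced unless the random seed is fixed first. The sketch below wraps the sampling step in a hypothetical helper `draw_sample`; the default seed of 1 is an arbitrary assumption, not a value taken from the analysis.

```r
# Hypothetical helper: draw a reproducible random sample of rows.
# The default seed (1) is an arbitrary assumption; any fixed integer works.
draw_sample <- function(df, n = 200, seed = 1) {
  set.seed(seed)                       # fix the RNG state so the draw can be repeated
  idx <- sample(seq_len(nrow(df)), n)  # n distinct row indices, without replacement
  df[idx, , drop = FALSE]
}

# Usage, assuming `data` has been read in as above:
# sampledata <- draw_sample(data, n = 200)
```

Using `nrow(df)` instead of the literal 517 keeps the helper correct if the input file has a different number of rows.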
III) Scatter Plot
Scatter plot of area against FFMC (Fine Fuel Moisture Code): most of the burned area is concentrated where the FFMC index of the FWI system lies around 90 to 95.
Scatter plot of area against DMC (Duff Moisture Code): most of the area is concentrated where the DMC index lies between 75 and 150.
Scatter plot of area against temp: the area is concentrated where the temperature lies between 17 and 25 degrees Celsius.
Scatter plot of area against wind: fires of all sizes occur mostly at wind speeds between 2 and 6 km/h.
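The four scatter plots described above can be drawn together in one figure. The sketch below is a hypothetical helper, `plot_area_scatter`, that assumes the sampled data frame has columns named `FFMC`, `DMC`, `temp`, `wind` and `area` (as set by `colnames()` earlier); it also returns each predictor's Pearson correlation with area as a rough numeric companion to the visual reading.

```r
# Hypothetical helper: scatter plots of area against each predictor in a 2 x 2 grid,
# returning (invisibly) the Pearson correlation of each predictor with area.
plot_area_scatter <- function(df, predictors = c("FFMC", "DMC", "temp", "wind")) {
  old <- par(mfrow = c(2, 2))   # 2 x 2 grid of panels
  on.exit(par(old))             # restore the previous graphics settings afterwards
  for (p in predictors) {
    plot(df[[p]], df$area, xlab = p, ylab = "area",
         main = paste("area vs", p))
  }
  invisible(sapply(predictors, function(p) cor(df[[p]], df$area)))
}

# Usage: r <- plot_area_scatter(sampledata); print(r)
```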
Histogram
Histogram of area: the maximum frequency occurs in the 0.010 to 0.015 bin.
Histogram of FFMC: the maximum frequency occurs in the 90 to 100 bin.
Histogram of DMC: the maximum frequency occurs in the 100 to 150 bin.
Histogram of temperature: the maximum frequency occurs between 20 and 25 degrees Celsius.
Histogram of wind: the maximum frequency occurs between 2 and 4 km/h.
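The "maximum frequency" bins quoted above can be read off programmatically instead of by eye. The sketch below, a hypothetical helper `modal_bin`, calls `hist()` with `plot = FALSE` and returns the edges of the bin with the highest count; note that the answer depends on the break points, which default to R's Sturges rule here.

```r
# Hypothetical helper: edges of the histogram bin with the largest count.
# hist(..., plot = FALSE) computes breaks and counts without drawing anything.
modal_bin <- function(x, breaks = "Sturges") {
  h <- hist(x, breaks = breaks, plot = FALSE)
  i <- which.max(h$counts)                 # index of the most frequent bin
  c(lower = h$breaks[i], upper = h$breaks[i + 1])
}

# Usage: modal_bin(sampledata$FFMC); modal_bin(sampledata$wind)
```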
2) I) The predictor variables are FFMC, DMC, temp and wind, and the response variable is area.
The histograms show that FFMC is left skewed (its long tail is on the left), so a power transformation with an exponent greater than one, such as cubing the data, is used to reduce the skew.
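Whether a variable is left or right skewed, and whether a given power transformation actually reduces the skew, can be checked numerically as well as from the histogram. The sketch below uses the moment-based sample skewness (negative means a left tail, positive a right tail) and a hypothetical helper `power_skewness` that reports the skewness after each candidate power; it assumes strictly positive data so that fractional powers are defined.

```r
# Sample skewness (moment estimator): negative = left skew, positive = right skew.
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Hypothetical helper: skewness of x after raising it to each candidate power.
# Assumes x > 0, since fractional powers of negative numbers give NaN in R.
power_skewness <- function(x, powers = c(cube_root = 1/3, sqrt = 1/2, none = 1,
                                         square = 2, cube = 3)) {
  sapply(powers, function(k) skewness(x^k))
}

# Usage: power_skewness(sampledata$FFMC); power_skewness(sampledata$DMC)
```

The power whose result is closest to zero is the one that makes the distribution most symmetric.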
DMC, on the other hand, is right skewed, so a power transformation with an exponent less than one, such as the cube root, is used to reduce the skew.
For temperature, a square root transformation is applied.
For wind, a square transformation is applied.
No transformation is required for area, as its distribution looks approximately normal.
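The `write.table` call that follows saves `newdata`, but the code that builds it is not shown. One way to build it is sketched below with a hypothetical helper, `apply_powers`, which raises each named column to a given exponent (1/3 for a cube root, 0.5 for a square root, 2 for a square, 3 for a cube) and leaves all other columns, including area, unchanged. Only the temperature and wind exponents stated above appear in the usage line; exponents for FFMC and DMC would be added to the same vector.

```r
# Hypothetical helper: raise selected columns to given powers.
# `powers` is a named numeric vector such as c(temp = 0.5, wind = 2);
# columns not named in it are returned unchanged.
apply_powers <- function(df, powers) {
  stopifnot(all(names(powers) %in% names(df)))  # every named column must exist
  for (col in names(powers)) {
    df[[col]] <- df[[col]]^powers[[col]]
  }
  df
}

# Usage (exponents for FFMC and DMC to be added as chosen above):
# newdata <- apply_powers(sampledata, c(temp = 0.5, wind = 2))
```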
write.table (newdata,"name-transformed.txt",sep="\t",row.names=FALSE)
II) Fitting a separate simple linear regression of the response (area) on each predictor and examining the summary output leads to the following conclusions.
For the first model (area regressed on FFMC), the p-value shows that the slope coefficient is significant, so FFMC can be considered one of the factors affecting area.
For the second model (area regressed on DMC), the p-value shows that the slope coefficient is significant, so DMC can be considered one of the factors affecting area.
For the third model (area regressed on temp)
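The one-predictor-at-a-time comparison described above can be scripted. The sketch below is a hypothetical helper, `slope_p_values`, that fits `lm(response ~ predictor)` for each predictor and returns the p-value of the slope from `summary()`. Comparing these values with a significance level such as 0.05 (an assumed threshold; the text does not state one) gives the same kind of judgement as for each model above.

```r
# Hypothetical helper: p-value of the slope in a simple regression of
# `response` on each predictor separately.
slope_p_values <- function(df, response, predictors) {
  sapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = response), data = df)  # e.g. area ~ FFMC
    summary(fit)$coefficients[2, "Pr(>|t|)"]                    # row 2 = slope
  })
}

# Usage: slope_p_values(newdata, "area", c("FFMC", "DMC", "temp", "wind"))
```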