
Analysis of Forest Fires Dataset using Linear Regression

   

Added on  2023-04-23

1) I) data = read.table("C:\\Users\\Subhojit\\Desktop\\NERDY TUTLEZ\\898908\\Forest718.txt")
colnames(data) = c("X1","X2","month","day","FFMC","DMC","DC","ISI","temp","RH","wind","rain","area")
II) sampledata = data[sample(1:517,200),c(1:13)]
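Because sample() draws a different subset on every run, the 200 rows above cannot be reproduced exactly. A minimal sketch of a repeatable draw follows, assuming a made-up stand-in data frame (`toy`) in place of the real file and an arbitrary seed value; neither comes from the original analysis.

```r
# Stand-in for the real data: 517 rows, 13 columns (values are made up).
toy <- as.data.frame(matrix(seq_len(517 * 13), nrow = 517, ncol = 13))

# Fixing the seed makes the random draw repeatable (42 is an arbitrary choice).
set.seed(42)
rows <- sample(1:nrow(toy), 200)   # 200 row indices, drawn without replacement
sampledata <- toy[rows, ]

dim(sampledata)            # number of rows and columns in the sample
anyDuplicated(rows) == 0   # checks that no row was drawn twice
```

Re-running the block with the same seed should give the same 200 rows, which makes the later plots and model summaries easier to compare between runs.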
III) Scatter Plot
The first plot shows the scatter plot of temperature against area. The points are
concentrated where the temperature lies between 18 and 25 degrees C.
The second plot shows the scatter plot of wind against area. For all sizes of area,
the wind speed lies in the range of 2 to 6 km/hr.
The third plot shows the scatter plot of RH against area. The points are
concentrated where the relative humidity lies between 40 and 45 %.
In the plot of rain against area, almost all observations have 0 rainfall.
Only two observations show non-zero rain.
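The plotting commands are not shown in the text. One way the four scatter plots might be produced in base R is sketched here, assuming a hypothetical stand-in data frame (`toy`) with random values rather than the real sample.

```r
# Hypothetical stand-in for sampledata (random values, not the real measurements).
set.seed(1)
toy <- data.frame(
  temp = runif(200, 2, 33),
  RH   = runif(200, 15, 100),
  wind = runif(200, 0.4, 9.4),
  rain = c(rep(0, 198), 0.2, 1.0),
  area = runif(200, 0.005, 0.02)
)

predictors <- c("temp", "wind", "RH", "rain")

# Draw the four scatter plots in a 2 x 2 grid, one panel per predictor.
old_par <- par(mfrow = c(2, 2))
for (p in predictors) {
  plot(toy[[p]], toy$area, xlab = p, ylab = "area",
       main = paste("area vs", p))
}
par(old_par)

# Count the observations with non-zero rain.
sum(toy$rain > 0)
```

Counting the non-zero rain values directly is a simple check on what the rain panel appears to show.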
Histogram
Histogram for area. The maximum frequency occurs in the range 0.010 to 0.015.
Histogram for humidity. The maximum frequency occurs in the range 40 to 50.
Histogram for rain. The maximum frequency occurs at 0.
Histogram for temperature. The maximum frequency occurs in the range 20 to 25 degrees Celsius.
Histogram for wind. The maximum frequency occurs in the range 3 to 4 km/hr.
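The modal bin can also be read from the object that hist() returns instead of from the picture. The sketch below assumes made-up temperature values and a bin width of 5; both are illustrative choices, not taken from the original analysis.

```r
set.seed(2)
temp <- rnorm(200, mean = 20, sd = 5)   # made-up temperatures

# Bin edges at multiples of 5 that cover the whole range of the data.
edges <- seq(floor(min(temp) / 5) * 5, ceiling(max(temp) / 5) * 5, by = 5)

# plot = FALSE returns the counts without drawing the histogram.
h <- hist(temp, breaks = edges, plot = FALSE)

# The bin with the highest count, reported by its lower and upper edge.
modal <- which.max(h$counts)
c(lower = h$breaks[modal], upper = h$breaks[modal + 1])
```

The same call with the default `plot = TRUE` draws the histogram, so the numeric answer and the figure come from identical bins.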
2) I) The predictor variables are temp, relative humidity, wind and rain, and the
response variable is area.
The histogram of temperature is left skewed, that is, the tail is on the left, so the
square root of temperature is used as the transformation.
The histogram of wind is right skewed, that is, the tail is on the right, so the
square of wind is used as the transformation.
The histogram of relative humidity is also right skewed, so the cube of relative
humidity is used as the transformation.
Similarly, rain is right skewed, so the square transformation is used.
Area requires no transformation, as its distribution already looks approximately normal.
write.table (newdata,"name-transformed.txt1",sep="\t",row.names=FALSE)
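The construction of newdata is not shown in the text. Here is a sketch of how it could be built from the transformations described, assuming a made-up stand-in for sampledata; the column name `trans_wind` and the helper `skew` are hypothetical. As a general rule, roots and logs pull in a right tail while powers above 1 stretch it, so comparing skewness before and after is a quick way to check whether each choice moved the variable toward symmetry.

```r
# Moment-based sample skewness: negative = left tail, positive = right tail.
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Hypothetical stand-in for sampledata (made-up values).
set.seed(3)
sampledata <- data.frame(
  temp = 33 * rbeta(200, 5, 2),    # left-skewed, strictly positive
  RH   = 100 * rbeta(200, 2, 4),   # right-skewed
  wind = 9 * rbeta(200, 2, 4),     # right-skewed
  rain = c(rep(0, 198), 0.2, 1.0),
  area = rnorm(200, 0.013, 0.003)
)

# Transformations as described in the text.
newdata <- data.frame(
  area       = sampledata$area,
  trans_temp = sqrt(sampledata$temp),
  trans_RH   = sampledata$RH^3,
  trans_wind = sampledata$wind^2,   # name assumed by analogy with the others
  trans_rain = sampledata$rain^2
)

# Skewness of each predictor before and after its transformation.
before <- sapply(sampledata[c("temp", "RH", "wind", "rain")], skew)
after  <- sapply(newdata[c("trans_temp", "trans_RH", "trans_wind", "trans_rain")], skew)
round(rbind(before, after), 2)
```

A value of `after` closer to zero than `before` suggests the transformation helped; a value further from zero suggests trying a different one.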
II) From the summary statistics of each single-predictor model for the response variable we
can conclude a few things.
For the first model (dependency of area on rain)
Call:
lm(formula = area ~ trans_rain, data = newdata)
Residuals:
       Min         1Q     Median         3Q        Max
-0.0065942 -0.0016419 -0.0001456  0.0010802  0.0074255
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0129869  0.0002050  63.337   <2e-16 ***
trans_rain  0.0007795  0.0014019   0.556    0.579
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.002886 on 198 degrees of freedom
Multiple R-squared: 0.001559, Adjusted R-squared: -0.003483
F-statistic: 0.3092 on 1 and 198 DF, p-value: 0.5788
The p-value for trans_rain (0.579) is well above 0.05, so the coefficient is not significant
and rain cannot be considered one of the factors for area.
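The p-value can be pulled out of the fitted model directly instead of being read off the printed summary. This sketch assumes a made-up data frame (`toy`) with an invented linear relationship; only lm() and summary() match the calls shown in the text.

```r
set.seed(4)
toy <- data.frame(x = rnorm(200))
toy$y <- 0.5 * toy$x + rnorm(200)   # made-up relationship

fit <- lm(y ~ x, data = toy)

# The coefficient table is a matrix with columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)".
coefs <- summary(fit)$coefficients
p_slope <- coefs["x", "Pr(>|t|)"]

p_slope < 0.05           # significance check at the 5% level
summary(fit)$r.squared   # share of variance explained by the model
```

Extracting the values this way avoids transcription errors when several models are compared.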
For the second model (dependency of area on relative humidity)
Call:
lm(formula = area ~ trans_RH, data = newdata)
Residuals:
       Min         1Q     Median         3Q        Max
-0.0062184 -0.0017496  0.0000532  0.0012283  0.0068708
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.363e-02  2.507e-04  54.370  < 2e-16 ***
trans_RH    -4.955e-09  1.224e-09  -4.047 7.43e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.002776 on 198 degrees of freedom
Multiple R-squared: 0.07641, Adjusted R-squared: 0.07174
F-statistic: 16.38 on 1 and 198 DF, p-value: 7.427e-05
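The p-value for trans_RH (7.43e-05) is far below 0.05, so the coefficient is significant and relative
humidity can be considered one of the factors for area. The multiple R-squared of 0.07641 shows,
however, that this model explains only about 7.6% of the variation in area.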
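The trans_RH coefficient looks tiny because the predictor is on a cubed scale. A short worked calculation follows, using the intercept and slope reported in the summary and assuming trans_RH is exactly RH cubed, as described in the transformation step; the helper name `predict_area` is hypothetical.

```r
b0 <- 1.363e-02    # intercept from the reported summary
b1 <- -4.955e-09   # coefficient on trans_RH from the reported summary

# Fitted area as a function of RH on its original scale (assumes trans_RH = RH^3).
predict_area <- function(RH) b0 + b1 * RH^3

predict_area(40)                      # about 0.01331
predict_area(50)                      # about 0.01301
predict_area(50) - predict_area(40)   # about -0.0003
```

On this reading, raising relative humidity from 40 to 50 % lowers the fitted area by roughly 0.0003, which is small next to the residual standard error of 0.002776.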
For the third model (dependency of area on temp)
Call:
lm(formula = area ~ trans_temp, data = newdata)
Residuals:
       Min         1Q     Median         3Q        Max
-0.0043020 -0.0012047 -0.0001768  0.0010320  0.0058853
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0002089  0.0007004   0.298    0.766
trans_temp  0.0029998  0.0001617  18.551   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.001746 on 198 degrees of freedom
Multiple R-squared: 0.6348, Adjusted R-squared: 0.6329
F-statistic: 344.1 on 1 and 198 DF, p-value: < 2.2e-16
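The p-value for trans_temp (< 2e-16) is far below 0.05, so the coefficient is significant and temperature
can be considered one of the factors for area. With a multiple R-squared of 0.6348, this model explains
far more of the variation in area than the rain or relative humidity models.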
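The single-predictor models can be fitted and compared in one pass. This is a minimal sketch, assuming a made-up data frame (`toy`) that reuses the column names from the summaries; the simulated coefficients are invented for illustration and are not the real data.

```r
set.seed(5)
n <- 200
toy <- data.frame(trans_temp = rnorm(n), trans_RH = rnorm(n), trans_rain = rnorm(n))
# Invented relationship: area depends strongly on temp, weakly on RH, not on rain.
toy$area <- 0.8 * toy$trans_temp - 0.2 * toy$trans_RH + rnorm(n, sd = 0.5)

predictors <- c("trans_rain", "trans_RH", "trans_temp")

# Fit area ~ predictor for each predictor and collect R-squared and the slope p-value.
results <- do.call(rbind, lapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "area"), data = toy)
  s <- summary(fit)
  data.frame(predictor = p,
             r_squared = s$r.squared,
             p_value   = s$coefficients[p, "Pr(>|t|)"])
}))

# Strongest single predictor first.
results[order(-results$r_squared), ]
```

Sorting by R-squared puts the comparison made in prose above into a single table.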
For the fourth model (dependency of area on wind)