
Time Series Forecasting of Air-Quality Data


Added on 2022/10/19


Running head: TIME SERIES FORECASTING OF AIR-QUALITY DATA
Time Series Forecasting of Air-Quality Data
Name of the Student
Name of the University
Author Note

Abstract:
Weather forecasting is important because it sets expectations about future climate. In this report, weather forecasting means predicting weather conditions through machine learning techniques. Besides forecasting atmospheric phenomena, it also involves estimating the changes at the earth's surface that are caused by atmospheric conditions. Forecasting still follows the same basic approach as that of early humans, but many modern tools are now used to compute humidity, temperature and wind. Even a sophisticated numerical forecast made on a computer needs an initial description of the atmospheric conditions: a picture of wind, temperature and the other basic elements. Since the mid-20th century, digital computers have made it feasible to compute changes in atmospheric conditions objectively and mathematically, so that everyone obtains the same outputs from the same initial conditions. The adoption of numerical weather prediction models brought computer experts and specialists in statistics and numerical processing to work alongside atmospheric scientists and meteorologists. The increased ability to analyse and process weather data has in turn raised the interest of meteorologists in securing more, and more accurate, observations. In this report the temperature, relative humidity and absolute humidity are predicted, taking date as the independent variable and the predicted quantities as the dependent variables.

Table of Contents
Introduction:
Dataset summary:
Data mining techniques:
Results, evaluation and demonstration:
Conclusions:
References:

Introduction:
This report uses time series forecasting, an essential part of machine learning that is sometimes overlooked. It is essential because many prediction problems involve a time component. Such problems are often neglected precisely because the time component makes them difficult to handle. Time also plays a role in ordinary machine learning datasets. Prediction estimates unknown data that lie outside the range of the selected sample. In forecasting, future outcomes are estimated, while all previous observations are treated equally. Changes in the dynamics of the model over time are handled by the theory of 'concept drift'. A time series dataset differs from ordinary data in that the time dimension represents the dependence between the values of the variables. The time dimension therefore provides additional structure and information in the data. The report deals with the prediction of relative humidity, temperature and absolute humidity, taking time as the independent variable and the predicted quantities as the dependent variables. The dataset is multivariate and time-series.
The main objective of this experiment is to predict the weather data beyond February 2005, the end of the sample, by the time series forecasting method. Humidity (relative or absolute) and temperature are main properties of air, and these two are predicted on an hourly basis for the 10 hours following the given sample. Additionally, the same data are fitted by linear regression, where relative humidity and temperature are fitted using the predictors other than time and date (Junior et al. 2019). The two methods are compared, and the better method is selected based on properties of the fit such as the absolute error and the relative error.

Dataset summary:
The dataset contains 9358 hourly data points collected from an array of 5 metal oxide chemical sensors embedded in an air quality chemical multisensor device. The device was installed at road level at a location in an Italian city. The data were recorded from March 2004 to February 2005 and represent the available field recordings of the deployed air quality chemical sensor device. Hourly averaged concentrations of the major air pollutants CO, non-methane hydrocarbons (NMHC), NOx, benzene (C6H6) and NO2 were obtained from a co-located certified reference analyser. There is evidence of cross-sensitivities as well as concept and sensor drifts, which gradually affect the sensors' ability to estimate concentrations. Missing values are tagged with the value -200. The dataset may be used exclusively for research purposes; commercial purposes are fully excluded.
The dataset is multivariate and time-series, and its attributes are real-valued. The associated tasks are the prediction of relative/absolute humidity and temperature using linear regression and time series forecasting. The dataset contains 9358 data points for this prediction and 14 attributes. The sensor responses come from the multi-gas sensor device deployed at the location in the Italian city, and they are recorded each hour together with the gas concentrations from the certified analyser.
The attributes are as follows:
1. Date of the collected data point, in DD/MM/YYYY format.
2. CO(GT): true hourly averaged concentration of CO in mg/m^3 (reference analyser).
3. PT08.S1: hourly averaged response of the tin oxide sensor (nominally CO targeted).
4. NMHC(GT): true hourly averaged overall non-methane hydrocarbon concentration in micro-g/m^3 (reference analyser).
5. C6H6(GT): true hourly averaged benzene concentration in micro-g/m^3 (reference analyser).
6. PT08.S2: hourly averaged response of the titania sensor (nominally NMHC targeted).
7. NOx(GT): true hourly averaged NOx concentration in ppb (reference analyser).
8. PT08.S3: hourly averaged response of the tungsten oxide sensor (nominally NOx targeted).
9. NO2(GT): true hourly averaged nitrogen dioxide concentration in micro-g/m^3 (reference analyser).
10. PT08.S4: hourly averaged response of the tungsten oxide sensor (nominally NO2 targeted).
11. PT08.S5: hourly averaged response of the indium oxide sensor (nominally O3 targeted).
12. T: temperature in °C.
13. RH: relative humidity (in percentage).
14. AH: absolute humidity.
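Because missing readings are tagged with -200 rather than left blank, they need to be recognised before any statistics are computed. The following is a minimal sketch of one way to do this, not the procedure used in the report. It assumes a semicolon-separated file with decimal commas and a Time column in HH.MM.SS form; the sample rows are illustrative, and the helper names `parse_value` and `load_rows` are hypothetical.

```python
import csv
import io
from datetime import datetime

MISSING_TAG = -200.0  # value the dataset uses to tag missing readings

# Illustrative rows only; the column layout is an assumption.
SAMPLE = """Date;Time;CO(GT);T;RH;AH
10/03/2004;18.00.00;2,6;13,6;48,9;0,7578
10/03/2004;19.00.00;-200;13,3;47,7;0,7255
10/03/2004;20.00.00;2,2;-200;54,0;0,7502
"""

def parse_value(text):
    """Convert a decimal-comma string to float; return None for the missing tag."""
    value = float(text.replace(",", "."))
    return None if value == MISSING_TAG else value

def load_rows(handle):
    """Read records, merge Date and Time into one timestamp, and mask -200 values."""
    rows = []
    for record in csv.DictReader(handle, delimiter=";"):
        date_text = record.pop("Date")
        time_text = record.pop("Time")
        stamp = datetime.strptime(date_text + " " + time_text, "%d/%m/%Y %H.%M.%S")
        row = {"timestamp": stamp}
        for name, text in record.items():
            row[name] = parse_value(text)
        rows.append(row)
    return rows

rows = load_rows(io.StringIO(SAMPLE))
```

Masking the tag as `None` keeps a reading of -200 from being treated as a real temperature or concentration in later averages or regressions.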
Data mining techniques:
Two methods are used to forecast the air temperature in degrees Celsius and the relative humidity in percent. In the first, the temperature is predicted by linear regression using the other variables CO(GT), PT08S1(CO), NMHC(GT), C6H6(GT), PT08S2(NMHC), NOx(GT), PT08S3(NOx), NO2(GT), PT08S4(NO2) and PT08S5(O3) as predictors. In the second, the temperature and humidity are predicted by the time series forecasting method, where date is taken as the only independent variable.
After the CSV file containing the air quality data is loaded, the entire set is converted to ARFF format for analysis. In pre-processing the Time variable is removed, because the hour is not needed for time series forecasting once the periodicity of the data is set to hourly in Weka. The data are then analysed using the linear regression base learner in Weka, with output reported to 4 decimal places (González-Vidal, Jiménez and Gómez-Skarmeta 2019).
Additionally, two air quality measures, the hourly average concentration of CO and the hourly average concentration of NOx, are predicted for 10 hours into the future by time series forecasting, as these two gas concentrations critically affect the quality of air. Note that before any time series forecasting is performed, the necessary package must be installed in Weka so that the date is interpreted as numeric for time series analysis.
Results, evaluation and demonstration:
Linear regression prediction of temperature:
=== Run information ===
Scheme: weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8 -additional-stats -num-decimal-places 4
Relation: AirQualityUCI-weka.filters.unsupervised.attribute.Remove-weka.filters.unsupervised.attribute.Remove-R1-weka.filters.unsupervised.attribute.Remove-R13
Instances: 9357
Attributes: 12
CO(GT)
PT08S1(CO)
NMHC(GT)

C6H6(GT)
PT08S2(NMHC)
NOx(GT)
PT08S3(NOx)
NO2(GT)
PT08S4(NO2)
PT08S5(O3)
T
RH
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
Linear Regression Model
T =
0.0046 * CO(GT) +
0.0269 * PT08S1(CO) +

-0.0376 * NMHC(GT) +
0.0603 * PT08S2(NMHC) +
-0.0457 * NOx(GT) +
0.0396 * PT08S3(NOx) +
0.0638 * NO2(GT) +
0.0093 * PT08S4(NO2) +
-0.0199 * PT08S5(O3) +
0.2793 * RH +
-110.406
Regression Analysis:
Variable       Coefficient   SE of Coef   t-Stat
CO(GT)         0.0046        0.002        2.249
PT08S1(CO)     0.0269        0.0016       17.1176
NMHC(GT)       -0.0376       0.0009       -40.2855
PT08S2(NMHC)   0.0603        0.0015       41.1556
NOx(GT)        -0.0457       0.001        -44.3569
PT08S3(NOx)    0.0396        0.0007       59.4752
NO2(GT)        0.0638        0.0019       33.7988
PT08S4(NO2)    0.0093        0.0006       14.4045
PT08S5(O3)     -0.0199       0.0008       -24.09
RH             0.2793        0.0059       47.3266
const          -110.406      1.0529       -104.8572
Degrees of freedom = 9346
R^2 value = 0.9338
Adjusted R^2 = 0.93368
F-statistic = 13173.648
Time taken to build model: 0.09 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.9662
Mean absolute error 8.54
Root mean squared error 11.139
Relative absolute error 48.3804 %
Root relative squared error 25.7812 %

Total Number of Instances 9357
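The four error measures in the cross-validation summary can be illustrated with a short sketch. It uses the common textbook definitions, in which the relative measures compare the model against always predicting the mean of the actual values; Weka's exact baseline may differ slightly, so this is an assumption rather than a reproduction of its internals. The function name `regression_errors` and the sample numbers are hypothetical.

```python
import math

def regression_errors(actual, predicted):
    """Return MAE, RMSE and the two relative errors (in percent)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Baseline errors: what a predictor that always outputs the mean would incur.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        "rae_percent": 100.0 * sum(abs_err) / base_abs,
        "rrse_percent": 100.0 * math.sqrt(sum(sq_err) / base_sq),
    }

# Made-up values purely to exercise the function.
errors = regression_errors([10.0, 12.0, 14.0, 16.0], [11.0, 12.0, 13.0, 18.0])
```

Under these definitions a relative absolute error below 100 % means the model does better than the mean-only baseline, which is how the percentages in the summary can be read.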
Linear regression prediction of Relative humidity:
=== Run information ===
Scheme: weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8 -additional-stats -num-decimal-places 4
Relation: AirQualityUCI-weka.filters.unsupervised.attribute.Remove-weka.filters.unsupervised.attribute.Remove-R1-weka.filters.unsupervised.attribute.Remove-R13
Instances: 9357
Attributes: 12
CO(GT)
PT08S1(CO)
NMHC(GT)
C6H6(GT)
PT08S2(NMHC)
NOx(GT)
PT08S3(NOx)
NO2(GT)
PT08S4(NO2)
PT08S5(O3)

T
RH
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
Linear Regression Model
RH =
0.0098 * CO(GT) +
0.1022 * PT08S1(CO) +
-0.0194 * NMHC(GT) +
-0.1039 * PT08S2(NMHC) +
0.0358 * NOx(GT) +
0.0331 * PT08S3(NOx) +
-0.0697 * NO2(GT) +
0.0046 * PT08S4(NO2) +
0.0284 * PT08S5(O3) +
0.6921 * T +
-46.6142
Regression Analysis:
Variable       Coefficient   SE of Coef   t-Stat
CO(GT)         0.0098        0.0032       3.0617
PT08S1(CO)     0.1022        0.0023       44.8339
NMHC(GT)       -0.0194       0.0016       -12.3073
PT08S2(NMHC)   -0.1039       0.0023       -45.8413
NOx(GT)        0.0358        0.0017       20.5461
PT08S3(NOx)    0.0331        0.0012       27.9508
NO2(GT)        -0.0697       0.0031       -22.7652
PT08S4(NO2)    0.0046        0.001        4.4514
PT08S5(O3)     0.0284        0.0013       21.7672
T              0.6921        0.0146       47.3266
const          -46.6142      2.397        -19.4471
Degrees of freedom = 9346
R^2 value = 0.8832
Adjusted R^2 = 0.88309
F-statistic = 7067.9048
Time taken to build model: 0.04 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.9396
Mean absolute error 14.1209

Root mean squared error 17.5274
Relative absolute error 56.1039 %
Root relative squared error 34.2189 %
Total Number of Instances 9357
Time series model output for Temperature prediction:
=== Run information ===
Scheme:
LinearRegression -S 0 -R 1.0E-8 -num-decimal-places 4
Lagged and derived variable options:
-F T -L 1 -M 24 -am-pm
Relation: AirQualityUCI
Instances: 9357
Attributes: 14
Date
CO(GT)
PT08S1(CO)
NMHC(GT)
C6H6(GT)
PT08S2(NMHC)
NOx(GT)

PT08S3(NOx)
NO2(GT)
PT08S4(NO2)
PT08S5(O3)
T
RH
AH
Transformed training data:
T
ArtificialTimeIndex
Lag_T-1
Lag_T-2
Lag_T-3
Lag_T-4
Lag_T-5
Lag_T-6
Lag_T-7
Lag_T-8
Lag_T-9
Lag_T-10

Lag_T-11
Lag_T-12
Lag_T-13
Lag_T-14
Lag_T-15
Lag_T-16
Lag_T-17
Lag_T-18
Lag_T-19
Lag_T-20
Lag_T-21
Lag_T-22
Lag_T-23
Lag_T-24
ArtificialTimeIndex^2
ArtificialTimeIndex^3
ArtificialTimeIndex*Lag_T-1
ArtificialTimeIndex*Lag_T-2
ArtificialTimeIndex*Lag_T-3
ArtificialTimeIndex*Lag_T-4

ArtificialTimeIndex*Lag_T-5
ArtificialTimeIndex*Lag_T-6
ArtificialTimeIndex*Lag_T-7
ArtificialTimeIndex*Lag_T-8
ArtificialTimeIndex*Lag_T-9
ArtificialTimeIndex*Lag_T-10
ArtificialTimeIndex*Lag_T-11
ArtificialTimeIndex*Lag_T-12
ArtificialTimeIndex*Lag_T-13
ArtificialTimeIndex*Lag_T-14
ArtificialTimeIndex*Lag_T-15
ArtificialTimeIndex*Lag_T-16
ArtificialTimeIndex*Lag_T-17
ArtificialTimeIndex*Lag_T-18
ArtificialTimeIndex*Lag_T-19
ArtificialTimeIndex*Lag_T-20
ArtificialTimeIndex*Lag_T-21
ArtificialTimeIndex*Lag_T-22
ArtificialTimeIndex*Lag_T-23
ArtificialTimeIndex*Lag_T-24

T:
Linear Regression Model
T =
0.0009 * ArtificialTimeIndex +
0.911 * Lag_T-1 +
0.049 * Lag_T-2 +
-0.1206 * Lag_T-3 +
0.0464 * Lag_T-4 +
0.0794 * Lag_T-5 +
-0.0188 * Lag_T-6 +
-0.0258 * Lag_T-8 +
0.0924 * Lag_T-9 +
-0.089 * Lag_T-10 +
0.0362 * Lag_T-11 +
-0.0166 * Lag_T-12 +
-0.0993 * Lag_T-14 +

0.0805 * Lag_T-15 +
-0.0252 * Lag_T-19 +
0.0327 * Lag_T-20 +
-0.0145 * Lag_T-24 +
-0 * ArtificialTimeIndex^2 +
0 * ArtificialTimeIndex^3 +
-0 * ArtificialTimeIndex*Lag_T-1 +
0 * ArtificialTimeIndex*Lag_T-2 +
0 * ArtificialTimeIndex*Lag_T-3 +
-0 * ArtificialTimeIndex*Lag_T-4 +
-0 * ArtificialTimeIndex*Lag_T-5 +
0 * ArtificialTimeIndex*Lag_T-6 +
-0 * ArtificialTimeIndex*Lag_T-9 +
0 * ArtificialTimeIndex*Lag_T-10 +
0 * ArtificialTimeIndex*Lag_T-11 +
0 * ArtificialTimeIndex*Lag_T-14 +
-0 * ArtificialTimeIndex*Lag_T-15 +
0.3985
=== Evaluation on training data ===
Target: T
Horizon          N      Mean absolute error   Root mean squared error
1-step-ahead     9333   2.2816                12.602
2-steps-ahead    9332   3.7735                16.5916
3-steps-ahead    9331   5.1441                19.69
4-steps-ahead    9330   6.319                 22.077
5-steps-ahead    9329   7.3198                23.9484
6-steps-ahead    9328   8.1748                25.4799
7-steps-ahead    9327   8.9465                26.8601
8-steps-ahead    9326   9.6296                28.1021
9-steps-ahead    9325   10.1783               29.1235
10-steps-ahead   9324   10.6632               30.0639
Total number of instances: 9357
Graph of Actual temperature and predicted values:

Future predicted values at instants:
9358* 28.045
9359* 27.4022
9360* 26.403
9361* 25.4996
9362* 24.8303
9363* 24.4292
9364* 24.198
9365* 24.1812
9366* 24.3394
9367* 24.5905
Graph of future forecasted values with predicted values:
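The lag options in the run information (-L 1 -M 24) mean that the target is regressed on its own values from 1 to 24 hours earlier, and forecasts beyond the sample are produced one step at a time. The sketch below illustrates that idea under simplifying assumptions: it keeps only the time index, the 24 lags and an intercept, and omits the polynomial, interaction and AM/PM terms that appear in the Weka output. The function names are hypothetical, and the series is a synthetic 24-hour cycle rather than the report's data.

```python
import math
import numpy as np

def make_lag_matrix(series, max_lag):
    """Build rows [time index, y[t-1], ..., y[t-max_lag], 1] with target y[t]."""
    y = np.asarray(series, dtype=float)
    rows = []
    for t in range(max_lag, len(y)):
        lags = y[t - max_lag:t][::-1]  # most recent value first
        rows.append(np.concatenate(([float(t)], lags, [1.0])))
    return np.array(rows), y[max_lag:]

def fit_lagged_regression(series, max_lag):
    """Least-squares fit of the target on its own lagged values."""
    X, target = make_lag_matrix(series, max_lag)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

def forecast(series, coef, max_lag, steps):
    """Forecast recursively, feeding each prediction back in as a lag."""
    history = [float(v) for v in series]
    predictions = []
    for _ in range(steps):
        t = len(history)
        lags = history[-max_lag:][::-1]
        x = np.array([float(t), *lags, 1.0])
        nxt = float(x @ coef)
        predictions.append(nxt)
        history.append(nxt)
    return predictions

# Synthetic hourly series with a 24-hour cycle (illustrative only).
hourly = [20.0 + 5.0 * math.sin(2.0 * math.pi * t / 24.0) for t in range(240)]
coef = fit_lagged_regression(hourly, max_lag=24)
next_10_hours = forecast(hourly, coef, max_lag=24, steps=10)
```

Because each forecast is fed back in as an input for the next step, any error made at step 1 carries into later steps, which is consistent with the error growing with the horizon in the evaluation on training data.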
Time series model output for Relative humidity prediction:

=== Run information ===
Scheme:
LinearRegression -S 0 -R 1.0E-8 -num-decimal-places 4
Lagged and derived variable options:
-F RH -L 1 -M 24 -am-pm
Relation: AirQualityUCI
Instances: 9357
Attributes: 14
Date
CO(GT)
PT08S1(CO)
NMHC(GT)
C6H6(GT)
PT08S2(NMHC)
NOx(GT)
PT08S3(NOx)
NO2(GT)

PT08S4(NO2)
PT08S5(O3)
T
RH
AH
Transformed training data:
RH
ArtificialTimeIndex
Lag_RH-1
Lag_RH-2
Lag_RH-3
Lag_RH-4
Lag_RH-5
Lag_RH-6
Lag_RH-7
Lag_RH-8
Lag_RH-9
Lag_RH-10

Lag_RH-11
Lag_RH-12
Lag_RH-13
Lag_RH-14
Lag_RH-15
Lag_RH-16
Lag_RH-17
Lag_RH-18
Lag_RH-19
Lag_RH-20
Lag_RH-21
Lag_RH-22
Lag_RH-23
Lag_RH-24
ArtificialTimeIndex^2
ArtificialTimeIndex^3
ArtificialTimeIndex*Lag_RH-1
ArtificialTimeIndex*Lag_RH-2
ArtificialTimeIndex*Lag_RH-3
ArtificialTimeIndex*Lag_RH-4

ArtificialTimeIndex*Lag_RH-5
ArtificialTimeIndex*Lag_RH-6
ArtificialTimeIndex*Lag_RH-7
ArtificialTimeIndex*Lag_RH-8
ArtificialTimeIndex*Lag_RH-9
ArtificialTimeIndex*Lag_RH-10
ArtificialTimeIndex*Lag_RH-11
ArtificialTimeIndex*Lag_RH-12
ArtificialTimeIndex*Lag_RH-13
ArtificialTimeIndex*Lag_RH-14
ArtificialTimeIndex*Lag_RH-15
ArtificialTimeIndex*Lag_RH-16
ArtificialTimeIndex*Lag_RH-17
ArtificialTimeIndex*Lag_RH-18
ArtificialTimeIndex*Lag_RH-19
ArtificialTimeIndex*Lag_RH-20
ArtificialTimeIndex*Lag_RH-21
ArtificialTimeIndex*Lag_RH-22
ArtificialTimeIndex*Lag_RH-23
ArtificialTimeIndex*Lag_RH-24

RH:
Linear Regression Model
RH =
-0.0012 * ArtificialTimeIndex +
0.938 * Lag_RH-1 +
0.0505 * Lag_RH-2 +
-0.1148 * Lag_RH-3 +
0.0358 * Lag_RH-4 +
0.0666 * Lag_RH-5 +
-0.0253 * Lag_RH-6 +
-0.022 * Lag_RH-7 +
-0.0291 * Lag_RH-8 +
0.0962 * Lag_RH-9 +
-0.1023 * Lag_RH-10 +
0.0328 * Lag_RH-11 +
-0.079 * Lag_RH-14 +

0.079 * Lag_RH-15 +
0.028 * Lag_RH-20 +
0.0177 * Lag_RH-23 +
-0.0484 * Lag_RH-24 +
0 * ArtificialTimeIndex^2 +
-0 * ArtificialTimeIndex^3 +
-0 * ArtificialTimeIndex*Lag_RH-1 +
0 * ArtificialTimeIndex*Lag_RH-2 +
0 * ArtificialTimeIndex*Lag_RH-3 +
-0 * ArtificialTimeIndex*Lag_RH-4 +
-0 * ArtificialTimeIndex*Lag_RH-5 +
0 * ArtificialTimeIndex*Lag_RH-6 +
0 * ArtificialTimeIndex*Lag_RH-7 +
-0 * ArtificialTimeIndex*Lag_RH-9 +
0 * ArtificialTimeIndex*Lag_RH-10 +
0 * ArtificialTimeIndex*Lag_RH-11 +
-0 * ArtificialTimeIndex*Lag_RH-12 +
0 * ArtificialTimeIndex*Lag_RH-14 +
-0 * ArtificialTimeIndex*Lag_RH-15 +
0 * ArtificialTimeIndex*Lag_RH-24 +

3.8503
Future predictions at instances:
9358* 12.4699
9359* 12.9548
9360* 15.248
9361* 17.4465
9362* 18.6223
9363* 19.6845
9364* 20.6846
9365* 21.4594
9366* 21.8813
9367* 22.3257
=== Evaluation on training data ===
Target: RH
Horizon          N      Mean absolute error   Root mean squared error
1-step-ahead     9333   4.4524                14.6943
2-steps-ahead    9332   7.3673                19.8158
3-steps-ahead    9331   9.938                 23.8376
4-steps-ahead    9330   12.1467               27.0123
5-steps-ahead    9329   14.032                29.5491
6-steps-ahead    9328   15.6022               31.5867
7-steps-ahead    9327   16.9259               33.408
8-steps-ahead    9326   18.012                35.0064
9-steps-ahead    9325   18.8329               36.3126
10-steps-ahead   9324   19.4241               37.417
Total number of instances: 9357
Relative humidity actual and predicted data plot:
Future forecast of relative humidity plot with current prediction:
CO(GT) prediction using time series:

=== Run information ===
Scheme:
LinearRegression -S 0 -R 1.0E-8 -num-decimal-places 4
Lagged and derived variable options:
-F CO(GT) -L 1 -M 24 -am-pm
Relation: AirQualityUCI
Instances: 9357
Attributes: 14
Date
CO(GT)
PT08S1(CO)
NMHC(GT)
C6H6(GT)
PT08S2(NMHC)
NOx(GT)
PT08S3(NOx)
NO2(GT)

PT08S4(NO2)
PT08S5(O3)
T
RH
AH
Transformed training data:
CO(GT)
ArtificialTimeIndex
Lag_CO(GT)-1
Lag_CO(GT)-2
Lag_CO(GT)-3
Lag_CO(GT)-4
Lag_CO(GT)-5
Lag_CO(GT)-6
Lag_CO(GT)-7
Lag_CO(GT)-8
Lag_CO(GT)-9
Lag_CO(GT)-10

Lag_CO(GT)-11
Lag_CO(GT)-12
Lag_CO(GT)-13
Lag_CO(GT)-14
Lag_CO(GT)-15
Lag_CO(GT)-16
Lag_CO(GT)-17
Lag_CO(GT)-18
Lag_CO(GT)-19
Lag_CO(GT)-20
Lag_CO(GT)-21
Lag_CO(GT)-22
Lag_CO(GT)-23
Lag_CO(GT)-24
ArtificialTimeIndex^2
ArtificialTimeIndex^3
ArtificialTimeIndex*Lag_CO(GT)-1
ArtificialTimeIndex*Lag_CO(GT)-2
ArtificialTimeIndex*Lag_CO(GT)-3
ArtificialTimeIndex*Lag_CO(GT)-4

ArtificialTimeIndex*Lag_CO(GT)-5
ArtificialTimeIndex*Lag_CO(GT)-6
ArtificialTimeIndex*Lag_CO(GT)-7
ArtificialTimeIndex*Lag_CO(GT)-8
ArtificialTimeIndex*Lag_CO(GT)-9
ArtificialTimeIndex*Lag_CO(GT)-10
ArtificialTimeIndex*Lag_CO(GT)-11
ArtificialTimeIndex*Lag_CO(GT)-12
ArtificialTimeIndex*Lag_CO(GT)-13
ArtificialTimeIndex*Lag_CO(GT)-14
ArtificialTimeIndex*Lag_CO(GT)-15
ArtificialTimeIndex*Lag_CO(GT)-16
ArtificialTimeIndex*Lag_CO(GT)-17
ArtificialTimeIndex*Lag_CO(GT)-18
ArtificialTimeIndex*Lag_CO(GT)-19
ArtificialTimeIndex*Lag_CO(GT)-20
ArtificialTimeIndex*Lag_CO(GT)-21
ArtificialTimeIndex*Lag_CO(GT)-22
ArtificialTimeIndex*Lag_CO(GT)-23
ArtificialTimeIndex*Lag_CO(GT)-24

CO(GT):
Linear Regression Model
CO(GT) =
-0.0012 * ArtificialTimeIndex +
0.4456 * Lag_CO(GT)-1 +
0.2342 * Lag_CO(GT)-2 +
0.1124 * Lag_CO(GT)-3 +
0.0923 * Lag_CO(GT)-4 +
0.0416 * Lag_CO(GT)-5 +
0.026 * Lag_CO(GT)-6 +
0.0162 * Lag_CO(GT)-9 +
-0.0364 * Lag_CO(GT)-18 +
-0.0266 * Lag_CO(GT)-19 +
-0.0402 * Lag_CO(GT)-20 +
-0.0718 * Lag_CO(GT)-21 +
-0.0786 * Lag_CO(GT)-22 +
-0.1108 * Lag_CO(GT)-23 +
0.3717 * Lag_CO(GT)-24 +
0 * ArtificialTimeIndex^2 +

-0 * ArtificialTimeIndex^3 +
0 * ArtificialTimeIndex*Lag_CO(GT)-1 +
-0 * ArtificialTimeIndex*Lag_CO(GT)-2 +
-0 * ArtificialTimeIndex*Lag_CO(GT)-4 +
0 * ArtificialTimeIndex*Lag_CO(GT)-18 +
0 * ArtificialTimeIndex*Lag_CO(GT)-20 +
0 * ArtificialTimeIndex*Lag_CO(GT)-21 +
0 * ArtificialTimeIndex*Lag_CO(GT)-22 +
0 * ArtificialTimeIndex*Lag_CO(GT)-23 +
-0 * ArtificialTimeIndex*Lag_CO(GT)-24 +
0.0141
Actual vs predicted CO(GT):
Future forecast along with current CO(GT):

NOx(GT) prediction using time series:
=== Run information ===
Scheme:
LinearRegression -S 0 -R 1.0E-8 -num-decimal-places 4
Lagged and derived variable options:
-F NOx(GT) -L 1 -M 24 -am-pm
Relation: AirQualityUCI
Instances: 9357
Attributes: 14
Date
CO(GT)
PT08S1(CO)

NMHC(GT)
C6H6(GT)
PT08S2(NMHC)
NOx(GT)
PT08S3(NOx)
NO2(GT)
PT08S4(NO2)
PT08S5(O3)
T
RH
AH
Transformed training data:
NOx(GT)
ArtificialTimeIndex
Lag_NOx(GT)-1
Lag_NOx(GT)-2
Lag_NOx(GT)-3
Lag_NOx(GT)-4

Lag_NOx(GT)-5
Lag_NOx(GT)-6
Lag_NOx(GT)-7
Lag_NOx(GT)-8
Lag_NOx(GT)-9
Lag_NOx(GT)-10
Lag_NOx(GT)-11
Lag_NOx(GT)-12
Lag_NOx(GT)-13
Lag_NOx(GT)-14
Lag_NOx(GT)-15
Lag_NOx(GT)-16
Lag_NOx(GT)-17
Lag_NOx(GT)-18
Lag_NOx(GT)-19
Lag_NOx(GT)-20
Lag_NOx(GT)-21
Lag_NOx(GT)-22
Lag_NOx(GT)-23
Lag_NOx(GT)-24

ArtificialTimeIndex^2
ArtificialTimeIndex^3
ArtificialTimeIndex*Lag_NOx(GT)-1
ArtificialTimeIndex*Lag_NOx(GT)-2
ArtificialTimeIndex*Lag_NOx(GT)-3
ArtificialTimeIndex*Lag_NOx(GT)-4
ArtificialTimeIndex*Lag_NOx(GT)-5
ArtificialTimeIndex*Lag_NOx(GT)-6
ArtificialTimeIndex*Lag_NOx(GT)-7
ArtificialTimeIndex*Lag_NOx(GT)-8
ArtificialTimeIndex*Lag_NOx(GT)-9
ArtificialTimeIndex*Lag_NOx(GT)-10
ArtificialTimeIndex*Lag_NOx(GT)-11
ArtificialTimeIndex*Lag_NOx(GT)-12
ArtificialTimeIndex*Lag_NOx(GT)-13
ArtificialTimeIndex*Lag_NOx(GT)-14
ArtificialTimeIndex*Lag_NOx(GT)-15
ArtificialTimeIndex*Lag_NOx(GT)-16
ArtificialTimeIndex*Lag_NOx(GT)-17
ArtificialTimeIndex*Lag_NOx(GT)-18

ArtificialTimeIndex*Lag_NOx(GT)-19
ArtificialTimeIndex*Lag_NOx(GT)-20
ArtificialTimeIndex*Lag_NOx(GT)-21
ArtificialTimeIndex*Lag_NOx(GT)-22
ArtificialTimeIndex*Lag_NOx(GT)-23
ArtificialTimeIndex*Lag_NOx(GT)-24
NOx(GT):
Linear Regression Model
NOx(GT) =
-0.0054 * ArtificialTimeIndex +
0.6238 * Lag_NOx(GT)-1 +
0.2464 * Lag_NOx(GT)-2 +
-0.1262 * Lag_NOx(GT)-4 +
0.0681 * Lag_NOx(GT)-5 +
0.0386 * Lag_NOx(GT)-8 +
0.072 * Lag_NOx(GT)-11 +
-0.0585 * Lag_NOx(GT)-14 +
-0.0291 * Lag_NOx(GT)-17 +
-0.0845 * Lag_NOx(GT)-20 +
0.1415 * Lag_NOx(GT)-21 +

-0.1676 * Lag_NOx(GT)-23 +
0.25 * Lag_NOx(GT)-24 +
0 * ArtificialTimeIndex^2 +
-0 * ArtificialTimeIndex^3 +
0 * ArtificialTimeIndex*Lag_NOx(GT)-1 +
-0 * ArtificialTimeIndex*Lag_NOx(GT)-2 +
0 * ArtificialTimeIndex*Lag_NOx(GT)-4 +
-0 * ArtificialTimeIndex*Lag_NOx(GT)-5 +
0 * ArtificialTimeIndex*Lag_NOx(GT)-7 +
-0 * ArtificialTimeIndex*Lag_NOx(GT)-8 +
-0 * ArtificialTimeIndex*Lag_NOx(GT)-11 +
0 * ArtificialTimeIndex*Lag_NOx(GT)-12 +
0 * ArtificialTimeIndex*Lag_NOx(GT)-14 +
-0 * ArtificialTimeIndex*Lag_NOx(GT)-18 +
0 * ArtificialTimeIndex*Lag_NOx(GT)-20 +
-0 * ArtificialTimeIndex*Lag_NOx(GT)-21 +
0 * ArtificialTimeIndex*Lag_NOx(GT)-22 +
0 * ArtificialTimeIndex*Lag_NOx(GT)-23 +
-0 * ArtificialTimeIndex*Lag_NOx(GT)-24 +
4.6463

Future predicted data:
9358* 254.0032
9359* 247.2432
9360* 252.6842
9361* 255.9062
9362* 282.9543
9363* 297.4384
9364* 280.8169
9365* 265.5783
9366* 232.8939
9367* 177.9536
NOx(GT) predicted vs actual:
Future NOx(GT) along with current data:

The linear regression results show that the adjusted R^2 value for the temperature model is 0.93368, meaning that 93.368% of the variation in temperature is explained by its predictors, and that in the relative humidity model over 88% of the variation is explained by the independent variables. Hence, linear regression is a good fit for both temperature and relative humidity. The time series predictions, however, show that the mean absolute error is smallest just after the given timeframe of the data and increases as the number of steps ahead increases. In addition, the actual and predicted temperature, humidity, CO concentration and NOx concentration are very close in the above plots. Thus the time series model is also a good fit for the data (Gayathridevi, Karthika and Marikkannan 2018). A further benefit of the time series approach is that only the date variable is needed to predict the air quality variables, which makes it possible to predict with fewer variables, good accuracy, and less computational time and complexity (Usha and Balamurugan 2016). Furthermore, the predictions include confidence intervals with the probable errors at each hour or step into the future, which lets the user extract future air quality data to a desired accuracy limit. Hence, time series forecasting can be considered the better model for predicting the given air quality data.
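The adjusted R^2 values reported by Weka can be cross-checked against the plain R^2 values using the standard adjustment for the number of predictors. The sketch below assumes 9357 instances and 10 predictors, which matches the 9346 degrees of freedom in the output; the function name `adjusted_r2` is hypothetical. Because the inputs are the rounded R^2 values, the results agree with the reported figures only to about four decimal places.

```python
def adjusted_r2(r2, n_instances, n_predictors):
    """Standard adjusted R^2: penalises R^2 for the number of predictors."""
    dof = n_instances - n_predictors - 1  # residual degrees of freedom
    return 1.0 - (1.0 - r2) * (n_instances - 1) / dof

# R^2 values as printed in the Weka output (rounded to 4 decimals).
temperature_adj = adjusted_r2(0.9338, 9357, 10)
humidity_adj = adjusted_r2(0.8832, 9357, 10)
```

With more than nine thousand instances and only ten predictors, the adjustment is tiny, so R^2 and adjusted R^2 tell essentially the same story for both models.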

Conclusions:
In conclusion, the main objectives of this assignment have been successfully completed: both primary properties of the air quality data, temperature and humidity, collected from the UCI Machine Learning Repository, are predicted for the near-future hours. The data mining shows that the time series forecasting model is a better model than linear regression for predicting the air quality data, owing to its accuracy with just one variable, the date of the data in successive hours. The time series model can be used to predict future temperature, humidity and the two gas concentrations in the air just after the given sample time frame with a good confidence level. However, the model is not valid for the same timeframe in different cities or countries, as the sample data were collected from one area of an Italian city using the Air Quality Chemical Multisensor Device.

References:
Archive.ics.uci.edu. (2019). UCI Machine Learning Repository: Air Quality Data Set.
[online] Available at: https://archive.ics.uci.edu/ml/datasets/Air+Quality# [Accessed 19 Sep.
2019].
Austin, P.C. and Steyerberg, E.W., 2015. The number of subjects per variable required in
linear regression analyses. Journal of clinical epidemiology, 68(6), pp.627-636.
Gayathridevi, M., Karthika, K. and Marikkannan, M., 2018. Time series analysis of Weather
Data Using Weka. Journal of Network Security and Data Mining, 1(3), pp.1-6.
González-Vidal, A., Jiménez, F. and Gómez-Skarmeta, A.F., 2019. A methodology for
energy multivariate time series forecasting in smart buildings based on feature
selection. Energy and Buildings, 196, pp.71-82.
Harrell Jr, F.E., 2015. Regression modeling strategies: with applications to linear models,
logistic and ordinal regression, and survival analysis. Springer.
Junior, S.L.D., Cecatto, J.R., Fernandes, M.M. and Ribeiro, M.X., 2019. Handling
Imbalanced Time Series Through Ensemble of Classifiers: A Multi-class Approach for Solar
Flare Forecasting. In 16th International Conference on Information Technology-New
Generations (ITNG 2019) (pp. 209-214). Springer, Cham.
Tsoukalas, V.D. and Fragiadakis, N.G., 2016. Prediction of occupational risk in the
shipbuilding industry using multivariable linear regression and genetic algorithm analysis.
Safety Science, 83, pp.12-22.
Usha, T.M. and Balamurugan, S.A.A., 2016. Seasonal based electricity demand forecasting
using time series analysis. Circuits Syst, 7, pp.3320-3328.