Running head: TIME SERIES FORECASTING OF AIR-QUALITY DATA

Time Series Forecasting of Air-Quality Data

Name of the Student
Name of the University
Author Note
Abstract: Weather forecasting is important because it tells us what conditions to expect in the future. It is the prediction of future atmospheric conditions, and in this report it is approached with machine learning techniques. Besides forecasting atmospheric phenomena, weather forecasting also involves anticipating changes on the Earth's surface that are caused by atmospheric conditions. In principle, weather is still forecast in much the same way as early humans did it, but many modern tools are now used to measure humidity, temperature and wind. Even a sophisticated numerical forecast computed on a computer requires a set of measurements of the current atmospheric conditions: a picture of wind, temperature and the other basic elements. Since the mid-20th century, digital computers have made it feasible to compute changes in atmospheric conditions objectively and mathematically, so that anyone starting from the same initial conditions obtains the same output. The adoption of numerical weather prediction models brought computer experts and specialists in statistics and numerical processing to work alongside atmospheric scientists and meteorologists. The increased ability to analyse and process weather data has, in turn, increased meteorologists' interest in securing more accurate observations. In this report, temperature, relative humidity and absolute humidity are predicted with date as the independent variable and the predicted quantities as the dependent variables.
Table of Contents
Introduction
Dataset summary
Data mining techniques
Results, evaluation and demonstration
Conclusions
References
Introduction: This report applies time series forecasting, an essential part of machine learning that is often overlooked. It matters because many prediction problems involve a time component, and such problems tend to be neglected precisely because that component makes them harder to handle. Time also plays a role in ordinary machine learning datasets: predictions are made for new data outside the range of the selected sample, and the actual outcomes may only become known in the future. Even so, all previous observations are usually treated equally, with at most minor temporal adjustments to cope with "concept drift". A time series dataset differs from ordinary data because it makes the dependence between observations explicit through the time dimension, and this dimension adds structure and information to the data. The report deals with the prediction of relative humidity, temperature and absolute humidity, taking time as the independent variable and the predicted quantities as the dependent variables. The dataset is multivariate and time-series in nature. The main objective of this experiment is to predict weather data beyond February 2005 using time series forecasting. Humidity (relative or absolute) and temperature are the main properties of air, and these are predicted hourly for 10 hours beyond the end of the given sample. Additionally, relative humidity and temperature are fitted by linear regression using the other predictors, excluding time and date (Junior et al. 2019). The two methods are compared, and the better method is selected based on properties of the fitted output such as the absolute error and the relative error.
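Because the two methods are compared through their error measures, the short Python sketch below shows how the four summary errors that Weka reports for the regression models (mean absolute error, root mean squared error, relative absolute error and root relative squared error) can be computed. It is an illustration rather than the Weka implementation: the helper name `regression_errors` is hypothetical, and the relative errors are taken against a baseline that always predicts the mean of the actual values, whereas Weka takes that mean from the training data.

```python
import math


def regression_errors(actual, predicted):
    """Return MAE, RMSE, RAE (%) and RRSE (%) for paired values.

    RAE and RRSE compare the model with a baseline that always predicts
    the mean of `actual` (an assumption: Weka uses the training-data mean).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    mean = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the mean-only baseline, used as the denominators.
    abs_base = sum(abs(mean - a) for a in actual)
    sq_base = sum((mean - a) ** 2 for a in actual)
    if abs_base == 0:
        raise ValueError("actual values are constant; relative errors undefined")
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_pct": 100.0 * abs_err / abs_base,
        "rrse_pct": 100.0 * math.sqrt(sq_err / sq_base),
    }
```

For example, `regression_errors([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])` gives an MAE of about 0.333, an RAE of 50% and an RRSE of about 70.7%. Values below 100% mean the model beats the mean-only baseline.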
Dataset summary: The dataset contains 9358 hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device. The device was installed at road level at a location in an Italian city. The data were collected from March 2004 to February 2005 and represent the field recordings of the air quality chemical sensor device. Ground-truth hourly averaged concentrations of the major air pollutants, namely CO, non-methane hydrocarbons (NMHC), NOx, benzene (C6H6) and NO2, were obtained from a certified reference analyser co-located with the device. There is evidence of cross-sensitivities as well as both concept drift and sensor drift, which may affect the sensors' ability to estimate concentrations. Missing values are tagged with the value -200. The dataset may be used exclusively for research purposes; commercial use is fully excluded. The dataset is multivariate and time-series, and its attributes are real-valued. The associated tasks are the prediction of relative/absolute humidity and temperature using linear regression and time series forecasting, based on the 9358 data points in the dataset. The dataset has 14 attributes. It contains the responses of the multi-gas sensor device deployed at a location in the Italian city, recorded every hour together with the gas concentrations from the certified analyser. The attributes are as follows:
1. Date of the data point (DD/MM/YYYY format).
2. True hourly averaged concentration of CO in mg/m^3 (reference analyser).
3. PT08.S1: hourly averaged response of the tin oxide sensor (nominally CO targeted).
4. True hourly averaged benzene (C6H6) concentration in µg/m^3 (reference analyser).
5. True hourly averaged overall non-methane hydrocarbon (NMHC) concentration in µg/m^3 (reference analyser).
6. PT08.S2: hourly averaged response of the titania sensor (nominally NMHC targeted).
7. True hourly averaged NOx concentration in ppb (reference analyser).
8. PT08.S3: hourly averaged response of the tungsten oxide sensor (nominally NOx targeted).
9. True hourly averaged NO2 concentration in µg/m^3 (reference analyser).
10. PT08.S4: hourly averaged response of the tungsten oxide sensor (nominally NO2 targeted).
11. PT08.S5: hourly averaged response of the indium oxide sensor (nominally O3 targeted).
12. Temperature in °C.
13. Relative humidity (%).
14. AH: absolute humidity.

Data mining techniques: Two methods are used to forecast air temperature (°C) and relative humidity (%). In the first, the target is predicted by linear regression from the other variables CO(GT), PT08S1(CO), NMHC(GT), C6H6(GT), PT08S2(NMHC), NOx(GT), PT08S3(NOx), NO2(GT), PT08S4(NO2) and PT08S5(O3) (the relative humidity model also uses T as a predictor). In the second, temperature and humidity are predicted by time series forecasting, with date as the only independent variable. After the CSV file containing the air quality data is loaded, the whole set is converted to ARFF format for analysis. During pre-processing, the Time variable is removed: the hour of day is not needed as a separate attribute, because the periodicity of the data is set to hourly in Weka.
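The loading and pre-processing were carried out in Weka. As a rough, tool-independent illustration, the Python sketch below parses rows of the air quality CSV, drops the Time column and turns the -200 missing-value tag into `None`. The file layout it expects (semicolon separators, decimal commas, trailing empty columns) is an assumption about the UCI CSV rather than something stated in this report, the report does not say how the -200 values were treated in Weka, and `load_air_quality` is a hypothetical helper.

```python
import csv
import io

MISSING = -200.0  # the dataset tags missing values with -200


def load_air_quality(text):
    """Parse AirQualityUCI-style CSV text into a list of dicts.

    Assumed layout: ';' separators, decimal commas (e.g. '2,6') and
    possibly empty trailing columns. Time is dropped; -200 becomes None.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = [h.strip() for h in next(reader)]
    rows = []
    for record in reader:
        if not any(field.strip() for field in record):
            continue  # skip blank lines
        row = {}
        for name, field in zip(header, record):
            if not name or name == "Time":
                continue  # drop unnamed trailing columns and the hour column
            if name == "Date":
                row[name] = field.strip()
                continue
            value = float(field.replace(",", "."))
            row[name] = None if value == MISSING else value
        rows.append(row)
    return rows
```

Keeping the Date string and discarding Time mirrors the pre-processing described above; the hourly period is then implied by the row order.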
C6H6(GT) PT08S2(NMHC) NOx(GT) PT08S3(NOx) NO2(GT) PT08S4(NO2) PT08S5(O3) T RH
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
Linear Regression Model
T = 0.0046 * CO(GT) + 0.0269 * PT08S1(CO) +
Total Number of Instances 9357

Linear regression prediction of Relative humidity:
=== Run information ===
Scheme: weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8 -additional-stats -num-decimal-places 4
Relation: AirQualityUCI-weka.filters.unsupervised.attribute.Remove-weka.filters.unsupervised.attribute.Remove-R1-weka.filters.unsupervised.attribute.Remove-R13
Instances: 9357
Attributes: 12
CO(GT) PT08S1(CO) NMHC(GT) C6H6(GT) PT08S2(NMHC) NOx(GT) PT08S3(NOx) NO2(GT) PT08S4(NO2) PT08S5(O3)
T RH
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
Linear Regression Model
RH =
   0.0098 * CO(GT) +
   0.1022 * PT08S1(CO) +
  -0.0194 * NMHC(GT) +
  -0.1039 * PT08S2(NMHC) +
   0.0358 * NOx(GT) +
   0.0331 * PT08S3(NOx) +
  -0.0697 * NO2(GT) +
   0.0046 * PT08S4(NO2) +
   0.0284 * PT08S5(O3) +
   0.6921 * T +
 -46.6142

Regression Analysis:
Variable        Coefficient   SE of Coef    t-Stat
CO(GT)               0.0098       0.0032     3.0617
PT08S1(CO)           0.1022       0.0023    44.8339
NMHC(GT)            -0.0194       0.0016   -12.3073
PT08S2(NMHC)        -0.1039       0.0023   -45.8413
NOx(GT)              0.0358       0.0017    20.5461
PT08S3(NOx)          0.0331       0.0012    27.9508
NO2(GT)             -0.0697       0.0031   -22.7652
PT08S4(NO2)          0.0046       0.001      4.4514
PT08S5(O3)           0.0284       0.0013    21.7672
T                    0.6921       0.0146    47.3266
const              -46.6142       2.397    -19.4471

Degrees of freedom = 9346
R^2 value = 0.8832
Adjusted R^2 = 0.88309
F-statistic = 7067.9048
Time taken to build model: 0.04 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.9396
Mean absolute error 14.1209
Root mean squared error 17.5274
Relative absolute error 56.1039 %
Root relative squared error 34.2189 %
Total Number of Instances 9357

Time series model output for Temperature prediction:
=== Run information ===
Scheme: LinearRegression -S 0 -R 1.0E-8 -num-decimal-places 4
Lagged and derived variable options: -F T -L 1 -M 24 -am-pm
Relation: AirQualityUCI
Instances: 9357
Attributes: 14
Date CO(GT) PT08S1(CO) NMHC(GT) C6H6(GT) PT08S2(NMHC) NOx(GT)
PT08S3(NOx) NO2(GT) PT08S4(NO2) PT08S5(O3) T RH AH
Transformed training data:
T ArtificialTimeIndex Lag_T-1 Lag_T-2 Lag_T-3 Lag_T-4 Lag_T-5 Lag_T-6 Lag_T-7 Lag_T-8 Lag_T-9 Lag_T-10
Lag_T-11 Lag_T-12 Lag_T-13 Lag_T-14 Lag_T-15 Lag_T-16 Lag_T-17 Lag_T-18 Lag_T-19 Lag_T-20 Lag_T-21 Lag_T-22 Lag_T-23 Lag_T-24 ArtificialTimeIndex^2 ArtificialTimeIndex^3 ArtificialTimeIndex*Lag_T-1 ArtificialTimeIndex*Lag_T-2 ArtificialTimeIndex*Lag_T-3 ArtificialTimeIndex*Lag_T-4
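The transformed training data above replaces the raw attributes with an artificial time index, lags 1 to 24 of the target (from the options -L 1 -M 24), powers of the index and index-by-lag products. The Python sketch below builds rows with that layout from a single series. It is a simplified, hypothetical reconstruction (`lagged_features` is not a Weka function): it uses the row position as the time index and omits the AM/PM indicator requested by -am-pm. Because the first 24 rows lack a full set of lags, 9357 instances yield 9333 complete rows, which is consistent with the N of 9333 reported for the 1-step-ahead evaluation.

```python
def lagged_features(series, min_lag=1, max_lag=24):
    """Build rows shaped like the 'Transformed training data' listing.

    Each row holds: the target, an artificial time index, lags
    min_lag..max_lag of the target, the index squared and cubed, and the
    products index * lag. Rows without a full set of lags are skipped.
    """
    rows = []
    for t in range(max_lag, len(series)):
        lags = [series[t - k] for k in range(min_lag, max_lag + 1)]
        index = float(t)  # hypothetical: index = position in the series
        row = [series[t], index] + lags + [index ** 2, index ** 3]
        row += [index * lag for lag in lags]  # ArtificialTimeIndex*Lag_k terms
        rows.append(row)
    return rows
```

A linear regression fitted on these rows can then capture a trend (through the index terms), short-term persistence (through the lags) and a trend that changes how strongly past values matter (through the products).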
Target: T

Step ahead    N       Mean absolute error    Root mean squared error
1             9333     2.2816                12.602
2             9332     3.7735                16.5916
3             9331     5.1441                19.69
4             9330     6.319                 22.077
5             9329     7.3198                23.9484
6             9328     8.1748                25.4799
7             9327     8.9465                26.8601
8             9326     9.6296                28.1021
9             9325    10.1783                29.1235
10            9324    10.6632                30.0639

Total number of instances: 9357

Graph of actual temperature and predicted values:
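The table shows the error growing with the forecast horizon while N falls by one per step. A minimal Python sketch of this kind of evaluation follows, assuming recursive (iterated) multi-step forecasting in which each forecast is fed back as a lag for the next step; `horizon_errors` and its `predict_next` callback are hypothetical stand-ins for the fitted Weka model. With n instances and a maximum lag of 24, step h can be scored at n - 24 - (h - 1) positions, which reproduces N = 9333 down to 9324 for n = 9357.

```python
import math


def horizon_errors(series, predict_next, max_lag, horizon):
    """Evaluate recursive multi-step forecasts at each step ahead.

    predict_next(history) returns the next value given the last max_lag
    values (oldest first). Forecasts are fed back as inputs, so step h
    relies on h - 1 of the model's own earlier predictions.
    Returns a list of (n, mae, rmse) tuples, one per step ahead.
    """
    abs_sum = [0.0] * horizon
    sq_sum = [0.0] * horizon
    counts = [0] * horizon
    for start in range(max_lag, len(series)):
        history = list(series[start - max_lag:start])
        for h in range(horizon):
            target = start + h
            if target >= len(series):
                break  # no actual value left to compare against
            forecast = predict_next(history)
            error = forecast - series[target]
            abs_sum[h] += abs(error)
            sq_sum[h] += error ** 2
            counts[h] += 1
            history = history[1:] + [forecast]  # feed the forecast back
    return [
        (n, a / n, math.sqrt(s / n))
        for n, a, s in zip(counts, abs_sum, sq_sum)
        if n > 0
    ]
```

For example, a naive predictor that repeats the last value, applied to `list(range(10))` with a maximum lag of 2 and a horizon of 3, gives (8, 1.0, 1.0), (7, 2.0, 2.0) and (6, 3.0, 3.0): errors accumulate with each step because every step builds on earlier forecasts.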
Future predicted values at instants:
9358*   28.045
9359*   27.4022
9360*   26.403
9361*   25.4996
9362*   24.8303
9363*   24.4292
9364*   24.198
9365*   24.1812
9366*   24.3394
9367*   24.5905

Graph of future forecasted values with predicted values:

Time series model output for Relative humidity prediction:
=== Run information ===
Scheme: LinearRegression -S 0 -R 1.0E-8 -num-decimal-places 4
Lagged and derived variable options: -F RH -L 1 -M 24 -am-pm
Relation: AirQualityUCI
Instances: 9357
Attributes: 14
Date CO(GT) PT08S1(CO) NMHC(GT) C6H6(GT) PT08S2(NMHC) NOx(GT) PT08S3(NOx) NO2(GT)
Step ahead    Mean absolute error    Root mean squared error
1              4.4524                14.6943
2              7.3673                19.8158
3              9.938                 23.8376
4             12.1467                27.0123
5             14.032                 29.5491
6             15.6022                31.5867
7             16.9259                33.408
8             18.012                 35.0064
9             18.8329                36.3126
10            19.4241                37.417

Total number of instances: 9357

Relative humidity actual and predicted data plot:
Future forecast of relative humidity plot with current prediction:

CO(GT) prediction using time series:
=== Run information ===
Scheme: LinearRegression -S 0 -R 1.0E-8 -num-decimal-places 4
Lagged and derived variable options: -F CO(GT) -L 1 -M 24 -am-pm
Relation: AirQualityUCI
Instances: 9357
Attributes: 14
Date CO(GT) PT08S1(CO) NMHC(GT) C6H6(GT) PT08S2(NMHC) NOx(GT) PT08S3(NOx) NO2(GT)
PT08S4(NO2) PT08S5(O3) T RH AH
Transformed training data:
CO(GT) ArtificialTimeIndex Lag_CO(GT)-1 Lag_CO(GT)-2 Lag_CO(GT)-3 Lag_CO(GT)-4 Lag_CO(GT)-5 Lag_CO(GT)-6 Lag_CO(GT)-7 Lag_CO(GT)-8 Lag_CO(GT)-9 Lag_CO(GT)-10
  -0 * ArtificialTimeIndex^3 +
   0 * ArtificialTimeIndex*Lag_CO(GT)-1 +
  -0 * ArtificialTimeIndex*Lag_CO(GT)-2 +
  -0 * ArtificialTimeIndex*Lag_CO(GT)-4 +
   0 * ArtificialTimeIndex*Lag_CO(GT)-18 +
   0 * ArtificialTimeIndex*Lag_CO(GT)-20 +
   0 * ArtificialTimeIndex*Lag_CO(GT)-21 +
   0 * ArtificialTimeIndex*Lag_CO(GT)-22 +
   0 * ArtificialTimeIndex*Lag_CO(GT)-23 +
  -0 * ArtificialTimeIndex*Lag_CO(GT)-24 +
   0.0141

Actual vs predicted CO(GT):
Future forecast along with current CO(GT):
NOx(GT) prediction using time series:
=== Run information ===
Scheme: LinearRegression -S 0 -R 1.0E-8 -num-decimal-places 4
Lagged and derived variable options: -F NOx(GT) -L 1 -M 24 -am-pm
Relation: AirQualityUCI
Instances: 9357
Attributes: 14
Date CO(GT) PT08S1(CO)
NMHC(GT) C6H6(GT) PT08S2(NMHC) NOx(GT) PT08S3(NOx) NO2(GT) PT08S4(NO2) PT08S5(O3) T RH AH
Transformed training data:
NOx(GT) ArtificialTimeIndex Lag_NOx(GT)-1 Lag_NOx(GT)-2 Lag_NOx(GT)-3 Lag_NOx(GT)-4
Future predicted data:
9358*   254.0032
9359*   247.2432
9360*   252.6842
9361*   255.9062
9362*   282.9543
9363*   297.4384
9364*   280.8169
9365*   265.5783
9366*   232.8939
9367*   177.9536

NOx(GT) predicted vs actual:
Future NOx(GT) along with current data:
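The future values listed for each target (instants 9358 to 9367, immediately after the 9357 training instances) are produced by running the fitted model forward beyond the end of the data. The sketch below illustrates this under the same assumption of recursive forecasting; `forecast_future` and its `predict_next` callback are hypothetical, and the 1-based instant numbering is inferred from the listings rather than documented in this report.

```python
def forecast_future(series, predict_next, max_lag, steps=10):
    """Forecast `steps` values past the end of `series` recursively.

    predict_next(history) returns the next value given the last max_lag
    values (oldest first). Returns (instant, value) pairs numbered from
    len(series) + 1, i.e. 1-based instants following the data.
    """
    history = list(series[-max_lag:])
    out = []
    for step in range(1, steps + 1):
        value = predict_next(history)
        out.append((len(series) + step, value))
        history = history[1:] + [value]  # the forecast becomes the newest lag
    return out
```

For instance, with `predict_next = lambda h: 2 * h[-1] - h[-2]` (linear extrapolation), `forecast_future([1, 2, 3, 4], predict_next, max_lag=2, steps=3)` returns `[(5, 5), (6, 6), (7, 7)]`. Since no actual values exist beyond the data, these forecasts can only be judged through the error estimates for each step ahead.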
The linear regression results show an adjusted R^2 of 0.93368 for the temperature model, meaning that 93.368% of the variation in temperature is explained by its predictors, while the relative humidity model explains over 88% of the variation (adjusted R^2 = 0.88309). Hence, linear regression fits both temperature and relative humidity well. The time series results, however, show that the mean absolute error is smallest for the hour immediately after the end of the data and grows as the number of steps ahead increases (for temperature, from 2.2816 at 1 step to 10.6632 at 10 steps). In addition, the plots above show close agreement between the actual and predicted temperature, humidity, CO concentration and NOx concentration. Thus the time series model is also a good fit for the data (Gayathridevi, Karthika and Marikkannan 2018). A further benefit of the time series approach is that, apart from the past values of the target itself, only the date variable is needed to predict the air quality variables, which allows prediction with fewer variables, good accuracy and lower computational time and complexity (Usha and Balamurugan 2016). Furthermore, the predictions for each future hour (step) are given with confidence intervals and probable errors, so the user can extract future air quality data to a desired accuracy limit. Hence, time series forecasting can be considered the better model for predicting the given air quality data.
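The regression statistics for the relative humidity model can be cross-checked from R^2 alone. That model has p = 10 predictors (nine air-quality variables plus T; C6H6(GT) does not appear in its equation) and n = 9357 instances, so the residual degrees of freedom are 9357 - 10 - 1 = 9346, as reported. The Python sketch below applies the standard formulas for adjusted R^2 and the overall F-statistic; `regression_summary` is a hypothetical helper. With the rounded R^2 of 0.8832 it gives an adjusted R^2 of about 0.88308 and an F-statistic of about 7067, close to the reported 0.88309 and 7067.9048; the small differences come from rounding R^2 to four decimals.

```python
def regression_summary(r2, n, p):
    """Residual degrees of freedom, adjusted R^2 and F for a linear model.

    r2: coefficient of determination, n: number of instances,
    p: number of predictors (excluding the intercept).
    """
    df_resid = n - p - 1
    # Adjusted R^2 penalises R^2 for the number of predictors.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_resid
    # Overall F: explained variance per predictor over residual variance per df.
    f_stat = (r2 / p) / ((1.0 - r2) / df_resid)
    return df_resid, adj_r2, f_stat


print(regression_summary(0.8832, 9357, 10))
```

With more than 9000 instances and only 10 predictors, the adjustment barely changes R^2, so adjusted R^2 and R^2 can be read almost interchangeably here.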
Conclusions: In conclusion, the main objectives of this assignment have been met: temperature and humidity, the primary properties in the air quality data collected from the UCI Machine Learning Repository, have been successfully predicted for the hours immediately following the sample. The data mining results show that the time series forecasting model predicts the air quality data better than the linear regression method, because it achieves good accuracy using only the hourly date index (together with the past values of the predicted variable). The time series model can be used to predict future temperature, humidity and the two gas concentrations (CO and NOx) in the hours just after the sample period with a good level of confidence. However, the model should not be assumed valid for the same period in other cities or countries, because the sample data were collected in one area of an Italian city using the Air Quality Chemical Multisensor Device.