Federation University: Big Data and Analytics Assignment 1 Report


Added on  2021/05/31

Big Data and Analytics
Assignment 1: Data Analysis
Table of Contents
Abstract
Introduction
Background
Dashboard/Report
Research
Recommendations
Reflection
References
Abstract
This study analyses a large energy-consumption data set using the IBM Watson Analytics tool. The analysis shows that the CFL count is highest in October and lowest in November, with March at the median. By estimated age of the property, the CFL count is highest for dwellings aged sixty years and over, at 190k. The CFL count grew by 55% from 2012 to 2015. The lowest total number of bathrooms, 2,458, is observed for dwellings with an estimated age of fifteen to nineteen years. By roof colour, the CFL count is highest for dark roofs. The total fluorescent light count is highest in October, at 22.8k. The halogen count is highest in October, lowest in February, and at its median in March. To optimize energy use, several measures are needed, such as using efficient, upgraded electrical machines and CFL lighting. To reduce CO2 emissions, coal energy consumption should be minimized, because coal consumption produces a large proportion of CO2 emissions. A predictive model for future energy use and CO2 emissions should include nuclear, wind, solar and biomass energy, among other sources. Power usage increased from 2012 to 2014 and then decreased from 2014 to 2015. A linear relationship exists between LED count and CFL count. The most influential factors in the predictive model are suburb type, size, incandescent count and number of living rooms.
Introduction
Analysing data sets is essential for making decisions in business, management and other areas. Nowadays, industries and businesses generate big data, and these data sets must be analysed to understand the characteristics of their products or services. Analysing such big data sets requires a range of statistical tools and techniques. Analysing data from different industries supports effective decision-making and also provides sound estimates for future use. In this report, we analyse one such big data set, which records the power use (energy consumption) of different types of users, with the IBM Watson Analytics tool. Statistical data analysis plays an important role in the modern era of business and industry, and statistical software is essential for analysing big data. The report also recommends measures to optimize energy use, such as using efficient, upgraded electrical machines and CFL lighting; argues that coal energy consumption should be minimized to reduce CO2 emissions, because coal consumption produces a large proportion of those emissions; and suggests that a predictive model for future energy use and CO2 emissions should include nuclear, wind, solar and biomass energy, among other sources.
Background
Federation University conducted a Solar Cities project to study energy consumption. The project recruited households and businesses across the Loddon Mallee and Grampians regions, and researchers monitored changes in their energy consumption. The researchers identified the factors that affect energy consumption and examined the relationships between energy consumption and the variables that could influence it. These possible factors were grouped into sets of features, and measurements were then taken for each factor. The data set includes features such as the adoption of solar energy technologies, geographic characteristics, and physical characteristics of the dwellings, including the dwelling's age, size, number of stories, number of lights and insulation. The main goal of the project is to understand the drivers of power consumption. The researchers want to find the combinations of features that could help reduce energy consumption, and to build a model that predicts future energy demand and CO2 emissions. In this report, we study the patterns of energy use and CO2 emissions in the given data set and develop a predictive model for future energy use with the IBM Watson Analytics tool. We analyse the entire data set with IBM Watson Analytics to identify useful facts, interesting insights, trends and patterns in energy consumption.
Dashboard/Report
In this section, we analyse the given big data set with the IBM Watson Analytics tool. The energy consumption data set contains the following variables:
Variable 1
SUBURB
Data type: text.
The Victorian suburb of each house chosen for the study.
Possible values: Portland, Narrawong, Heywood, Tyrendarra, Sandford, Digby, Myamyn, Condah, Casterton, Heathmeare, Drumborg, Allestree, Bolwarra, Nelson, Bahgallah, Heathmere, Dartmoo
Variable 2
TENURE
Data type: text.
The tenure status of the property, that is, the basis on which its occupants live in it.
Possible values: OWNED, RENTED, MORTGAGED, OTHER, RENT_FREE, LIFE_TENURE, UNKNOWN
Variable 3
Estimated Age
The estimated age of the property, recorded as ordinal categorical intervals.
Variable 4
Wall construction
The type of material and/or construction used for the dwelling walls.
The remaining variables included in the study are listed below:
Field name | Data type | Definition
ROOF_COLOUR | Text | The colour of the roof, used to test the absorption or reflection of the sun
STORIES | Integer | The number of stories the dwelling has; only 1- or 2-story dwellings were recorded in this study
BEDROOMS | Integer | The number of bedrooms in the dwelling; 99 signifies a missing count
BATHROOMS | Integer | The number of bathrooms in the dwelling; 99 signifies a missing count
LIVING_ROOMS | Integer | The number of living rooms in the dwelling; 99 signifies a missing count
SIZE_SQM | Integer | The approximate floor area of the dwelling, in 6 size categories; 0 signifies a missing measurement
WINDOW_TYPE | Text | The physical structure of the glass
WINDOW_COVERINGS | Text | The type of covering over the windows, if any
STRUCTURE | Text | The type of dwelling
CFL COUNT | Integer | The number of compact fluorescent lamps in the dwelling
HALOGEN_COUNT | Integer | The number of halogen lights in the dwelling
LED_COUNT | Integer | The number of LED lights in the dwelling
INCANDESCENT_COUNT | Integer | The number of incandescent lights in the dwelling
FLUOR_COUNT | Integer | The number of fluorescent lights in the dwelling
INSULATION | Integer | Where the insulation is situated: 0 = no insulation or unknown, 1 = ceiling only, 2 = wall and ceiling, 3 = wall, ceiling and floor
PV_CAPACITY | Not stated | The amount of power generated by solar PV panels
INTERVAL_DATE | Text | The date of the power meter reading, collected daily
POWER_USAGE | Decimal | The amount of power consumed on the given day
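The data dictionary above encodes missing values with sentinel codes (99 for the room counts, 0 for SIZE_SQM) and stores INSULATION as an integer code. Before totalling or modelling, it may help to convert these codes explicitly so that a 99 is never summed as 99 bathrooms. The following is a minimal Python sketch, assuming the data is exported as a CSV file whose header uses the field names listed above; the function `clean_row`, the lookup `INSULATION_LABELS` and the two sample rows are hypothetical and not part of the original study.

```python
import csv
import io

# Sentinel codes taken from the data dictionary above.
ROOM_FIELDS = ("BEDROOMS", "BATHROOMS", "LIVING_ROOMS")  # 99 = missing count
SIZE_FIELD = "SIZE_SQM"  # 0 = missing measurement

# Hypothetical lookup table that decodes the INSULATION codes.
INSULATION_LABELS = {
    0: "no insulation or unknown",
    1: "ceiling only",
    2: "wall and ceiling",
    3: "wall, ceiling and floor",
}


def clean_row(row):
    """Return a copy of one CSV row with sentinel codes replaced by None."""
    cleaned = dict(row)
    for field in ROOM_FIELDS:
        value = int(row[field])
        cleaned[field] = None if value == 99 else value
    size = int(row[SIZE_FIELD])
    cleaned[SIZE_FIELD] = None if size == 0 else size
    cleaned["INSULATION"] = INSULATION_LABELS.get(int(row["INSULATION"]))
    cleaned["POWER_USAGE"] = float(row["POWER_USAGE"])
    return cleaned


# Two made-up rows for illustration only; they are not records from the study.
SAMPLE_CSV = """SUBURB,BEDROOMS,BATHROOMS,LIVING_ROOMS,SIZE_SQM,INSULATION,POWER_USAGE
Portland,3,99,1,0,2,12.5
Heywood,99,1,2,150,0,8.0
"""

rows = [clean_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for r in rows:
    print(r["SUBURB"], r["BATHROOMS"], r["SIZE_SQM"], r["INSULATION"])
```

Using None rather than dropping whole rows keeps the other, valid fields of a dwelling available for analyses that do not need the missing value.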
We analysed this data set with the IBM Watson Analytics tool, and the resulting discoveries are presented below.
First, we examine the top CFL count by estimated age; the analysis is shown below:
This analysis shows that the CFL count is highest for dwellings with an estimated age of sixty years and over, at 190k. The bar graph of CFL count by month shows that the CFL count is highest in October and lowest in November, with March at the median. The CFL count grew by 55% from 2012 to 2015. This suggests that energy consumption is increasing rapidly, so it is important to use other sources of energy such as solar and wind.
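Findings such as "the CFL count is highest in October" and "55% growth from 2012 to 2015" come down to summing a count within each category and comparing the totals, with growth computed as (total in final year - total in first year) / total in first year × 100. The Python sketch below shows that arithmetic. It assumes the month and year have already been derived from INTERVAL_DATE and that the CFL column is named CFL_COUNT; the helper names and sample records are hypothetical, and the numbers are invented rather than taken from the study. Treating the "median month" as the middle-ranked month is one reading of the report's wording, not necessarily how IBM Watson defines it.

```python
from collections import defaultdict


def totals_by(records, key, field):
    """Sum `field` over the records, grouped by the value of `key`."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[key]] += rec[field]
    return dict(totals)


def rank_summary(totals):
    """Return the (highest, middle-ranked, lowest) categories by total."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return ordered[0], ordered[len(ordered) // 2], ordered[-1]


def percent_growth(start, end):
    """Percentage change from `start` to `end`."""
    return (end - start) / start * 100


# Invented records for illustration only (MONTH and YEAR assumed pre-derived).
records = [
    {"MONTH": "Oct", "YEAR": 2012, "CFL_COUNT": 30},
    {"MONTH": "Mar", "YEAR": 2012, "CFL_COUNT": 20},
    {"MONTH": "Nov", "YEAR": 2012, "CFL_COUNT": 10},
    {"MONTH": "Oct", "YEAR": 2015, "CFL_COUNT": 45},
    {"MONTH": "Mar", "YEAR": 2015, "CFL_COUNT": 25},
    {"MONTH": "Nov", "YEAR": 2015, "CFL_COUNT": 20},
]

monthly = totals_by(records, "MONTH", "CFL_COUNT")
yearly = totals_by(records, "YEAR", "CFL_COUNT")
print(rank_summary(monthly))  # ('Oct', 'Mar', 'Nov')
print(percent_growth(yearly[2012], yearly[2015]))  # 50.0
```

The same `totals_by` call with a different key (for example an estimated-age or roof-colour column) gives the per-category totals behind the other discoveries in this section.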
Next, we analyse the CFL count by roof colour. The IBM Watson discovery is shown below:
The lowest total number of bathrooms, 2,458, is observed for dwellings with an estimated age of fifteen to nineteen years. The CFL count is highest for dark roof colours. The total fluorescent light count is highest in October, at 22.8k.
The IBM Watson discovery for the top CFL count by number of stories is shown below:
This discovery shows that the halogen count is highest in October, lowest in February, and at its median in March.
Next, we examine how power usage changed over the years, broken down by roof colour. The IBM Watson discovery for this analysis is summarised below:
The IBM Watson output above shows that power usage increased from 2012 to 2014 and then decreased from 2014 to 2015; that is, a significant decrease in power usage is observed after 2014.
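The rise-then-fall pattern in power usage can be checked by totalling the daily POWER_USAGE readings per calendar year and differencing consecutive years. The sketch below assumes INTERVAL_DATE is stored as an ISO date string such as 2013-07-01 (the data dictionary only says it is text); the readings are invented for illustration, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def yearly_power(readings):
    """Total the daily POWER_USAGE values per calendar year, ordered by year."""
    totals = defaultdict(float)
    for interval_date, usage in readings:
        # Assumes ISO dates such as "2013-07-01"; adapt to the real INTERVAL_DATE format.
        totals[date.fromisoformat(interval_date).year] += usage
    return dict(sorted(totals.items()))


def year_on_year_change(totals):
    """Map each year after the first to its change from the previous year."""
    years = list(totals)
    return {year: totals[year] - totals[prev] for prev, year in zip(years, years[1:])}


# Invented (INTERVAL_DATE, POWER_USAGE) pairs for illustration only.
readings = [
    ("2012-03-01", 10.0), ("2012-09-01", 12.0),
    ("2013-03-01", 14.0), ("2013-09-01", 15.0),
    ("2014-03-01", 18.0), ("2014-09-01", 17.0),
    ("2015-03-01", 11.0), ("2015-09-01", 9.0),
]

totals = yearly_power(readings)
changes = year_on_year_change(totals)
for year, delta in changes.items():
    print(year, "increase" if delta > 0 else "decrease", delta)
```

Comparing raw yearly totals is only meaningful if each year covers a similar number of days and households; if coverage differs, the mean daily usage per year is a safer basis for comparison.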
Next, we examine the relationship between LED count and CFL count by year. The output is shown below:
The output above suggests a roughly linear relationship between LED count and CFL count.
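A linear relationship between LED count and CFL count can be quantified with the Pearson correlation coefficient and an ordinary least-squares line. This plain-Python sketch uses invented per-dwelling counts and hypothetical function names; it illustrates the standard formulas and is not a reproduction of what IBM Watson computes internally.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Intercept a and slope b of the ordinary least-squares line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Invented per-dwelling counts for illustration only.
led_counts = [0, 2, 4, 6, 8]
cfl_counts = [5, 10, 12, 18, 20]

r = pearson_r(led_counts, cfl_counts)
intercept, slope = least_squares_line(led_counts, cfl_counts)
print(f"r = {r:.3f}, CFL ~ {intercept:.2f} + {slope:.2f} * LED")  # r = 0.988
```

On Python 3.10 or later, `statistics.correlation` and `statistics.linear_regression` return the same quantities. A coefficient close to 1 indicates a strong positive linear association, not that one count causes the other.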
Next, we build a predictive model for the CFL count with the IBM Watson Analytics tool. The output of this predictive model is shown below:
The output shows that the predicted CFL count varies with suburb, size, incandescent count, number of living rooms and other factors, so these factors are important predictors of the CFL count.
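The report does not describe how IBM Watson ranks its predictors, so the sketch below is not its method. As a simpler, swapped-in technique, the correlation ratio (eta squared), eta² = SS_between / SS_total, measures how much of the variation in CFL count a single categorical factor explains; factors with larger values are stronger candidate predictors. The function name, the choice of SUBURB and ROOF_COLOUR as examples, and the data are hypothetical.

```python
from collections import defaultdict


def eta_squared(groups, values):
    """Share of the variance in `values` explained by the categories in `groups`."""
    grand_mean = sum(values) / len(values)
    members = defaultdict(list)
    for group, value in zip(groups, values):
        members[group].append(value)
    # Between-group sum of squares: group size times squared deviation of group mean.
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in members.values()
    )
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    return ss_between / ss_total if ss_total else 0.0


# Invented data for illustration only: CFL counts for six dwellings.
cfl = [10, 12, 20, 22, 30, 32]
suburb = ["Portland", "Portland", "Heywood", "Heywood", "Casterton", "Casterton"]
roof = ["Dark", "Light", "Dark", "Light", "Dark", "Light"]

scores = {
    "SUBURB": eta_squared(suburb, cfl),
    "ROOF_COLOUR": eta_squared(roof, cfl),
}
for factor, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{factor}: {score:.3f}")  # SUBURB: 0.985, then ROOF_COLOUR: 0.015
```

Eta squared looks at one factor at a time and ignores interactions between factors, so it is a screening step rather than a full predictive model.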
Many combinations of features could help reduce energy consumption. The main one is the use of efficient, upgraded electrical machines that consume less energy. Using CFL and LED lighting also helps to reduce power consumption. Significant action is needed to reduce energy consumption, and a list of do's and don'ts could be drawn up; for example, one might suggest limited use of air conditioning in the rainy or winter season. Local authorities and government organizations should take appropriate action to raise public awareness of the need to reduce energy consumption. Several factors could explain future demand for energy and future CO2 emissions. To reduce CO2 emissions, coal energy consumption should be minimized, because coal consumption produces a large proportion of CO2 emissions. A predictive model for future energy use and CO2 emissions should include nuclear, wind, solar and biomass energy, among other sources. If proper precautions are taken, nuclear power is a better alternative to coal and other forms of energy, and nuclear power plants operated with proper care could reduce CO2 emissions. However, because of previous accidents at nuclear power plants, many people are not in favour of them. Several factors are needed to predict future energy use and CO2 emissions. The significance of each factor should be tested, and a factor should be included in the predictive model only if it is found to be significant.
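Testing each factor's significance before including it in the model can be done in several ways. One distribution-free option, shown here as a sketch and not as the method used in the study, is a permutation test: shuffle the target many times and count how often the shuffled data show an association at least as strong as the observed one. The example tests the Pearson correlation between a numeric factor and power usage; the data, the 0.05 threshold and the helper names are hypothetical.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between xs and ys."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Invented data for illustration only.
incandescent = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
power_usage = [8.1, 8.9, 10.2, 10.8, 12.1, 12.9, 14.2, 14.8, 16.1, 17.0]

p = permutation_p_value(incandescent, power_usage)
include_in_model = p < 0.05
print(f"p = {p:.4f}, include: {include_in_model}")
```

When many factors are screened this way, some will pass the threshold by chance, so a multiple-comparison correction is worth considering before finalizing the model.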
Research
In this section, we summarise the results obtained with the IBM Watson Analytics tool. We analysed the power consumption data set with IBM Watson, examining different variables by year, count, consumption and so on, and made several discoveries. The main research findings are summarised below:
a. The CFL count is highest for dwellings with an estimated age of sixty years and over, at 190k.
b. The IBM Watson analysis shows that the CFL count is highest in October and lowest in November, with March at the median.
c. The CFL count grew by 55% from 2012 to 2015. This suggests that energy consumption is increasing rapidly, so it is important to use other sources of energy such as solar and wind.
d. The lowest total number of bathrooms, 2,458, is observed for dwellings with an estimated age of fifteen to nineteen years. The CFL count is highest for dark roof colours, and the total fluorescent light count is highest in October, at 22.8k.
e. The halogen count is highest in October, lowest in February, and at its median in March.
f. Power usage increased from 2012 to 2014 and then decreased from 2014 to 2015; that is, a significant decrease in power usage is observed after 2014.