Data Analytics: SAP HANA Data Modeling and Association Rule Data Mining
Contents
1. Introduction
2. Background of the Project
3. Create Data View/Cube-Data Modelling and Provision
4. SAP HANA Data Modeling
   Types of Attributes
   Types of measures
   Types of Modeling Objects
5. Perform Data Mining Techniques
6. ASSOCIATION RULE DATA MINING
   Association rule algorithms
   Uses of association rules in data mining
7. Dataset
8. Research
9. Important results of the above analysis and Recommendations to CEO
10. Conclusion
References
1. Introduction
Data analysis is the study of investigating patterns in large datasets. In this paper we use SAP HANA, a state-of-the-art tool, to analyze data and report the results. The tool is commonly accessed as a web application and is very widely used for the analysis of huge datasets. Many researchers have used this tool, which is why it is also our tool of choice for this research.
2. Background of the Project
The data, a subset of Tenders Electronic Daily (TED) in comma-separated value format, represents public procurement for the European Economic Area, Switzerland, and the former Yugoslav Republic of Macedonia from 2006-01-01 to 2018-12-31. It covers the most significant fields from the standard contract and award notification forms, for example who purchased what from whom, in what amount, and which criteria were used for the procedure and the award (Ramsay & Silverman, 2013). The data typically comprises contracts above the procurement thresholds. Nevertheless, publishing below-threshold contracts in TED is regarded as good practice, and therefore a significant number of below-threshold contracts are also present.
3. Create Data View/Cube-Data Modelling and Provision
SAP HANA
SAP HANA is an in-memory database and application development framework for real-time processing of huge amounts of data. As SAP's flagship product, it allows data analysts to query that data in real time. HANA's in-memory database software frees analysts from needing to import or stage data beforehand. It also contains a programming component that allows a company's IT department to build and run custom applications on top of HANA, along with a set of predictive, spatial, and text analysis libraries across various data sources. Since it can work directly with a source SAP ERP application, analysts can access real-time data for real-time processing, and
they do not have to wait for a daily or weekly report ("SAP HANA server infrastructure with Power Systems", 2019).
SAP HANA FUNDAMENTALS
Functions of HANA (SILVIA, 2016; "What is SAP HANA | In Memory Computing and Real Time Analytics", 2019) include:
- Financial affairs (FI and CO no longer need merging, with a single line-item table as the only source of data)
- Production planning (shorter MRP runs and no time-consuming batch jobs)
- Inventory management (real-time warnings, reduced safety stock)
- Procurement (real-time procurement KPI statistics)
- Product costing (multi-level absorption, etc.)
- Financial period close (quicker reporting of outcomes for interim analysis, etc.)
- Business activity reporting (real time, with predictive and simulation analysis)
Fiori (an intuitive user experience)
- Sales order handling (a sales order fulfillment dashboard for problem detection and resolution)
- Procurement clerk tasks (a dashboard with all information on purchasing documents)
- Production planner tasks (predicting supply options)
Unification of applications with the core ECC system
- Capabilities of existing applications (such as SCM, CRM, etc.) and recently acquired cloud solutions (such as Ariba, SuccessFactors, etc.) are accessible in ECC (the "digital core")
4. SAP HANA Data Modeling
Data Provisioning
Data provisioning is another area of interest that we explore in this work. In data provisioning we create, prepare, and configure a network so that it can deliver the required information to users. Once all of the data has been processed, it is ready for loading into HANA, where the user can access it easily and use it as needed.
These processes are known as Extract, Transform, Load, abbreviated ETL. In extraction we take data out of various raw data sources. In the transformation phase the data is made readily usable to the user, in accordance with a set of rules that are standard data transformation processes. Extraction is the toughest part because the data sources are not always reliable.
Replicating data into SAP HANA:
Data replication into this software is commonly done in two ways:
- Using HANA's built-in capabilities to ingest flat files or data streams. This supports multiple formats such as .csv, .xls, and .xlsx, among others (see the sketch below).
- Using the services included in the SAP package that support ETL-style extraction on the NetWeaver platform. Another service that can be used is SLT (SAP Landscape Transformation), which is especially useful on NetWeaver-based systems.
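As a rough sketch of the flat-file route, the snippet below reads a CSV with pandas and inserts it into a HANA column table through SAP's hdbcli Python driver. The host, port, credentials, table definition, and column choices are placeholders for illustration, not values from this project.

    import pandas as pd
    from hdbcli import dbapi  # SAP's Python client for HANA

    # Placeholder connection details -- replace with a real host and credentials.
    conn = dbapi.connect(address="hana.example.com", port=30015,
                         user="ANALYST", password="secret")
    cur = conn.cursor()

    # Read the flat file (one of the supported formats, e.g. .csv).
    df = pd.read_csv("TED_CAN_2018.csv")

    # Create a simple column table and bulk-insert a few illustrative columns.
    cur.execute('CREATE COLUMN TABLE "TED_CAN" ("ID_TYPE" INTEGER, '
                '"YEAR" INTEGER, "TED_NOTICE_URL" NVARCHAR(500))')
    rows = df[["ID_TYPE", "YEAR", "TED_NOTICE_URL"]].itertuples(index=False, name=None)
    cur.executemany('INSERT INTO "TED_CAN" VALUES (?, ?, ?)', list(rows))
    conn.commit()
    conn.close()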
Modeling presents data from the database as a business design. With it, we can build an information model that can be used in reporting and analytical applications, for example SAP Lumira, Webi, etc. These models are built by processing and transforming information from data sources.
Modeling is done with the SAP HANA Studio Modeler and is enforced at the database layer so that the application layer can later use it without going through many stages. This saves a large
amount of time and resources. Modeling content can be built in the Modeler, where data from the database is processed according to the framework of the model. Models created at this level take advantage of the processing features of multi-core CPUs when they are later executed.
A SAP HANA data model can be created in SAP HANA Studio. The database and tables can be accessed in the Catalog tab. After a model with information views has been created, all those views can be accessed in the Content tab. The views live in packages, organized into categories depending on the view types. Within a view, the data tables are designed differently, particularly as dimension and fact tables.
Attributes are the descriptive data containing information about the records in the tables. Attributes are therefore data features like Country, Store, Sales ID, etc. Attributes are non-measurable and cannot be used in calculations.
Measures, by contrast, are data entries that are both measurable and usable in calculations. The measures used by the views serve analytical requirements.
Types of Attributes
Basic Attributes – Derived directly from the data source.
Calculated Attributes – Created from existing source attributes. For example, full name, created from the two attributes first name and last name.
Private Attributes – Used when modeling data in views. Once attributes are marked private in a view, they can be used in that specific view only.
Types of measures
Basic Measure – Taken from the source table in its original state.
Calculated Measure – Created from a combination of measures, constants, etc., for example in OLAP cubes. For example, Profit = Sales price – Cost price.
Restricted Measure – Measure values selected by a condition placed on an attribute, for example displaying the gross revenue measure for a specific car in the USA.
Counter – A kind of column in a calculation or analytic view that displays the number of distinct values in attribute columns.
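These measure types can be imitated outside HANA for intuition. The pandas sketch below, built on an invented toy table, derives a calculated attribute (full name), a calculated measure (profit), and a restricted measure (profit restricted by a condition on a country attribute); all column names here are assumptions for illustration.

    import pandas as pd

    # Toy sales table; every column name is invented for this example.
    sales = pd.DataFrame({
        "first_name":  ["Ana", "Ben"],
        "last_name":   ["Ivic", "Ruiz"],
        "country":     ["USA", "Germany"],
        "sales_price": [30000.0, 25000.0],
        "cost_price":  [22000.0, 21000.0],
    })

    # Calculated attribute: full name built from two basic attributes.
    sales["full_name"] = sales["first_name"] + " " + sales["last_name"]

    # Calculated measure: Profit = Sales price - Cost price.
    sales["profit"] = sales["sales_price"] - sales["cost_price"]

    # Restricted measure: profit restricted by a condition on an attribute.
    usa_profit = sales.loc[sales["country"] == "USA", "profit"].sum()
    print(sales[["full_name", "profit"]])
    print("USA profit:", usa_profit)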
Types of Modeling Objects
Attribute View
Analytic View
Calculation View
Decision Table
SAP HANA Data Provisioning:
Data provisioning is the method that creates, prepares, and enables a network to provide its consumers with information. Before information reaches the consumer through a front-end tool, it must be loaded into SAP HANA.
Provisioning works hand in hand with data replication into the HANA database, so the data can be used in HANA modeling and consumed by reporting tools. Several data provisioning techniques are supported for replicating data into a SAP HANA system.
Replication enables data to be migrated from source systems into the SAP HANA database. An easy way to transfer data from an existing SAP system to HANA is to use one of the available replication methods.
You can set up replication from the command line on the server or by using HANA Studio. During this phase, the main ECC or transaction systems can remain online. The HANA system supports three kinds of data replication techniques –
• SAP Landscape Transformation (SLT) Replication
• ETL with SAP BusinessObjects Data Services (BODS)
• Direct Extractor Connection (DXC)
Extract – The first and often hardest part of ETL, which extracts information from the distinct source systems.
Transform – A sequence of rules or functions applied to the information obtained from the source system so that it can be fed into the destination system.
Load – Loads the transformed data into the target system.
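To make the three steps concrete, here is a minimal ETL sketch in Python. SQLite stands in for the target system, and the date-parsing rule is an assumed example of a transformation, not the project's actual rules.

    import sqlite3
    import pandas as pd

    # Extract: pull raw records from the source (a CSV file here).
    raw = pd.read_csv("TED_CAN_2018.csv")

    # Transform: apply rules so the data fits the destination scheme,
    # e.g. parse the dispatch date and derive the year from it.
    raw["DT_DISPATCH"] = pd.to_datetime(raw["DT_DISPATCH"], errors="coerce")
    raw["YEAR"] = raw["DT_DISPATCH"].dt.year
    clean = raw.dropna(subset=["DT_DISPATCH"])

    # Load: write the transformed rows into the target system
    # (SQLite stands in for SAP HANA in this sketch).
    with sqlite3.connect("target.db") as conn:
        clean.to_sql("ted_can", conn, if_exists="replace", index=False)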
5. Perform Data Mining Techniques
Data mining is the process of taking data and analyzing it in order to sort it into different categories. In this process we try to extract useful information from the data so that it can be used to understand the data and make inferences. Related terms have also surfaced in the past few years, such as knowledge discovery in databases. Data mining helps businesses become more efficient and make informed decisions ahead of time, which can help them increase their revenues.
Key features of data mining:
There are many key features of data mining, some of which are listed below:
- Analyzing trends, leading to automatic prediction of patterns
- Checking the possible outcomes in the data and stating the probability of which one will occur (Hair, Black, Babin & Anderson, n.d.)
- Helping in decision making by basing decisions on data trends
- Analyzing huge datasets that are held in databases
- Clustering or grouping data to infer information that was not previously known (Peck & Devore, 2012)
Data mining is a collection of many steps. We outline some of them below:
- Data is extracted and transformed by loading it into high-capacity data stores
- Data is converted into different dimensions based on its attributes and then stored for easy access (see the sketch after this list)
- Analysts are given access to all the data so they can analyze and visualize it for business development
- Once analyzed, the data is presented in an easily visualized form so that the reader can understand it quickly
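As a small illustration of the second step, converting records into dimensions based on attributes, the sketch below pivots a toy fact table (with hypothetical columns) into a country-by-year slice.

    import pandas as pd

    # Hypothetical fact records: two attribute columns and one measure.
    facts = pd.DataFrame({
        "country": ["FR", "FR", "DE", "DE"],
        "year":    [2017, 2018, 2017, 2018],
        "value":   [10.0, 12.5, 8.0, 9.5],
    })

    # Pivot into a country x year slice for fast analysis.
    cube = facts.pivot_table(index="country", columns="year",
                             values="value", aggfunc="sum")
    print(cube)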
6. ASSOCIATION RULE DATA MINING:
Put briefly, association rule mining is the branch of data mining that uses machine learning techniques and models to find recurring occurrences and trends in data, and to determine how the data is associated with different patterns. The antecedent and the consequent are the two major parts of an association rule. The items found within the data are termed the antecedent, and the outputs of different combinations of the data are called consequents. Consequent comes from the word consequence, and it carries the same meaning: it is a consequence of the combination of different items (Berthold & Hand, 2011; "Challenges with Big Data Analytics", 2015). Looked at closely, almost all datasets contain if/then associations. This means that every dataset contains associations and a response to certain variables. Measuring how much more often a rule holds than chance would predict gives rise to another quantity, which is given the term lift.
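To make antecedent, consequent, and lift concrete, the sketch below computes support, confidence, and lift for one candidate rule over a handful of toy transactions (the item names are invented).

    # Toy transactions; each set is one basket of items.
    transactions = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
        {"bread", "milk"},
    ]
    n = len(transactions)

    # Candidate rule: {bread} -> {milk}.
    antecedent, consequent = {"bread"}, {"milk"}
    both = antecedent | consequent

    support_a = sum(antecedent <= t for t in transactions) / n   # 4/5
    support_c = sum(consequent <= t for t in transactions) / n   # 4/5
    support_both = sum(both <= t for t in transactions) / n      # 3/5

    confidence = support_both / support_a                        # 0.75
    lift = confidence / support_c                                # ~0.94

    print(support_both, confidence, lift)

A lift near 1 (as here) means the antecedent adds little information about the consequent; values well above 1 indicate a genuinely useful rule.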
Association rule algorithms
Many association rule algorithms have been developed to help people work with and understand associative data. The AIS algorithm is one such example: as it scans large transaction datasets, new candidate itemsets result from the transactions being processed. Another algorithm is SETM, which is an efficient way of using associative data. It
involves scanning transactions as well and for that reason shares some of the disadvantages of AIS. Both algorithms create many small candidate subsets that are related to each other and then aggregate the counts at the end. All this makes the process tedious and not very efficient. This is described in detail by Dr. Saed Sayad in the book he has authored.
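The candidate-generation-and-count loop these algorithms share can be seen in the bare-bones Apriori-style pass below. This is an illustrative sketch of the general technique, not an implementation of AIS or SETM themselves.

    def apriori(transactions, min_support):
        """Return frequent itemsets with support >= min_support (a fraction)."""
        n = len(transactions)
        items = {i for t in transactions for i in t}
        # Frequent 1-itemsets.
        current = [frozenset([i]) for i in items
                   if sum(i in t for t in transactions) / n >= min_support]
        frequent = list(current)
        k = 2
        while current:
            # Generate candidate k-itemsets by joining frequent (k-1)-itemsets.
            candidates = {a | b for a in current for b in current if len(a | b) == k}
            # Count support against every transaction and prune the infrequent.
            current = [c for c in candidates
                       if sum(c <= t for t in transactions) / n >= min_support]
            frequent.extend(current)
            k += 1
        return frequent

    baskets = [frozenset(t) for t in [("bread", "milk"), ("bread", "butter"),
                                      ("bread", "milk", "butter"), ("milk",),
                                      ("bread", "milk")]]
    print(apriori(baskets, min_support=0.6))  # singles plus {bread, milk}

The repeated full scan over all transactions for every candidate set is exactly the inefficiency described above.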
Uses of association rules in data mining
It is common to make use of association rules in data mining (Gjerstad, 2008). They play a very important role, and programmers use the rules when they face difficulty in building machine learning algorithms. For those who are new to the field, machine learning is a branch of artificial intelligence in which computers are taught to understand and analyze data and then make inferences from it. Association rules play a major part in data mining and are used very frequently; developers use these rules when building machine learning algorithms to make the results of data mining more efficient and transparent.
7. Dataset
In machine learning the greatest challenge is the availability of datasets. Without large datasets our results will always be inaccurate. The data starts in its raw form; it is then cleaned of anomalies and portions that are of no use (Graham, 2011). The cleaned data is stored separately to avoid overlapping data. It is common for data entry operators to make mistakes while entering data, so over the years people have looked to technology to provide more reliable data entry mechanisms. The data is also filtered to keep what is relevant and discard the rest. The number of fields is now limited in order to stop over-splitting the data, which risks losing meaningful information. The many-to-many relationships within the data greatly affect how it must be interpreted. To explain this:
- One notice holds information about more than one lot
- Information about multiple awards is contained in one single CAN
- Several lots are linked and associated with one award
- It is common for multiple CANs to be awarded against one single lot
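One quick way to inspect such multiplicities in the CAN file is to count how many rows each identifier accounts for, as below; how to interpret the rows per CAN depends on the exact schema, so this is only a sketch.

    import pandas as pd

    df = pd.read_csv("TED_CAN_2018.csv")

    # How many rows (e.g. awards or lots) does each CAN account for?
    per_can = df.groupby("ID_NOTICE_CAN").size()
    print(per_can.describe())      # distribution of rows per CAN
    print((per_can > 1).mean())    # share of CANs spanning several rows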
8. Research
The first screenshot shows the association rule setup. The association rule configuration contains a Transaction ID column, a Reference ID column, and an Item column. The association parameters are minimum support, minimum confidence, and maximum length. The number of rules found is zero, and the total number of items found is one.
The next screenshot shows the dataset parameters. There are three: the Transaction ID column, the Reference ID column, and the Item column. The Transaction ID column contains
TED_NOTICE_URL, the Reference ID column contains TED_NOTICE_URL, and the Item column contains YEAR.
Three training parameters appear in the next screenshot: minimum support, minimum confidence, and maximum rule length.
The parameters are then extracted: minimum support is 10, minimum confidence is 50, the item column is YEAR, and the maximum rule length is 4.
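Assuming the minimum support of 10 and minimum confidence of 50 are percentages, a comparable run outside SAP HANA might look like the sketch below, using the mlxtend library; the basket construction from TED_NOTICE_URL and YEAR is an assumption about how the tool grouped the data.

    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    df = pd.read_csv("TED_CAN_2018.csv")

    # One "transaction" per notice URL, with items drawn from the YEAR column
    # (an assumption about how the tool built its baskets).
    baskets = (df.groupby("TED_NOTICE_URL")["YEAR"]
                 .apply(lambda years: [str(y) for y in years.unique()])
                 .tolist())

    enc = TransactionEncoder()
    onehot = pd.DataFrame(enc.fit(baskets).transform(baskets), columns=enc.columns_)

    # Minimum support 10%, maximum rule length 4, minimum confidence 50%.
    freq = apriori(onehot, min_support=0.10, max_len=4, use_colnames=True)
    rules = association_rules(freq, metric="confidence", min_threshold=0.50)
    print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])

Since each notice carries a single year, most baskets hold one item, which is consistent with the run above reporting one item found and zero rules.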
The dataset variables are described in the next screenshot. The Name column gives each variable's name, the Storage column shows integer or string storage, the dataset contains a number of nominal and continuous variables, and the number of key values is zero.
The dataset name is TED_CAN_2018.csv. This dataset has the columns ID_TYPE, ID_NOTICE_CAN, DT_DISPATCH, YEAR, and TED_NOTICE_URL. ID_TYPE is an integer value, ID_NOTICE_CAN is a year-based identifier, and TED_NOTICE_URL is a string value.
The number of variables is described in this section. Each variable is marked as continuous or nominal, with string or integer storage. The total row count is 78,816, and the missing-value count per variable is zero or a small integer.
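A variable profile like this one (storage types, 78,816 rows, missing values) can be approximated with a few pandas calls, assuming the CSV is in the working directory:

    import pandas as pd

    df = pd.read_csv("TED_CAN_2018.csv")

    print(df.shape)          # expected (78816, number_of_columns)
    print(df.dtypes)         # integer vs. string storage per variable
    print(df.isna().sum())   # missing values per variable
    print(df.nunique())      # distinct values (a hint for nominal vs. continuous)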
Category frequencies are shown from high to low. The chart type is a bar chart; the x variable is the year and the y variable is the frequency. The blue bars represent the frequency values.
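A chart along these lines can be recreated with matplotlib; the sketch below plots YEAR frequencies from high to low, with blue bars as in the screenshot.

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("TED_CAN_2018.csv")

    freq = df["YEAR"].value_counts()         # category frequencies, high to low
    freq.plot(kind="bar", color="tab:blue")  # blue bars as in the screenshot
    plt.xlabel("YEAR")
    plt.ylabel("Frequency")
    plt.tight_layout()
    plt.show()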
The category frequencies are calculated in this section, ordered from high to low. The variable name is ID_NOTICE_CAN.
The category and frequency values are viewed in this section. The frequency is a percentage value and the categories are integer values.
The data source is selected from a folder, then the Next button is clicked to view the dataset values. The event is the dataset name, TED_CAN_2018.csv.
Cover letter
To Mr Majetic
CEO
Dear CEO,
Presented in this letter is a bird's-eye view of the research conducted on big data using the software titled SAP HANA. The research was conducted in a very extensive manner at the TED (Tenders Electronic Daily) investing company based out of Australia. We have included in this report details about how the data was collected and used for the various findings. In the report we also mention various recommendations given by the developers, programmers, and researchers who are the pioneers of this field.
9. Important results of the above analysis and Recommendations to CEO
The above analysis helps us understand the power and utility of the SAP HANA tool. The tool allowed us to sort items and all data both in descending order (highest to lowest) and in ascending order (lowest to highest). The first principle in understanding datasets is the tool's ability to sort data as we wish to see it, giving more clarity and purpose, and this tool does that well.
The tool helped us list category frequencies, which gives us insight into the data and helps us understand associative data and its categories. The datasets merge together and provide results that could not have been comprehended otherwise.
When the experiment began, the tool was trained on the existing dataset. We worked with the association rule setup, which contains the various ID and reference columns that help the system learn from the data in an associative manner. As explained above, the parameters used were minimum support, minimum confidence, and maximum length. The number of rules found was zero and the total number of items found was one. This was the most important insight the tool provided when it was trained on the data. After this, the parameters were extracted and the information was used to further the experiments. The information was used to plot graphs that help us understand the dynamics of the data and how it is skewed in favor of some parameters with respect to others. This allows us to understand the dynamics of our dataset and hence make informed decisions about it easily.
10. Conclusion
A big dataset was taken for investigation and cleansed. The SAP HANA tool was used to perform advanced data analysis, the findings were listed, and based on those findings a few recommendations were made.
References
Berthold, M., & Hand, D. (2011). Intelligent data analysis. Berlin: Springer.
Challenges with Big Data Analytics. (2015). International Journal of Science and Research (IJSR), 4(12), 778-780. doi: 10.21275/v4i12.nov152088
Cook, T. (2010). Data analysis & probability. San Diego, CA: Classroom Complete Press.
Eliot, G. (1902). The mill on the Floss. Toronto: G.N. Morang.
Gelman, A. Bayesian data analysis.
Gjerstad, Ø. (2008). Mining for gene expression data by image analysis of DNA microarrays. Ås: Norwegian University of Life Sciences, Department of Chemistry, Biotechnology and Food Science.
Graham, A. (2011). Statistics. London: Hodder Education.
Hair, J., Black, W., Babin, B., & Anderson, R. Multivariate data analysis.
Ohlhorst, F. (2013). Big data analytics. Hoboken, NJ: Wiley.
Peck, R., & Devore, J. (2012). Statistics. Boston, MA: Brooks/Cole, Cengage Learning.
Ramsay, J., & Silverman, B. (2013). Functional data analysis. New York, NY: Springer.
SAP HANA server infrastructure with Power Systems. (2019). Retrieved from https://www.ibm.com/it-infrastructure/power/sap-hana
Silvia, P. (2016). SAP HANA. [Place of publication not identified]: SAP Press.
Sivia, D., & Skilling, J. (2012). Data analysis. Oxford: Oxford University Press.
What is SAP HANA | In Memory Computing and Real Time Analytics. (2019). Retrieved from https://www.sap.com/products/hana.html