Data Handling and Business Intelligence: Superstore and WEKA Analysis
Added on 2023/01/11
This report analyses data handling and business intelligence, focusing on the application of data mining techniques to a superstore dataset. It begins with an introduction to data mining and business intelligence and their importance in modern organizations. Part 1 analyses the superstore data in Excel, using functions such as Lookup and pivot tables to examine profit and sales trends over several years, and visualizes the data with graphs and charts. Part 2 turns to WEKA and demonstrates clustering analysis on an audidealership.csv file. The report explains commonly used data mining methods such as association, classification and clustering analysis with real-world examples, and concludes with a discussion of the advantages and disadvantages of WEKA.

DATA HANDLING AND
BUSINESS INTELLIGENCE

TABLE OF CONTENTS
INTRODUCTION
PART 1
By using data set of superstores analyse profit and sales over years and analyse it by using Excel for pre-processing of data, also analyse and visualize the data
Demonstration of ways in which data can be practically analysed using Excel functions such as Lookup, Pivot table, graphs and charts
PART 2
By using audidealership.csv file show conjunction with Weka with the example of clustering
Explanation of commonly used data mining methods that can be used in business. Explain them with real time example
Advantages and disadvantages of Weka
CONCLUSION
REFERENCES

INTRODUCTION
Data mining can be defined as the process of discovering patterns within large data sets. It
draws on methods at the intersection of machine learning, database systems and statistics
(Homocianu and Airinei, 2017). It can also be defined as the practice of examining large
pre-existing databases in order to generate new information. Today many organizations rely on
data mining for data handling and for extracting new and important information. To do this,
businesses use Business Intelligence (BI) to identify specific data that can then support
effective decisions. Business Intelligence can be defined as a set of processes, technologies and
architectures that convert raw data into meaningful information that helps drive the overall
profitability of a business. Various software packages can be used by organizations for BI to
transform data into actionable knowledge and intelligence. BI is especially important for retail
sector organizations, as it helps them analyse large volumes of data and take appropriate
decisions so that they can strengthen relationships with their customers and increase their
overall profitability. This assignment analyses the superstore and audidealership data sets in
order to identify the overall sales and profit of the organization. In addition, different data
mining methods are explained and the advantages and disadvantages of the Weka software are
discussed.
PART 1
By using data set of superstores analyse profit and sales over years and analyse it by using Excel for pre-processing of data, also analyse and visualize the data
Excel offers various techniques, methods and formulas that can be used for analysing and
calculating profit and sales over a number of years. They help organizations evaluate their data
and get a brief idea of their average profit and overall sales over the last four to five years,
so that this data can be used for further important decision making (Moro et al., 2020). Excel is
one of the most common software packages used by organizations for financial calculations and
analysis, forecasting and various other purposes. It has many built-in formulas that can be used
for analysing the data and reaching a conclusion. The primary function of Excel is to organize
all the information or data of the organization in an appropriate manner. The data must be
organized before formulas, techniques or methods can be applied for further analysis. Excel also
provides the option of generating graphs or
charts, which makes it much easier for organizations to visualize their data and take
appropriate decisions accordingly. To analyse the sales and profit of the organization over the
years, a pivot table and a graph can be used.
Row Labels         Sum of Sales    Average of Profit
Furniture          5178590.542     68.11660673
  2009             1472671.724     137.9565402
  2010             1252518.416     21.35772727
  2011             1268656.078     120.6278708
  2012             1184744.324     -10.02715311
Office Supplies    3752762.1       112.3690738
  2009             1035399.64      151.9643028
  2010             910359.95       100.9771282
  2011             796383.79       78.20144784
  2012             1010618.72      116.7143313
Technology         5984248.182     429.2075157
  2009             1701825.482     359.7878373
  2010             1397208.679     447.037081
  2011             1364905.113     519.0770085
  2012             1520308.909     402.5972762
Grand Total        14915600.82     181.1844243
Figure 1 Average profit and sum of sales of last 4 years.
Data interpretation: The above graph was produced with Excel's pivot table feature, with which
the average profit and the overall sales for the past four years have been
calculated. The graph shows that Technology provides the highest profit to the company, and that
sales were highest for Technology in 2009.
Figure 2 Average profit in last 4 years
Data Interpretation: The above graph shows the average profit of the company over the past four
years. It shows that the highest average profit of the company was achieved in the Technology
category in 2011.
Technology: average sales and average profit from year 2009 to year 2012
The interpretation above shows that, over the last four years, the Technology category had the
highest sales and profit.
Row Labels     Average of Sales    Average of Profit
Technology     2897.941008         429.2075157
  2009         3145.703293         359.7878373
  2010         2631.278114         447.037081
  2011         2916.463917         519.0770085
  2012         2895.826493         402.5972762
Grand Total    2897.941008         429.2075157

Figure 3 Average profit and sales of Technology
Data Interpretation: The above graph shows that the average sales of Technology were highest in
2009, whereas the average profit was highest in 2011.
Demonstration of ways in which data can be practically analysed using Excel functions such as Lookup, Pivot table, graphs and charts
Lookup Function: Excel offers many functions and formulas for analysing data, and LOOKUP is one
of them. It is mostly used to select a value from one range based on a match in another range
(Fylstra, 2017), for example to pick out a particular record from a large data set. The function
should be applied to data that is sorted in ascending order; otherwise it may not return the
correct value. LOOKUP returns a value from a defined range or from an array: it searches for a
value in one column and, when it finds a match, returns the value from the same position in
another column (Wang, Luo and Liu, 2016). In other words, it searches for a particular value in
defined rows or columns. By default LOOKUP performs an approximate match: if the exact value is
not found, it matches the largest value in the lookup range that is less than or equal to the
lookup value.
Syntax of the lookup function is:
LOOKUP(value, lookup_range, [result_range])
To identify how much sales the order with order ID 483 generated, the lookup function can be
used as follows:
=LOOKUP(483,B2:B8400,F2:F8400)
=4965.76
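The matching behaviour of LOOKUP can be illustrated outside Excel. The Python sketch below is not Excel's implementation; it only mimics the vector form with a binary search, assuming the lookup range is sorted in ascending order. The function name `lookup` is hypothetical, and the sample rows are invented for illustration, except that order ID 483 is paired with the 4965.76 result reported above.

```python
from bisect import bisect_right

def lookup(value, lookup_range, result_range):
    """Mimic Excel's vector LOOKUP: return the result paired with the
    largest lookup value that is <= value (lookup_range sorted ascending)."""
    pos = bisect_right(lookup_range, value)
    if pos == 0:
        # value is smaller than every entry; Excel reports #N/A in this case
        raise LookupError("#N/A")
    return result_range[pos - 1]

# Illustrative sample data (assumption), sorted by order ID
order_ids = [3, 32, 483, 515]
sales = [261.54, 180.36, 4965.76, 394.27]

exact = lookup(483, order_ids, sales)        # exact match on order ID 483
approximate = lookup(500, order_ids, sales)  # no ID 500, falls back to ID 483
```

Because of the fallback to the nearest smaller value, a lookup for an order ID that does not exist still returns a figure, which is worth keeping in mind when the formula is used on real data.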
Pivot Table: This is one of the most powerful tools in MS Excel, as it can be used to analyse,
calculate and summarize data and allows users to compare one summary with another. It helps
users extract data in a structured manner. It is mostly used for calculating averages and sums,
extracting the data of a particular region, and many other purposes.
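What a pivot table computes can be sketched in a few lines of plain Python. This is only an illustration of the grouping and aggregation idea, not how Excel works internally; the function name `pivot` and the sample rows are hypothetical and are not taken from the superstore file.

```python
from collections import defaultdict

# Hypothetical rows: (category, year, sales, profit)
rows = [
    ("Furniture", 2009, 200.0, 20.0),
    ("Furniture", 2009, 100.0, -10.0),
    ("Furniture", 2010, 300.0, 30.0),
    ("Technology", 2009, 500.0, 100.0),
]

def pivot(rows):
    """Group rows by (category, year) and return, for each group,
    the sum of sales and the average of profit."""
    sales_sum = defaultdict(float)
    profit_sum = defaultdict(float)
    count = defaultdict(int)
    for category, year, sales, profit in rows:
        key = (category, year)
        sales_sum[key] += sales
        profit_sum[key] += profit
        count[key] += 1
    return {key: (sales_sum[key], profit_sum[key] / count[key]) for key in sales_sum}

summary = pivot(rows)
for (category, year), (total_sales, avg_profit) in sorted(summary.items()):
    print(category, year, total_sales, avg_profit)
```

Each key of `summary` corresponds to one row label of a pivot table, and the two values correspond to a "Sum of Sales" and an "Average of Profit" column.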
Calculation of shipping cost, product base margin and average sales of furniture, office supplies and technology
Row Labels                          Sum of Product Base Margin    Average of Sales    Sum of Shipping Cost
Furniture                           1006.77                       3003.82282          53243.69
  Bookcases                         122.09                        4352.656296         8646.07
  Chairs & Chairmats                228.46                        4564.343394         15512.69
  Office Furnishings                414.31                        885.9058503         8402.72
  Tables                            241.91                        5252.100116         20682.21
Office Supplies                     2116.77                       814.0481779         36095.51
  Appliances                        240.64                        1698.137189         6854.11
  Binders and Binder Accessories    342.42                        1117.986437         6633.52
  Envelopes                         91.97                         707.6658537         1682.77
  Labels                            108.61                        135.3526042         288.66
  Paper                             458.85                        364.4513143         7914.41
  Pens & Art Supplies               337.76                        263.9924487         2041.81
  Rubber Bands                      95.14                         83.83592179         225.21
  Scissors, Rulers and Trimmers     92.23                         562.474375          670.51
  Storage & Organization            349.15                        1960.041392         9784.51
Technology                          1148.77                       2897.941008         18491.84
  Computer Peripherals              449.67                        1049.968259         4067.34
  Copiers and Fax                   36.37                         12992.65862         2446.88
  Office Machines                   149.75                        6435.303086         7135.91
  Telephones and Communication      512.98                        2139.65323          4841.71
Grand Total                         4272.31                       1775.878179         107831.04

Figure 4 product base margin and shipping cost
Calculation of profit region wise
Row Labels               Sum of Profit
Atlantic                 238960.66
North Carolina           2841.11
Northwest Territories    8307.05
Ontario                  439214.57
Prarie                   321160.12
Quebec                   140426.65
West                     297008.61
Yukon                    73849.21
Grand Total              1521767.98

Figure 5 profit region wise
Total order and average profit region wise
Row Labels                 Count of Order ID    Average of Profit
Atlantic                   1080                 221.2598704
  New Brunswick            323                  357.1267492
  Newfoundland             82                   83.96512195
  Nova Scotia              464                  183.9695474
  Prince Edward Island     211                  148.6336967
North Carolina             79                   35.96341772
  Elon                     79                   35.96341772
Northwest Territories      59                   140.7974576
  Northwest Territories    59                   140.7974576
Ontario                    2161                 203.2459833
  Georgina                 129                  209.2537984
  Hanover                  624                  193.4693109
  Ontario                  739                  168.9889851
  Orangeville              334                  222.3565569
  Waterloo                 335                  275.659791
Prarie                     1706                 188.2532943
  Manitoba                 793                  172.0392938
  Saskachewan              913                  202.3362103
Quebec                     781                  179.8036492
  Quebec                   781                  179.8036492
West                       1991                 149.1755952
  Alberta                  865                  175.6606705
  British Columbia         1126                 128.8296004
Yukon                      542                  136.253155
  Dawson                   291                  201.9967354
  Whitehorse               251                  60.03250996
Grand Total                8399                 181.1844243
Figure 6 Total order and average profit region wise
PART 2
By using audidealership.csv file show conjunction with Weka with the example of clustering
Figure 7 audidealership data relationship

Figure 8 EM cluster-1

Figure 9 Simple K-means cluster

Explanation of commonly used data mining methods that can be used in business. Explain them with real time example
Data mining is one of the most effective ways for organizations to analyse their raw data and
extract useful information that supports effective decisions (Ibrahim and Shiba, 2019).
Organizations can use a range of data mining methods or techniques to evaluate and analyse data
for different purposes that help the organization enhance its
profitability. These methods can be applied to any amount of data, according to the needs and
requirements of the organization. Different sectors require different data mining methods for
analysing different kinds of information, but they are mostly used to analyse profit, sales and
customer data and to extract any other information needed for decision making. There are many
data mining methods, but association, classification, clustering analysis, prediction, pattern
tracking (or sequential patterns), decision trees, neural networks and outlier (or anomaly)
analysis are the most commonly used by organizations. These methods are explained below with the
help of real-world examples:
Association: This data mining method is mostly used by organizations to find correlations
between different items. It is one of the most important methods available to business
organizations, because it uncovers hidden patterns among the chosen items so that the
relationships between them can be identified and decisions taken accordingly (Russell and
Markov, 2017). Retail organizations in particular use this method for purposes such as
predicting customers' buying behaviour and patterns or identifying the products or services that
are in demand among customers. For example, a supermarket wants to analyse its customer data to
identify which items sell most and which combinations of products customers most often purchase
together. With this method the supermarket might find that 65% of customers who purchase eggs
also purchase milk, and that approximately 13% of all customers purchase both milk and eggs. In
association-rule terms, 65% is the confidence of the rule and 13% is its support.
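The two percentages in the eggs and milk example correspond to the standard association-rule measures of confidence and support. The sketch below shows how they can be computed; the baskets are hypothetical and the function names `support` and `confidence` are chosen here for illustration, so the resulting figures differ from the 65% and 13% quoted in the example.

```python
# Hypothetical shopping baskets; each set is one customer's purchase
baskets = [
    {"eggs", "milk", "bread"},
    {"eggs", "milk"},
    {"eggs", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
]

def support(itemset, baskets):
    """Fraction of all baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets that contain antecedent, the fraction that also
    contain consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

print(support({"eggs", "milk"}, baskets))       # share of customers buying both
print(confidence({"eggs"}, {"milk"}, baskets))  # share of egg buyers who also buy milk
```

Support is measured against all customers, while confidence is measured only against the customers who bought the first item, which is why the two numbers in the example can differ so much.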
Classification: This is another commonly used data mining method, which organizations use to
distinguish between different items. Items are differentiated on the basis of their features,
behaviour and so on, and assigned to groups (Sehgal and Bhargava, 2018). With this method
organizations can segregate their data by category and take decisions accordingly by applying
rules. It can be used by banks, supermarkets and many other industrial sectors. For example, if
a supermarket wants to categorize its products on the basis of their features, it can use this
method. A bank can also use it to categorize its loans on the basis of the risk associated with
them (high, medium and low risk).
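The bank example can be sketched as a small rule-based classifier. The function name `loan_risk`, the loan-to-income ratio and both thresholds are assumptions made for illustration only; in practice a classifier would usually learn such rules from historical loans that have already been labelled.

```python
def loan_risk(annual_income, loan_amount):
    """Assign a loan to a risk category using fixed, illustrative rules."""
    ratio = loan_amount / annual_income  # loan size relative to income
    if ratio < 1.0:   # hypothetical threshold
        return "low"
    if ratio < 3.0:   # hypothetical threshold
        return "medium"
    return "high"

# Hypothetical applicants: (annual income, loan amount)
loans = [(60000, 30000), (40000, 100000), (25000, 90000)]
labels = [loan_risk(income, amount) for income, amount in loans]
```

The categories are fixed in advance, and every loan is assigned to exactly one of them.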
Clustering Analysis: Classification and cluster analysis are often confused, but there is a
large difference between them: classification assigns items to predefined categories, whereas
clustering discovers the groups from the data itself. Clusters are developed for
different objectives and are formed on the basis of how similar items are to one another. Data
sets are partitioned into small segments according to their similarities and dissimilarities,
which is why clustering is sometimes also known as data segmentation (Hussain et al., 2018).
Clustering analysis is further divided into sub-methods that organizations can choose according
to their needs and requirements. It is mostly used for decision making or for developing
strategies to increase sales and profitability. For example, an automotive organization can use
clustering analysis to separate customers who have taken a loan to purchase a vehicle from
customers who have paid for the vehicle in full, or to analyse whether customers who have
purchased vehicles are capable of paying the whole price of the vehicle by checking their
salaries and the amount of loan they have taken.
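The automotive example can be sketched with a minimal k-means routine. This is a plain textbook version written for illustration and is not Weka's SimpleKMeans implementation; the function name `kmeans`, the customer figures (salary and loan amount, both in thousands) and the starting centroids are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return assignment, centroids

# Hypothetical customers: (salary, loan amount), both in thousands
customers = [(30, 20), (35, 25), (32, 22), (90, 5), (95, 0), (88, 8)]
clusters, centres = kmeans(customers, [(30, 20), (90, 5)])
```

No labels are supplied: the two groups (lower salary with a larger loan, higher salary with little or no loan) emerge from the distances between the customers alone.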
Prediction: This is another commonly used data mining method, which organizations use to analyse
past and present data sets so that predictions about the future can be made (Homocianu and
Airinei, 2017). For this purpose several data mining methods are combined to analyse the data.
Four methods that can be used together for prediction are trend analysis, pattern matching,
relation analysis and classification. Prediction is used mainly by retail sector organizations,
which analyse their past and current data in order to predict future sales. They can also use it
to forecast their overall revenue and profit.
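Trend analysis, the simplest of the four ingredients, can be sketched as a least-squares straight line fitted to past sales and extended by one year. The yearly figures are the Technology sales totals for 2009 to 2012 reported in Part 1; the function name `linear_trend` and the choice of a straight line are assumptions, and four data points give only a crude forecast.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

years = [2009, 2010, 2011, 2012]
technology_sales = [1701825.482, 1397208.679, 1364905.113, 1520308.909]

slope, intercept = linear_trend(years, technology_sales)
forecast_2013 = slope * 2013 + intercept  # extend the fitted line by one year
print(round(forecast_2013, 2))
```

A real forecast would normally combine such a trend with the other methods listed above, for example seasonal patterns or relationships with other variables.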
Sequential Pattern or Pattern Tracking: This data mining method is used by organizations to
identify patterns in data over a fixed interval of time so that the required data can be
obtained for taking appropriate and effective decisions (Jain, Sharma and Sharma, 2017). For
example, a supermarket can use this method to identify which product has the highest sales at
which time of the year. On the basis of this data it can take the required decisions, such as
increasing the availability of that product at that time of year in order to increase sales and
revenue.
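The supermarket example can be sketched as a search for each product's peak month. This is a deliberately simplified form of pattern tracking and not a full sequential pattern mining algorithm; the function name `peak_months` and the sales records are hypothetical.

```python
from collections import defaultdict

# Hypothetical monthly sales records: (product, month, units sold)
records = [
    ("ice cream", "Jan", 40), ("ice cream", "Jul", 310), ("ice cream", "Dec", 55),
    ("umbrellas", "Jan", 120), ("umbrellas", "Jul", 30), ("umbrellas", "Nov", 210),
]

def peak_months(records):
    """Return, for each product, the month with the highest units sold."""
    totals = defaultdict(lambda: defaultdict(int))
    for product, month, units in records:
        totals[product][month] += units
    return {product: max(months, key=months.get) for product, months in totals.items()}

peaks = peak_months(records)
```

With the peak month known for each product, stock levels can be raised ahead of that month, as described in the example.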
Decision Trees: This is another data mining method, which helps organizations classify their
data into different categories so that effective and appropriate decisions can be taken. This
method can be used by any industry that needs to categorise its data for decision making.
For example: if government organizations want to check what number of citizens helps are
eligible to vote and how many of them are male or female. The government can use this method
to categorize its citizens on the basis of the above criteria.
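The citizen example can be written as a very small decision tree, where each question splits the records into branches. The tree below is written by hand rather than learned from data, and the voting age of 18 and the sample records are assumptions made only for this sketch.

```python
from collections import Counter

VOTING_AGE = 18  # assumed threshold; the real value depends on the country


def classify(citizen):
    """Walk a two-level decision tree: first age, then gender."""
    if citizen["age"] < VOTING_AGE:
        return "not eligible"
    if citizen["gender"] == "female":
        return "eligible female"
    if citizen["gender"] == "male":
        return "eligible male"
    return "eligible other"


# Hypothetical citizen records, for illustration only.
citizens = [
    {"age": 34, "gender": "female"},
    {"age": 16, "gender": "male"},
    {"age": 52, "gender": "male"},
    {"age": 18, "gender": "female"},
]

counts = Counter(classify(c) for c in citizens)
for category, number in sorted(counts.items()):
    print(f"{category}: {number}")
```

In a data mining tool the questions and thresholds are chosen automatically from training data, but the resulting structure is read in the same way.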
Neural Network: If organizations want to check the relationship between their inputs and outputs,
this method is used (Homocianu and Airinei, 2017). It is modelled on biological neural networks
and is used for data processing, regression, etc. Organizations can use it to recognize patterns
between their inputs and outputs so that they can take decisions accordingly and change their
inputs in order to improve their outputs. For example, hospitals can use it to analyse data such
as the amount of a drug given to patients and their recovery rate.
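The hospital example can be illustrated with the smallest possible case of a neural network: a single neuron with a linear output, trained by gradient descent to relate drug dose (input) to recovery rate (output). The doses, recovery rates, learning rate and number of epochs are all hypothetical values chosen for this sketch. A practical network would use many neurons and non-linear activations.

```python
def train_neuron(inputs, targets, learning_rate=0.05, epochs=5000):
    """Train one linear neuron (output = weight * x + bias) by gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(inputs)
    for _ in range(epochs):
        # Error of the current prediction for every training example.
        errors = [(weight * x + bias) - y for x, y in zip(inputs, targets)]
        # Gradients of half the mean squared error.
        grad_w = sum(e * x for e, x in zip(errors, inputs)) / n
        grad_b = sum(errors) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Hypothetical data: drug dose (units) and observed recovery rate (0 to 1).
doses = [1.0, 2.0, 3.0, 4.0]
recovery = [0.2, 0.4, 0.6, 0.8]

weight, bias = train_neuron(doses, recovery)
predicted = weight * 2.5 + bias
print(f"Learned weight: {weight:.3f}, bias: {bias:.3f}")
print(f"Predicted recovery rate for dose 2.5: {predicted:.3f}")
```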
Anomaly Analysis or Outlier Analysis: This method is used by businesses when they need to
identify data items that do not behave in an expected pattern or do not comply with expected
behaviour (Jain, Sharma and Sharma, 2017). It is mostly used by organizations that want to
identify faults, fraud or issues within their current system. For example, an IT organization
can use it to identify unauthorized access to its data or to check whether any of its data has
been hacked.
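One simple way to flag such items, shown here only as a sketch, is the z-score: a value is treated as an outlier when it lies more than a chosen number of standard deviations from the mean. The daily login counts and the threshold of 2.0 are assumptions made for the example.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold in absolute terms."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical number of logins per day on one account, for illustration only.
daily_logins = [12, 11, 13, 12, 10, 11, 95]

suspicious = find_outliers(daily_logins)
print(f"Days with unusual login counts: {suspicious}")
```

The z-score is sensitive to the outliers themselves, so with very small or heavily skewed data sets a median-based measure may be a better choice.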
Advantages and disadvantages of Weka
Advantages of Weka
Weka is a tool used for data pre-processing, classification, clustering, regression,
visualization and association rules. It has several advantages compared to Excel. It is well
suited to developing new machine learning schemes. It is freely available software under the
GNU General Public License (Ibrahim and Shiba, 2019). It is portable, as it is implemented
entirely in the Java programming language and can run on almost any modern computing
platform. It offers a comprehensive collection of data pre-processing and modelling techniques.
Most importantly, it is quite easy to use because of its graphical user interface. Weka also
supports data formats that Excel does not, such as ARFF, C4.5 and binary, in addition to CSV.
It can easily be integrated with other Java packages, whereas Excel faces various difficulties
when it comes to integration with Java packages.
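To show what the ARFF format mentioned above looks like, the sketch below builds a small ARFF document from rows of data. The relation name, attribute names and values are hypothetical and are not taken from the files used in this report. Only the basic numeric and nominal attribute types are covered.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text from (name, type) attribute pairs and data rows."""
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines)


# Hypothetical attributes and rows, for illustration only.
attributes = [
    ("sales", "numeric"),
    ("profit", "numeric"),
    ("segment", "{Consumer,Corporate}"),  # nominal attribute with two values
]
rows = [
    (261.96, 41.91, "Consumer"),
    (731.94, 219.58, "Corporate"),
]

arff_text = to_arff("superstore_sample", attributes, rows)
print(arff_text)
```

The header describes each column and its type, which is the main difference from a plain CSV file.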
Disadvantages of Weka
Weka also has several disadvantages. One of the main disadvantages is that it can only
handle small data sets. If the data set is larger than a few MB, an OutOfMemory error can occur
and, if it is important to work on that data set, its overall size has to be reduced so that
further work on it can be completed within a particular time period. Weka also lacks proper
documentation (Russell and Markov, 2017). In addition, Weka is updated constantly, whereas
Excel does not require regular updating. Its connectivity with Excel and other non-Java
packages is poor. It also has no facility for saving scaling parameters so that they can be
reused on future datasets. Another main disadvantage of Weka is that it does not include all
data visualization and preparation techniques, so not every kind of operation can be performed
in it.
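The idea of saving scaling parameters for future datasets can be sketched outside Weka as follows. Minimum and maximum values are computed from one data set, stored as JSON text, and later restored to scale new values in exactly the same way. The function names and sample values are hypothetical, and this is an illustration of the concept rather than a feature of Weka or Excel.

```python
import json


def fit_min_max(values):
    """Compute the parameters needed for min-max scaling."""
    return {"min": min(values), "max": max(values)}


def apply_min_max(values, params):
    """Scale values with previously computed parameters."""
    span = params["max"] - params["min"]
    if span == 0:
        return [0.0 for _ in values]
    return [(v - params["min"]) / span for v in values]


# Hypothetical training values, for illustration only.
training_values = [10.0, 20.0, 30.0]
params = fit_min_max(training_values)

# Save the parameters as text; in practice this string would be written to a file.
saved = json.dumps(params)

# Later, restore the parameters and apply them to a new data set.
restored = json.loads(saved)
scaled_new = apply_min_max([15.0, 40.0], restored)
print(scaled_new)
```

Values in the new data set that fall outside the original range are scaled beyond 0 to 1, which makes it visible that the new data differs from the data used to compute the parameters.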
CONCLUSION
From the above assignment it can be concluded that data mining is one of the most
useful tools available to organizations for finding patterns, identifying relationships
between inputs and outputs, and taking appropriate decisions from a large data set. With the
help of data mining, organizations can also analyse their current and past data and, on that
basis, predict their future sales and take decisions in an appropriate manner. It can also be
used by organizations to enhance their relationships with their customers. There are various
tools that can be used for data mining. Excel is one of those tools; organizations can use it
for data mining, for analysing current data, and for extracting important and useful data for
decision making. Weka is another data mining software that can be used for data handling and
for analysing sales data in order to predict future results.

REFERENCES
Books and Journals
Ibrahim, F.A. and Shiba, O.A., 2019. Data Mining: WEKA Software (an Overview). Journal of
Pure and Applied Sciences. 18(3).
Hussain, S., et al., 2018. Educational data mining and analysis of students’ academic
performance using WEKA. Indonesian Journal of Electrical Engineering and Computer
Science. 9(2). pp.447-459.
Jain, A., Sharma, V. and Sharma, V., 2017. Big data mining using supervised machine learning
approaches for Hadoop with Weka distribution. International Journal of Computational
Intelligence Research. 13(8). pp.2095-111.
Homocianu, D. and Airinei, D., 2017. The Excel Data Mining Add-in. Applications in audit and
financial reports. The Audit Financiar journal. 15(147). pp.451-451.
Moro, S., et al., 2020. Unfolding the Drivers of Student Success in Answering Multiple-
Choice Questions About Microsoft Excel. Computers in the Schools, pp.1-19.
Sehgal, M. and Bhargava, D., 2018. Knowledge mining: An approach using comparison of data
cleansing tools. Journal of Information and Optimization Sciences. 39(1). pp.337-343.
Homocianu, D. and Airinei, D., 2017. The Excel Data Mining Add-In. Applications in Audit and
Financial Reports (Componenta Excel Data Mining. Aplicații in audit și raportări
financiare). Audit Financiar. 15(3). p.147.
Wang, P., Luo, H. and Liu, J., 2016, May. Format-preserving encryption for Excel. In 2016 IEEE
International Conference on Consumer Electronics-Taiwan (ICCE-TW) (pp. 1-2).
IEEE.
Russell, I. and Markov, Z., 2017, March. An introduction to the Weka data mining system.
In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science
Education (pp. 742-742).
Fylstra, D.H., 2017, December. Simulation models in Excel, Tableau, power BI and mobile apps
with analytic solver® software. In 2017 Winter Simulation Conference (WSC) (pp.
4422-4422). IEEE.
16
Books and Journals
Ibrahim, F.A. and Shiba, O.A., 2019. Data Mining: WEKA Software (an Overview). Journal of
Pure and Applied Sciences. 18(3).
Hussain, S., and et. al., 2018. Educational data mining and analysis of students’ academic
performance using WEKA. Indonesian Journal of Electrical Engineering and Computer
Science. 9(2). pp.447-459.
Jain, A., Sharma, V. and Sharma, V., 2017. Big data mining using supervised machine learning
approaches for Hadoop with Weka distribution. International Journal of Computational
Intelligence Research. 13(8). pp.2095-111.
Homocianu, D. and Airinei, D., 2017. The Excel Data Mining Add-in. Applications in audit and
financial reports. The Audit Financiar journal. 15(147). pp.451-451.
Moro, S., and et. al., 2020. Unfolding the Drivers of Student Success in Answering Multiple-
Choice Questions About Microsoft Excel. Computers in the Schools, pp.1-19.
Sehgal, M. and Bhargava, D., 2018. Knowledge mining: An approach using comparison of data
cleansing tools. Journal of Information and Optimization Sciences. 39(1). pp.337-343.
Homocianu, D. and Airinei, D., 2017. The Excel Data Mining Add-In. Applications in Audit and
Financial Reports (Componenta Excel Data Mining. Aplicații in audit și raportări
financiare). Audit Financiar. 15(3). p.147.
Wang, P., Luo, H. and Liu, J., 2016, May. Format-preserving encryption for Excel. In 2016 IEEE
International Conference on Consumer Electronics-Taiwan (ICCE-TW) (pp. 1-2).
IEEE.
Russell, I. and Markov, Z., 2017, March. An introduction to the Weka data mining system.
In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science
Education (pp. 742-742).
Fylstra, D.H., 2017, December. Simulation models in Excel, Tableau, power BI and mobile apps
with analytic solver® software. In 2017 Winter Simulation Conference (WSC) (pp.
4422-4422). IEEE.
16
1 out of 18
Related Documents

Your All-in-One AI-Powered Toolkit for Academic Success.
+13062052269
info@desklib.com
Available 24*7 on WhatsApp / Email
Unlock your academic potential
© 2024 | Zucol Services PVT LTD | All rights reserved.