This assignment focuses on data handling and business intelligence. It covers the analysis of profit and sales using Excel, explores various data mining methods, and discusses the importance of business intelligence. Real-time examples and techniques such as lookup, pivot tables, graphs, and charts are also demonstrated.
Contribute Materials
Your contribution can guide someoneās learning journey. Share your
documents today.
DATA HANDLING AND BUSINESS INTELLIGENCE
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
TABLE OF CONTENTS INTRODUCTION...........................................................................................................................1 PART 1............................................................................................................................................1 By using data set of superstores analyse profit and sales over years and analyse it by using Excel for pre- processing of data, also analyse and visualize the data........................................1 Demonstration of ways in which data can be practically analysed using Excel functions such as Lookup, Pivot table, graphs and charts...................................................................................4 PART 2............................................................................................................................................8 By using audidealership.csv file show conjunction with Weka with the example of clustering.8 Explanation of commonly used data mining methods that can be used in business. Explain them with real time example......................................................................................................11 Advantages and disadvantages of Weka....................................................................................14 CONCLUSION..............................................................................................................................15 REREFENCES..............................................................................................................................16
INTRODUCTION Data Mining can be defined as a process through which pattern within large data sets are discovered. It involves various kinds of methods at machine learning intersection, database systems and statistics(Homocianu and Airinei, 2017). It can also be defined as a practise of examining large number of pre- existing databases so that new information can be generated. Today many organizations focus on data mining for data handing and extraction of new and important data. For this business uses business Intelligence so that specific data can be identified that can further be used for taking effective decisions. Business Intelligence can be defined as a set of process, technologies, architecture that helps in converting raw data into meaningful data which is fruitful in driving overall profitability of business. There are various kinds of softwareās that can be used by organizations that can be used for BI and transform data into actionable knowledge and intelligence. It is one of the most important for retail sector organizations as it helps them to analyse large volume of data and take appropriate and effective decisions so that they can enhance their relationship with their customers and increase their overall profitability. This assignment will lay emphasis on Analysis of superstore and audileadership data so that the given data can be analysed and overall sales and profit of the organization can be identified other than this different kinds of data mining methods will be explained with advantages and disadvantage of Weka software will be discussed in this assignment. PART 1 By using data set of superstores analyse profit and sales over years and analyse it by using Excel for pre- processing of data, also analyse and visualize the data There are various kinds of techniques, methods and formulas in Excel that can be used for analysing and calculating profit and sales over years. It helps in evaluation of data so that organizations can get a brief idea of their average profit and overall sales in last 4 to 5 years so that this data can be used for further important decision making(Moro and et. al., 2020). Excel is one of the most common software that can be used by organizations for calculations and analysis of financial calculations, forecasting data and for various other purposes. Excel has various kinds of inbuilt formulas that can be used for analysing the data and reaching to a conclusion. The main and primary function of Excel is to organize all the information or data of the organization in an appropriate manner. It is important to organize the data if formulas, techniques or methods are required to be used for further analysis. Excel also provides an option of generating graphs or 1
charts so that it becomes much easier for organizations to visualize their data and take appropriate decisions accordingly. In order to analyse sales and profit of the organization over years pivot table and graph method can be used. Row LabelsSum of Sales Average of Profit Furniture5178590.54268.11660673 20091472671.724137.9565402 20101252518.41621.35772727 20111268656.078120.6278708 20121184744.324-10.02715311 Office Supplies3752762.1112.3690738 20091035399.64151.9643028 2010910359.95100.9771282 2011796383.7978.20144784 20121010618.72116.7143313 Technology5984248.182429.2075157 20091701825.482359.7878373 20101397208.679447.037081 20111364905.113519.0770085 20121520308.909402.5972762 Grand Total14915600.82181.1844243 Figure1Average profit and sum of sales of last 4 years. Data interpretation: Above graph has been made with the help of pivot table feature of Excel with the help of which Average profit in last 4 years and overall sales in past 4 years have been 2
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
calculated. From the above graph it is clear that technology provides maximum profit to the company and their sales were highest in Technology in 2009. Figure2Average profit in last 4 years Data Interpretation: From the above graphs overall average profit of the company in past four years have been explained. This graph clearly explains highest average profit of the company was seen in technology department in 2011. Technology: average sales and average profit from year 2009 to year 2012 From the above data interpretation, it is clear that sales and profit of technology department in last four years is maximum. Row Labels Average of Sales Average of Profit Technology2897.941008429.2075157 20093145.703293359.7878373 20102631.278114447.037081 20112916.463917519.0770085 20122895.826493402.5972762 Grand Total2897.941008429.2075157 3
Figure3Average profit and sales of Technology Data Interpretation: From the above graph it is clear that average sales of technology were highest in year 2009 Whereas average profit was highest in year 2011. Demonstration of ways in which data can be practically analysed using Excel functions such as Lookup, Pivot table, graphs and charts Lookup Function: There are various kinds of functions and formulas available in Excel that can be used analysis of data. Lookup is one of those functions, this function is mostly used to select values from different range(Fylstra, 2017). It is also used to select a particular type of data from a large amount of data set. This formula is applied to the data which is assembled or stored in either ascending or descending order. This function helps in returning a value from a defined range of data or from an array. It helps in searching a value in other columns, if the value matches the value in another column then it directly returns the value(Wang, Luo and Liu, 2016). So, it can be said that it is used for searching for a particular value if defined rows or columns. Default behaviour of lookup helps in solving any kinds of problems within Excel. Syntax of lookup function is: LOOKUP (value, loopup_range, [result_range]) If we want to identify how much sales order id = 483 generated then for this lookup function can be used =LOOKUP(483,B2:B8400,F2:F8400) 4
=4965.76 Pivot Table: It is one of the most powerful tools of MS Excel as it can be used for analysing, calculating and summarizing data and also allows users to compare it with other data. It helps the users to extract data in a proper manner. It is mostly used for calculating average, sum, extracting data of particular region and for many other purposes. Calculation of shipping cost, product base margin and overall sales of furniture, office supplies and technology Row Labels Sum of Product Base Margin Average of Sales Sum of Shipping Cost Furniture1006.773003.8228253243.69 Bookcases122.094352.6562968646.07 Chairs & Chairmats228.464564.34339415512.69 Office Furnishings414.31885.90585038402.72 Tables241.915252.10011620682.21 Office Supplies2116.77814.048177936095.51 Appliances240.641698.1371896854.11 Binders and Binder Accessories342.421117.9864376633.52 Envelopes91.97707.66585371682.77 Labels108.61135.3526042288.66 Paper458.85364.45131437914.41 Pens & Art Supplies337.76263.99244872041.81 Rubber Bands95.1483.83592179225.21 Scissors, Rulers and Trimmers92.23562.474375670.51 Storage & Organization349.151960.0413929784.51 Technology1148.772897.94100818491.84 Computer Peripherals449.671049.9682594067.34 Copiers and Fax36.3712992.658622446.88 Office Machines149.756435.3030867135.91 Telephones and Communication512.982139.653234841.71 Grand Total4272.311775.878179107831.04 5
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
Figure4product base margin and shipping cost Calculation of profit region wise Row Labels Sum of Profit Atlantic238960.66 North Carolina2841.11 Northwest Territories8307.05 Ontario439214.57 Prarie321160.12 Quebec140426.65 West297008.61 Yukon73849.21 Grand Total1521767.98 6
Figure5profit region wise Total order and average profit region wise Row Labels Count of Order ID Average of Profit Atlantic1080221.2598704 New Brunswick323357.1267492 Newfoundland8283.96512195 Nova Scotia464183.9695474 Prince Edward Island211148.6336967 North Carolina7935.96341772 Elon7935.96341772 Northwest Territories59140.7974576 Northwest Territories59140.7974576 Ontario2161203.2459833 Georgina129209.2537984 Hanover624193.4693109 Ontario739168.9889851 Orangeville334222.3565569 Waterloo335275.659791 Prarie1706188.2532943 Manitoba793172.0392938 Saskachewan913202.3362103 Quebec781179.8036492 Quebec781179.8036492 West1991149.1755952 Alberta865175.6606705 7
British Columbia1126128.8296004 Yukon542136.253155 Dawson291201.9967354 Whitehorse25160.03250996 Grand Total8399181.1844243 Figure6Total order and average profit region wise PART 2 By using audidealership.csv file show conjunction with Weka with the example of clustering Figure7audidealership data relationship 8
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Explanation of commonly used data mining methods that can be used in business. Explain them with real time example Data mining is one of the most effective and useful way through which organizations can analyse their raw data and extract useful and important information that can be used by them to take effective decisions(Ibrahim and Shiba, 2019). There are various kinds of data mining methods or techniques that can be used by organizations for evaluation or analysis of data and can be used for different purposes that are fruitful for the organization in order to enhance their 11
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
profitability. These methods can be applied to any amount of data as per the need and requirement of the organization. Different sectors require different data mining methods for analysis of different kinds of information but mostly it is used for analysis of profit, sales, customer data and extracting any other kind of information for decision making. There are many kinds of data mining methods but association, Classification, clustering analysis, prediction, pattern tracking or sequential pattern, decision trees, neural network and Outlier Analysis or AnomalyAnalysisaremostcommonlyuseddataminingmethodsthatcanbeusedby organizations. All the specified methods will be explained with the help of real time examples: Association: It is one of those data mining method which is mostly used by organizations in order to find correlation between different items. it is one of the most important data mining method that can be used by business organizations in order to find hidden pattern within the chosen items so that relationship between the chosen items can be found or identified so that respectivedecisioncanbetakenaccordingly(RussellandMarkov,2017).Retailsector organizations are one of those sectors that majorly uses this method for different purposes such as predict their customers buying behaviour, pattern or for identification of products or services that are in demand within customers. For example: a super market wants to analyse their customers data so that they can identify which item is sold mostly and what combination of products are mostly purchased by customers. With the help of this method supermarket can analyse that 65% of customerpurchase eggs then they also purchase milk aswell and approximately 13% of customers purchase both milk and eggs. Classification: It is another commonly used data mining method which is used by organizations so that they can distinguish between different items. this method is used so that organizations can differentiate between items in order to be categorized on their basis of their features, behaviour etc. and can be categorized within groups(Sehgal and Bhargava, 2018). With the help of this methods organizations can segregate their data category wise and take decisions accordingly by applying some rules. This method can be used by banks, supermarkets and many other kinds of industrial sectors. For example, is a super market wants to categorize their products on the basis of their features then they can use this feature. Bank can also use this feature in order to categorize their loans on the basis of risk associated with it (high, medium and low risk). Clustering Analysis: Many times, people get confused between classification and cluster analysis but there is huge different between both of them. All the clusters are developed for 12
different objectives and are made on the basis of their dependency on other similar items. As per the similarities and dissimilarities data sets are partitioned into small segments because of which sometimes it is also known as data segmentation(Hussain and et. al., 2018). Clustering analysis is further divided into sub methods that can be used by organizations as per their needs and requirements. It is mostly used by organizations for decision making or for development of strategiesinordertoincreasetheirsalesandprofitability.Forexample:Aautomotive organization can be use clustering analysis in order to segregate their customers who have taken loan for purchase of vehicle and customers who have completely paid for the purchase of the vehicle or also to analyse whether customers who have purchased vehicles are capable for completely paying whole price of the vehicle or not by checking their salaries and amount of loan they have taken. Prediction: It is another commonly used data mining method which is used by organizations for analysing past and present data sets so that prediction for future can be made(Homocianu and Airinei, 2017). For this more than two data mining methods are clubbed together so that analysis of data can be done. There are four methods that can be used together for predictions such as: trend analysis, pattern matching, relation analysis and classification. It is majorly used by retail sector organizations so that they can analyse their past and current data an order to predict future sales. It can also be used by retail sector organizations for predictability of their overall revenue and profit. Sequential Pattern or Pattern Tracking: this method of data mining is used by organizations for identification of data patterns within a fixed interval of time so that needful and required data can be obtained for taking appropriate and effective decisions(Jain, Sharma and Sharma, 2017). For example: a supermarket can use this method in order to identify when sales of which is product is highest and at what time of, they year. So that n the basis of this data they can take required decisions such as increase availability of that product within that time of year in order to increase their sales and revenue. Decision Trees: it is another kind of data mining method which helps organizations to classify their data in different categories so that effective and appropriate decision can be taken. This method can be used by any industry who requires to categorise their data for decision making. For example: if government organizations want to check what number of citizens helps are 13
eligible for giving their vote, how many of them are male, female. This method can be used by government so that they can categorize their citizen on the basis of abo e criteria. Neural Network: if organizations what to check relationship between their input and output then this method is used(Homocianu and Airinei, 2017). It is completely based upon biological neural network and is used for data processing, regression etc. It can be used by organization so that they can recognize patterns within their input and output so that they can take decisions accordingly and bring changes within their input in order to enhance their output. It is used by hospitals for analysis of their data such as amount of drug provided to patient and their recovery rate. Anomaly Analysis or Outlier Analysis: this method is used by business only when they need to identify data items that do not begave in an expected pattern or do not comply with expected behaviour(Jain, Sharma and Sharma, 2017). It is mostly used by organizations if they want to identify faults, frauds or issues within their current system or pattern such as: it can be used by An IT organization to identify unauthorized access to their data or is any of their data been hacked or not. Advantages and disadvantages of Weka Advantages of Weka Weka is a tool which is used for data classification, pre-processing, clustering, regression, visualization and association rules. It has various kinds of advantages as compared to Excel. It is a well-suited software that can be used for developing new machine learning schemes. There are many other advantages that Weka has over excel such as: it is a freely available software under GNU general Public license (Ibrahim and Shiba, 2019). It is portable as it is completely implemented in java programming language and can be used or run on any kind of modern computing platform. It is a comprehensive collection of data processing and modern techniques. But most importantly it is quite easy to use because of its graphical user interface. There are many other kinds of data formats that are supported by Weka as compared to Excel, such as: ARFF, C4.5, CSV and binary.It can easily be integrated with other java packages whereas Excel faces various kinds of difficulties when it comes to its integration with java packages. Disadvantages of Weka Weka has many disadvantages as well, such as: one of the main disadvantages is that it can only handle small data sets. If the data set is larger than few MB, OutOfMemory error occurs 14
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
and if it is important to work on the data set then there will be a requirement of reducing overall size of the dataset. So that further work on the data set can be done and completed within a particular time period. It also lacks proper documentation (Russell and Markov, 2017). Not only this, in this system is updated constantly whereas in Excel regular updating are not required. Its connectivity with Excel and other non-java packages is not good. It also does not have a facility through which parameters for scaling can be saved for future datasets. One of the main disadvantages of Weka is that it does not have all data visualization and preparation techniques because of which not all kinds of operations can be performed on it. CONCLUSION From the above assignment it has been summarized that data mining is one of the most useful tools that can be used by organizations that can be used for finding patterns, relationship between input and output, and for taking appropriate decisions from a large data set. It has been analysed that with the help of data mining organizations can also analyse their current and past data so that on the basis of this, they can predict their future sales and take decision in an appropriate manner. It can also be used by organizations in order to enhance their relationship with their customers. there are various kinds of tools that can be used for fata mining. Excel is one of those tools that can be used by organizations for data mining, for analysis of current data, extraction of important and useful data for decision making. Weka is another data mining software that can be used for data handing and analysing sales data so for prediction of future results. 15
REREFENCES Books and Journals Ibrahim, F.A. and Shiba, O.A., 2019. Data Mining: WEKA Software (an Overview).Journal of Pure and Applied Sciences.18(3). Hussain, S., and et. al., 2018. Educational data mining and analysis of studentsā academic performance using WEKA.Indonesian Journal of Electrical Engineering and Computer Science.9(2). pp.447-459. Jain, A., Sharma, V. and Sharma, V., 2017. Big data mining using supervised machine learning approaches for Hadoop with Weka distribution.International Journal of Computational Intelligence Research.13(8). pp.2095-111. Homocianu, D. and Airinei, D., 2017. The Excel Data Mining Add-in. Applications in audit and financial reports.The Audit Financiar journal.15(147). pp.451-451. Moro, S., and et. al., 2020. Unfolding the Drivers of Student Success in Answering Multiple- Choice Questions About Microsoft Excel.Computers in the Schools, pp.1-19. Sehgal, M. and Bhargava, D., 2018. Knowledge mining: An approach using comparison of data cleansing tools.Journal of Information and Optimization Sciences.39(1). pp.337-343. Homocianu, D. and Airinei, D., 2017. The Excel Data Mining Add-In. Applications in Audit and Financial Reports (Componenta Excel Data Mining. AplicaČii in audit Či raportÄri financiare).Audit Financiar.15(3). p.147. Wang, P., Luo, H. and Liu, J., 2016, May. Format-preserving encryption for Excel. In2016 IEEE InternationalConferenceonConsumerElectronics-Taiwan(ICCE-TW)(pp.1-2). IEEE. Russell, I. and Markov, Z., 2017, March. An introduction to the Weka data mining system. InProceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education(pp. 742-742). Fylstra, D.H., 2017, December. Simulation models in Excel, Tableau, power BI and mobile apps with analytic solverĀ® software. In2017 Winter Simulation Conference (WSC)(pp. 4422-4422). IEEE. 16