This report explores the use of Excel for data pre-processing, including IF functions, VLookup, and charts and graphs. It also discusses common data mining methods used in business with real-world examples. The report provides a comprehensive understanding of data handling and business intelligence techniques.
Data handling and Business Intelligence
Contents
PART 1
1. Use of Excel for pre-processing the data
1.1.1 Use of IF function in Excel
1.1.2 VLookup
1.1.3 Charts and Graphs
PART 2
2.1 Using the audidealership.csv provided in conjunction with Weka give a specific example of clustering
2.2 Most common data mining methods which are used in business with real world examples
2.3 Advantages and disadvantages of Weka over Excel
REFERENCES
INTRODUCTION
At present, businesses hold large amounts of data about their operating activities, and this data must be controlled and managed effectively to produce useful results. Data handling is a modern technique that supports managing and sorting large data sets into presentable groups, which makes it easier to analyse and summarise key findings. In this report, large data sets relating to a superstore and a car dealership are used to identify and critically evaluate current trends in data warehousing, business intelligence and data mining. In addition, the report demonstrates a comprehensive knowledge and systematic understanding of essential concepts and principles by using predictive analytics software.

PART 1
1. Use of Excel for pre-processing the data:
Prior to quantitative analysis, data pre-processing comprises a variety of steps for converting the raw data obtained from data acquisition into a 'clean' and 'tidy' dataset. Data pre-processing is a data mining technique that transforms raw data into a comprehensible format (Pashazadeh and Navimipour, 2018). The data are also presented in line graphs, diagrams and histograms. In Excel, we can work with many kinds of information drawn from different facets of the data. From manufacturing to small shops and government agencies, MS Excel is widely used. The key features of this common program are:

1) Pivot Tables
PivotTables summarise large volumes of Excel data from a structured table in which the first row holds the headings and the rows beneath hold the categories or values. The summary is flexible, but it typically includes totals for some or all categories. Excel 2013 can evaluate the data and suggest a PivotTable for users who are new to building one. How this function works: click any cell in the table or range; then, in the Recommended PivotTables dialogue box, select the PivotTable design that works best and click OK.
Excel offers a set of suggested PivotTables for the data. It then places the chosen PivotTable on a new worksheet and displays the field list so that the information can be rearranged to fit the user's requirements.
2) Conditional Formatting
As the name suggests, conditional formatting changes the format of a cell based on the contents of that cell or of other cells in the workbook. It lets users focus quickly on the main elements of a table, highlight mistakes and recognise relevant trends in the data. The chosen formats are applied to a cell according to a condition the user sets or a condition generated by Excel by comparing the values of the cells in a range. For example, in a list of employees' salaries, a format may be applied to any salary above a certain sum, any employee hired before a certain date or any individual with a certain name. The salary column can also be given graphical conditional formats; by default the highest and lowest amounts in the range set the scale, but this can be overridden where needed.

3) Sorting and Filtering
Excel tables help us make sense of large quantities of data. You can sort the records or pick out only the data you need, according to the criteria you define in MS Excel, which makes it easy to find what you want. Sorting and filtering the data saves time and makes the worksheet more productive. Suppose you had a list of 100 records containing dates, ages, names, towns and much more. With Excel's sort and filter functionality, you can easily organise the data to suit your needs (Patan and Babu, 2018).

4) Basic Math
Numbers are central to every Excel worksheet. One aspect that makes Excel so effective is the use of simple math functions to work with such numbers. A simple calculation can be entered in Excel's formula bar just as it would be written down. As with all Excel formulas, start the calculation with the = symbol. When you press Enter, the result appears in the cell. The formula can be typed directly into the cell or into the formula bar. A further option is to refer to other cells: for example, if cell A1 holds 87 and cell A2 holds 16, then =A1+A2 entered in cell A3 gives 103.
5) Mixed Type Charts
Hybrid or mixed type charts combine two chart types, for example an Excel column chart and a line chart. This layout helps display two different kinds of information or a wide range of values. For instance, the number of homes sold from June to December and the average monthly selling price can be compared more easily when the average price is drawn as a line (Young-Saver et al., 2018).
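The pivot-table summary and the sort and filter operations in the list above can also be sketched outside Excel. The following is a minimal Python illustration, assuming a small made-up list of sales records; the field names (`region`, `product`, `sales`), the helper names and the values are hypothetical and are not taken from the superstore data.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
records = [
    {"region": "North", "product": "Chairs", "sales": 120.0},
    {"region": "South", "product": "Tables", "sales": 300.0},
    {"region": "North", "product": "Tables", "sales": 80.0},
    {"region": "South", "product": "Chairs", "sales": 50.0},
]


def pivot_sum(rows, key, value):
    """Total `value` per distinct `key`, like a one-field PivotTable."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def sort_rows(rows, field, descending=False):
    """Return a new list ordered by `field`, like Excel's Sort."""
    return sorted(rows, key=lambda row: row[field], reverse=descending)


def filter_rows(rows, field, predicate):
    """Keep only rows whose `field` satisfies `predicate`, like Excel's Filter."""
    return [row for row in rows if predicate(row[field])]


sales_by_region = pivot_sum(records, "region", "sales")
largest_first = sort_rows(records, "sales", descending=True)
above_100 = filter_rows(records, "sales", lambda amount: amount > 100)
print(sales_by_region)
```

Each helper returns a new object and leaves `records` unchanged, which mirrors the way a PivotTable or a filter view does not alter the underlying worksheet data.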
1.1.1 Use of IF functions in Excel:
The IF function tests a condition and returns one result when the condition is TRUE and a second result when it is FALSE. IF can be used on its own with a single criterion, or nested when several conditions must be tested to reach a TRUE or FALSE result. In this exercise, the aim is to identify the dates on which the superstore's sales and profit fell and the dates on which they rose (Zhou et al., 2017).
• First, copy the date, sales and profit columns to a separate sheet.
• Then arrange the data in the required order before it is used.
• Select the cell where the IF function is required.
• Type the start of the formula in the cell: =IF(
• Type the condition followed by a comma: B2>B3,
• Enter what should be shown when the condition is met. Text must be typed inside quotation marks: "Reduction"
• Type a comma.
• Enter what should be shown when the condition is not met: "Augmentation"
• Close the bracket and press the Enter key.
• The complete formula is =IF(B2>B3,"Reduction","Augmentation"): if the value in cell B2 is greater than the value in B3, it displays "Reduction"; otherwise it displays "Augmentation".
• One can see the outcome in cell B4 after pressing the Enter key. To see the result for the remaining records, drag the cell's fill handle down to cell D8400.

1.1.2 VLookup:
This function is mainly used to look up a value in a table. Assume there are two separate sheets, A and B. A record in one sheet can be matched with data in the other sheet only when one column holds a value that is present in both sheets. In other words, the two sheets must have one field in common. VLOOKUP takes the value from the matching record and carries the related details across to the other sheet.
Syntax: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
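To make the four arguments concrete, the following is a rough Python sketch of what an exact-match lookup (range_lookup set to FALSE) does. The function name `vlookup` and the sample rows are hypothetical; the sketch only mimics the behaviour for illustration and makes no claim about how Excel implements it.

```python
def vlookup(lookup_value, table_array, col_index_num):
    """Mimic =VLOOKUP(lookup_value, table_array, col_index_num, FALSE).

    Scans the first column of table_array from the top and returns the
    value from column col_index_num (1-based) of the first matching row.
    Returns None where Excel would show #N/A.
    """
    if col_index_num < 1:
        raise ValueError("col_index_num is 1-based")
    for row in table_array:
        if row[0] == lookup_value:
            return row[col_index_num - 1]
    return None


# Hypothetical rows laid out as date, sales, profit (values are made up).
table = [
    ("01/01/2009", 1500.0, -200.0),
    ("02/01/2009", 900.0, 150.0),
    ("03/01/2009", 1200.0, 300.0),
]

sales = vlookup("02/01/2009", table, 2)     # second column: sales
profit = vlookup("02/01/2009", table, 3)    # third column: profit
missing = vlookup("31/12/2012", table, 2)   # date not present in the table
print(sales, profit, missing)
```

As in Excel, the lookup column must be the first column of the table, and the column index counts from 1 rather than 0.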
The lookup value is the value by which the data will be searched. Text, numbers and dates may be used.

Demonstration of VLOOKUP on the Superstore Sales data: using the same sheet as above, the steps are as follows.
Lookup value: Use cells G2, H2 and I2 as the headings for date, sales and profit. The results will appear in G3, H3 and I3. Select cell H3 and enter the VLOOKUP function, using cell G3 as the lookup value.
Table array: Select A2 to C8400 (A2:C8400) as the complete range.
Col_index_num: Use 2, so that the value is returned from the second column of the range, B2 to B8400 (B2:B8400).
[Range_lookup]: Use FALSE to request an exact match on the lookup value.

1.1.3 Charts and Graphs:
The pie chart is one form of graphical representation in Excel. It is a great help in evaluating data: by turning data into a pie chart, the user can visualise complicated data quickly. Suppose you track the Sensex. If a certain share performs well among the top five shares, the user can easily see how much it fluctuated during any particular period by charting that share's performance over the whole year (Birks, Lotter, Juggins and Smol, 2012).

Mixed or Combination Charts
Combination charts are among the most useful chart types. The user can combine two or more chart types to display related data in a single chart. Steps are:
• Select the cells to be used to create the line chart.
• Go to the Insert tab and then select Line Chart.
[Figure: line chart of Sales and Profit by date, 01/01/2009 to 29/11/2012; vertical axis from -20000 to 100000] (Source: superstores sales.csv)

Interpretation: The chart shows that the organisation recorded both its highest sales and its heaviest losses in January 2009. The second-highest sales, on the other hand, came in 2012. The chart shows significant changes in both sales and profit. In 2009-2010 and 2012, the company encountered considerable difficulties. The most notable change in sales can be seen in 2009; no other year shows losses as large as those of 2009.
PART 2
2.1 Using the audidealership.csv provided in conjunction with Weka give a specific example of clustering
Interpretation: The finding above shows a value lower than 1; this means that a customer's income is not affected by whether the customer views an Audi (Olson and Delen, 2008). The figures in the chart described earlier indicate that the company recorded its highest sales and its heaviest losses in January 2009 (Birks et al., 2012).

2.2 Most common data mining methods which are used in business with real world examples.
Data mining is the practice of extracting knowledge and patterns from large data sets. It involves data collection, storage, measurement and analysis. Finding observations and patterns is often referred to as processing the collected data. Data analysis is a logical tool for discovering useful information (Van Der Aalst, 2016), so assessment and interpretation are important for data mining. Technological advances have produced large volumes of data across many sectors and institutions. Organisations can collect and use the necessary data, which ultimately feeds into decision-making and business operations. Data mining involves three steps:
1. Exploration: The data is cleaned and transformed at this stage. The nature of the data is also determined here.
2. Pattern Recognition: The next step is to select the pattern that gives the best prediction.
3. Deployment: The identified patterns are used to achieve the required results.

Data mining technique: Choosing the right data mining technique is among the most critical decisions in the industry. The technique must be chosen according to the type of business and the type of problem (Gabrio, Mason and Baio, 2017). A standardised methodology should be used to improve the accuracy and cost-efficiency of the application. This report deals with seven main data mining strategies:

Statistical Methods: Statistics faces the same fundamental challenge as data analytics: how to draw sound conclusions from incomplete knowledge of the world. The most important aim is a brief and clear description of the connection between data and conclusions. Statistical methods address a number of fundamental problems that arise with real data: hypothesis testing, confidence estimation and the measurement of uncertain quantities. Statistics collects data and explains it, and in data mining it helps to answer questions such as:
1. What patterns are there in the database?
2. How likely is an event to occur?
3. Which patterns are most useful to the business?
4. What high-level summary gives an overview of what the database contains?

Clustering techniques: Clustering is among the main data mining strategies. Clustering is the process of grouping similar data together (Aditsania and Saonard, 2017). It gives an overview of the differences and similarities between records. It is also referred to as segmentation and helps users understand what is going on in the database. For instance, life insurance companies can group their clients by employment, age and health cover as well as type of claim. Several families of clustering methods are available:
1. Partitioning methods
2. Hierarchical agglomerative methods
3. Density-based methods
4. Grid-based methods
5. Model-based methods
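As a concrete illustration of the first family in the list above, the following is a minimal sketch of k-means, a common partitioning method, written in plain Python. It is not the procedure used in the report's Weka exercise; the customer values (age, number of claims) and the two starting centroids are hypothetical and chosen only to make the grouping easy to follow.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [math.dist(point, c) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical insurance customers as (age, number of claims).
customers = [(22, 1), (25, 2), (24, 1), (58, 7), (61, 8), (60, 6)]
centroids, clusters = kmeans(customers, centroids=[(22, 1), (61, 8)])
print(clusters)
```

The result depends on the starting centroids, so in practice the algorithm is usually run from several starting points; the fixed start here simply keeps the example deterministic.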
Decision Tree Induction: A decision tree is a predictive model and, as its name implies, it looks like a tree. In this approach, each branch of the tree is treated as a classification question, and the leaves of the tree are the partitions of the dataset. This method may be used for exploratory analysis and for data pre-processing. The decision tree can be seen as a segmentation of the original dataset made for a particular purpose. The records in a segment share certain similarities in the outcome being predicted. Decision trees give results that are easy for the user to understand.

Neural networks: Neural networks are another important technique in common use. This method was popular in the early stages of data mining (Sun, Chen, & Zhou, 2017). Neural networks are easy to use because they are automated to a certain extent, so users need not have deep knowledge of the underlying science or the software. The neural network grew out of the machine learning community. Nonetheless, to make a neural network work effectively, users need to know:
1. How are the nodes connected?
2. How many processing units should there be?
3. When should training stop?

Association Rule Technique: This technique finds the association between two or more items. It also reveals the relations between different variables in a database. It uncovers hidden patterns in the data, which identify the variables involved and the combinations of variables that occur together most frequently.

Anomaly Detection: This data mining technique is used to identify data items that do not conform to an expected pattern or expected behaviour. Such unexpected items are regarded as anomalies or noise. The technique is useful in areas such as credit card fraud detection, intrusion detection and fault detection. For example, suppose a few data sets from our database are plotted on a graph and a best-fit line is drawn through them.

2.3 Advantages and disadvantages of Weka over Excel
Advantages
Weka is an open-source toolkit in the field of data mining. It supports not only machine learning algorithms but also data preparation and meta-learners (Park and Snyder, 2018). The package offers three interfaces, including a command-line interface and an Explorer GUI (which lets the user try out and compare different preparation and modelling techniques on a data set). A package such as Weka is compelling because a wide range of data preparation, feature selection and data mining algorithms is integrated. This makes it very simple to try out and compare different approaches, as only one data format is needed. The package also comes with a graphical interface that makes it easier to use and to get good results.

Disadvantages
For certain methods Weka is much less capable: in these cases only the basic procedures are provided. When working with massive quantities of data, Weka may run into memory problems, because the data mining tool attempts to load the whole data set into RAM. The simple command-line interface (CLI) offers a workaround that helps handle large volumes of data (Read et al., 2016).

CONCLUSION
This report has found that massive data sets can easily be categorised and assembled into smaller groups with the help of data handling techniques such as lookups and pivot tables. This enables the user to reach sound results and to discover areas that need attention because they could create problems in future. Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can be applied directly to a dataset, including a large volume of data, to produce the required results quickly. Weka includes tools for data pre-processing, classification, regression, clustering, association rules and visualisation.
REFERENCES
Books and Journals
Pashazadeh, A. and Navimipour, N. J., 2018. Big data handling mechanisms in the healthcare applications: A comprehensive and systematic literature review. Journal of Biomedical Informatics, 82, pp.47-62.
Patan, R. and Babu, M. R., 2018. A novel performance aware real-time data handling for big data platforms on Lambda architecture. International Journal of Computer Aided Engineering and Technology, 10(4), pp.418-430.
Young-Saver, D. F., et al., 2018. Handling of missing outcome data in acute stroke trials: advantages of multiple imputation using baseline and postbaseline variables. Journal of Stroke and Cerebrovascular Diseases, 27(12), pp.3662-3669.
Zhou, C., et al., 2017. Spatial Data Handling in Big Data Era. Springer.
Birks, H. J. B., et al., 2012. Tracking environmental change using lake sediments: data handling and numerical techniques (Vol. 5). Springer Science & Business Media.
Van Der Aalst, W., 2016. Data science in action. In Process mining (pp. 3-23). Springer, Berlin, Heidelberg.
Fisher, P. F. ed., 2006. Developments in Spatial Data Handling: 11th International Symposium on Spatial Data Handling. Springer Science & Business Media.
Linoff, G. S. and Berry, M. J., 2011. Data mining techniques: for marketing, sales, and customer relationship management. John Wiley & Sons.
Niglas, K., 2007. Media Review: Microsoft Office Excel Spreadsheet Software. Journal of Mixed Methods Research, 1(3), pp.297-299.
Olson, D. L. and Delen, D., 2008. Advanced data mining techniques. Springer Science & Business Media.