This document discusses the current trends in data warehousing, business intelligence, and data mining. It evaluates the use of Excel for data pre-processing, analysis, and visualization. It also explains the workings of Weka and the most common data mining methods used by organizations.
Data Handling And Business Intelligence
Contents
INTRODUCTION
MAIN BODY
PART 1
Current trends in data warehousing, business intelligence and data mining
Evaluating the use of Excel for pre-processing the data, analysing the data and visualising the data
PART 2
2.1 Workings of Weka
2.2 Explain the most common methods of data mining which can be used by organization
2.3 Discusses the advantage or disadvantage of Weka tool
CONCLUSION
REFERENCES
INTRODUCTION
Data handling involves gathering, processing and evaluating data and eventually presenting it with graphs or diagrams. Creating and sharing information comes quite naturally to people: they pass figures to anyone who asks for them and use them accordingly (Ahsan and Bais, 2018). Business intelligence (BI) is a collection of methods, systems and innovations that turn raw data into usable information that drives efficient business activity. It transforms data into actionable knowledge and intelligence. Data handling and business intelligence techniques are used across a very broad range of fields, such as marketing, accounting, economics, technology, computer science, anthropology, art history, medical science and biology. This report analyses the given data and discusses some common data mining tools used by organisations. In addition, it discusses the advantages and disadvantages of the WEKA tool, with supporting arguments.
MAIN BODY
PART 1
Current trends in data warehousing, business intelligence and data mining
The concepts of data warehousing, business intelligence and data mining are closely related. Together they help modern business organisations to collect, store and classify data so that it can be analysed to gain useful business insights and make suitable decisions. Data warehousing is the process of storing large volumes of data securely so that it can be used in the future (Mohammed, Naugler and Far, 2015). For many organisations data is an asset, which makes protecting it from theft even more important. Several current trends are enhancing the practice of data warehousing. One is the increasing enablement of self-service data access through cloud services, which allows data owners to access their data whenever it is required from any digital device.
Other current trends in data warehousing are the growth of NoSQL adoption and big data analytics in the cloud. Business intelligence is the combination of all the technologies that business organisations use to collect, integrate, analyse and present their business information. Among the current trends in BI, the most influential is self-service BI and analytics. With this approach, small organisations can use business intelligence tools and techniques by themselves, without external assistance. This trend aims to reduce the BI costs of small organisations that do not hold large volumes of data. Other current trends in this field are big data innovation through social media, which supports customer analytics, and text analytics, which enables businesses to interpret social media sentiment (Mitrovic, 2020). Data mining is the procedure by which existing data is mined with BI tools to uncover hidden patterns and analytical information. Current trends in this field are visual data mining, the development of a standardised language for data mining and the integration of data mining with data warehousing (Imhoff and White, 2011).
Evaluating the use of Excel for pre-processing the data, analysing the data and visualising the data
Microsoft Excel is a software application for recording and analysing numerical information so that it can be presented in a much more understandable way. Here it is applied to the data of a Superstore whose sales and profits have declined over the years.
Data pre-processing
The Superstore data set is first pre-processed in Excel. Data pre-processing is a technique for cleaning data so that it can be analysed without errors. The first step is to find the missing values: using the Shift + F4 shortcut, the missing values are located and each one is then filled with the average of its column. Second, to transform the data, all numeric amounts are formatted to two decimal places for consistency. In the last step, the data set is reduced with pivot tables, in which only those variables that can affect the Superstore's profit or sales are selected.
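The three pre-processing steps above (filling gaps with the column mean, rounding to two decimals, keeping only selected variables) can be mirrored outside Excel. The following is a minimal Python sketch, assuming the data arrives as a list of row dictionaries; the column names and sample rows are hypothetical, and `None` stands for an empty cell. It illustrates the same operations and is not the workbook's actual procedure.

```python
from statistics import mean

def impute_column_means(rows, columns):
    """Fill missing (None) cells with the mean of their column, like =AVERAGE()."""
    for col in columns:
        col_mean = mean(row[col] for row in rows if row[col] is not None)
        for row in rows:
            if row[col] is None:
                row[col] = col_mean
    return rows

def round_amounts(rows, columns, places=2):
    """Format amounts to a fixed number of decimal places."""
    for row in rows:
        for col in columns:
            row[col] = round(row[col], places)
    return rows

def keep_variables(rows, keep):
    """Reduce the data set to the selected variables, as the pivot table does."""
    return [{col: row[col] for col in keep} for row in rows]

# Hypothetical rows; None marks a missing cell.
rows = [
    {"Year": 2009, "Sales": 120.5, "Profit": 10.0, "Customer Name": "A"},
    {"Year": 2009, "Sales": None, "Profit": 5.5, "Customer Name": "B"},
    {"Year": 2010, "Sales": 80.25, "Profit": None, "Customer Name": "C"},
]
numeric = ["Sales", "Profit"]
cleaned = keep_variables(round_amounts(impute_column_means(rows, numeric), numeric),
                         ["Year", "Sales", "Profit"])
print(cleaned)
```

Mean imputation keeps every row but flattens the variance of the filled column, which is worth remembering when the filled columns are later summed or graphed.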
Data analysis and visualisation
Once the data is pre-processed, it is analysed with the aim of identifying why the company's sales and profit are decreasing (Macaulay, Sekharan and Wang, 2017). First, all the numerical variables are summed by year using the Excel function =SUM(). In this way sales, profit, discount, shipping cost, unit price and order quantity are totalled for each of the four years 2009, 2010, 2011 and 2012, and a graph is developed for each variable.
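Summing each variable per year, as done with =SUM() above, is the same as a per-year SUMIF. A minimal Python sketch, assuming hypothetical column names and a handful of made-up order rows:

```python
from collections import defaultdict

def sum_by_year(rows, year_key, value_keys):
    """Total each numeric variable per year, like =SUMIF(year_range, year, value_range)."""
    totals = defaultdict(lambda: dict.fromkeys(value_keys, 0.0))
    for row in rows:
        for key in value_keys:
            totals[row[year_key]][key] += row[key]
    # Round to two decimal places, matching the pre-processing step.
    return {year: {key: round(value, 2) for key, value in sums.items()}
            for year, sums in sorted(totals.items())}

# Hypothetical order rows; the real workbook has thousands per year.
orders = [
    {"Year": 2009, "Sales": 100.5, "Profit": 12.25, "Order Quantity": 3},
    {"Year": 2009, "Sales": 200.25, "Profit": -4.0, "Order Quantity": 7},
    {"Year": 2010, "Sales": 50.0, "Profit": 6.5, "Order Quantity": 2},
]
yearly = sum_by_year(orders, "Year", ["Sales", "Profit", "Order Quantity"])
print(yearly)
```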
Year   Sum of Profit   Sum of Sales   Sum of Discount   Sum of Shipping Cost   Sum of Unit Price   Sum of Order Quantity
2009   434096.02       4209896.85     105.39            28481.76               232831              54508
2010   364917.33       3560087.04     105.81            27354.26               162467.6            54379
2011   380310.5        3429944.98     101.67            24939.85               159653.1            51413
2012   342444.13       3715671.95     104.32            27055.17               195467.6            54480
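The conclusions drawn from the yearly totals table can be checked directly from its values. This short Python sketch copies the Sum of Profit and Sum of Sales columns, finds the year in which each is lowest, and computes year-on-year percentage changes, which help separate the lowest year from the year of the steepest fall. The function names are illustrative.

```python
# Sum of Profit and Sum of Sales for each year, copied from the yearly totals table.
totals = {
    2009: {"Profit": 434096.02, "Sales": 4209896.85},
    2010: {"Profit": 364917.33, "Sales": 3560087.04},
    2011: {"Profit": 380310.5, "Sales": 3429944.98},
    2012: {"Profit": 342444.13, "Sales": 3715671.95},
}

def lowest_year(table, variable):
    """Return the year in which the variable takes its smallest value."""
    return min(table, key=lambda year: table[year][variable])

def year_on_year_change(table, variable):
    """Percentage change of a variable against the previous year."""
    years = sorted(table)
    return {
        year: round(100 * (table[year][variable] - table[prev][variable]) / table[prev][variable], 1)
        for prev, year in zip(years, years[1:])
    }

print(lowest_year(totals, "Sales"), lowest_year(totals, "Profit"))  # 2011 2012
print(year_on_year_change(totals, "Sales"))
```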
From the graphs above, the Superstore's sales fell to their lowest level in 2011 and profit was at its worst in 2012. The reduction in sales revenue in 2011 was due to the lower discount allowed, the lower shipping cost paid and the lower unit price charged by the Superstore in that year. All four numerical variables analysed above have a direct impact on the Superstore's sales. To maintain sales revenue, the Superstore must control its product unit prices and offer suitable discounts so that sales can increase (Bordeleau, Mosconi and Santa-Eulalia, 2018). None of the numerical variables above has a direct impact on profit, so all the non-numerical (categorical) variables are also analysed below to identify the variables that affect profit. The number of occurrences of each categorical value is counted with Excel's COUNTIF function. For example, to find how many times the "Delivery Truck" shipment mode was used in 2009, the formula used is =COUNTIF('SuperstoreSales1588154780137'!H2:H2159,'Analysis and presentation'!C9).
Shipment mode
Year   Delivery Truck   Regular Air   Express Air
2009   307              1582          269
2010   298              1597          246
2011   263              1460          275
2012   291              1609          202
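Each cell of the shipment mode table is one COUNTIF over a year's block of rows. A Python equivalent counts every category for every year in one pass; the sketch below assumes hypothetical `Year` and `Ship Mode` column names and made-up rows, then uses the table's own Express Air counts to find the year in which that mode was used least.

```python
from collections import Counter

def count_by_year(rows, year_key, category_key):
    """Count each category value per year, like one COUNTIF per year block and category."""
    counts = {}
    for row in rows:
        counts.setdefault(row[year_key], Counter())[row[category_key]] += 1
    return counts

# Hypothetical rows standing in for the Superstore sheet.
rows = [
    {"Year": 2009, "Ship Mode": "Delivery Truck"},
    {"Year": 2009, "Ship Mode": "Regular Air"},
    {"Year": 2009, "Ship Mode": "Regular Air"},
    {"Year": 2010, "Ship Mode": "Express Air"},
]
counts = count_by_year(rows, "Year", "Ship Mode")
print(counts)

# Express Air counts from the shipment mode table: which year used it least?
express_air = {2009: 269, 2010: 246, 2011: 275, 2012: 202}
print(min(express_air, key=express_air.get))  # 2012
```

A `Counter` returns 0 for a category that never occurs, which matches COUNTIF returning 0 rather than an error.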
From the table and graph above, the "Express Air" shipment mode was used least in 2012, which is a valid reason for the decline in profit in 2012.
Region
Year   North Carolina   Ontario   Northwest Territories   Atlantic   West   Prarie   Quebec   Yukon
2009   16               551       16                      283        507    464      188      133
2010   18               575       16                      295        483    412      200      142
2011   26               517       19                      240        460    412      191      133
2012   19               518       8                       262        541    418      202      134
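To see which region contributes most to the 2012 drop, the Region table can be turned into percentage changes between 2011 and 2012. The following Python sketch copies the table's counts (region labels kept exactly as in the data, including "Prarie") and reports the region with the largest fall; the function name is illustrative.

```python
REGIONS = ["North Carolina", "Ontario", "Northwest Territories", "Atlantic",
           "West", "Prarie", "Quebec", "Yukon"]
REGION_COUNTS = {
    2009: [16, 551, 16, 283, 507, 464, 188, 133],
    2010: [18, 575, 16, 295, 483, 412, 200, 142],
    2011: [26, 517, 19, 240, 460, 412, 191, 133],
    2012: [19, 518, 8, 262, 541, 418, 202, 134],
}

def pct_change_by_region(counts, start, end):
    """Percentage change in the number of orders per region between two years."""
    return {
        region: round(100 * (new - old) / old, 1)
        for region, old, new in zip(REGIONS, counts[start], counts[end])
    }

changes = pct_change_by_region(REGION_COUNTS, 2011, 2012)
largest_fall = min(changes, key=changes.get)
print(largest_fall, changes[largest_fall])  # Northwest Territories -57.9
```

Percentage changes on small counts (19 to 8 orders) are volatile, so the absolute numbers should be read alongside them.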
The analysis of the "Region" variable shows that the decline in sales in the Northwest Territories in 2012 may also explain the decline in profit in 2012.
Customer segment
Year   Small Business   Consumer   Corporate   Home Office
2009   416              443        790         509
2010   434              433        764         510
2011   389              396        718         495
2012   403              377        804         518
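The Customer segment table holds counts, so a percentage view needs each count divided by that year's total. A minimal Python sketch using the table's values, with illustrative names:

```python
SEGMENTS = ["Small Business", "Consumer", "Corporate", "Home Office"]
SEGMENT_COUNTS = {
    2009: [416, 443, 790, 509],
    2010: [434, 433, 764, 510],
    2011: [389, 396, 718, 495],
    2012: [403, 377, 804, 518],
}

def segment_shares(counts):
    """Share (%) of each year's orders that falls in each customer segment."""
    shares = {}
    for year, row in counts.items():
        total = sum(row)
        shares[year] = {seg: round(100 * n / total, 1) for seg, n in zip(SEGMENTS, row)}
    return shares

shares = segment_shares(SEGMENT_COUNTS)
consumer = {year: s["Consumer"] for year, s in shares.items()}
print(consumer)  # {2009: 20.5, 2010: 20.2, 2011: 19.8, 2012: 17.9}
```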
The "Customer segment" variable is analysed above using a line graph and a table, which show that in 2012 the number of sales to consumers fell to its minimum; this is also a reason for the reduced profit.
Product category
Year   Office Supplies   Technology   Furniture
2009   1169              541          448
2010   1170              531          440
2011   1112              468          418
2012   1159              525          418
From the tables and graphs above, the decline in the Superstore's profit in 2012 occurred due to the lower use of the "Express Air" shipment mode, reduced sales in the "Northwest Territories", lower sales to the "Consumer" customer segment and reduced "Furniture" sales in 2012.
PART 2
2.1 Workings of Weka
Weka is analytical software that helps in carrying out statistical functions such as clustering and descriptive analysis (Vera-Baquero, Colomo-Palacios and Molloy, 2013). The Audi dealership's data is analysed in Weka using k-means clustering with the default of 2 clusters. Clustering is the method of grouping the entire data set into classes according to their common features (Fuchs, Höpken and Lexhagen, 2014).
=== Run information ===
Scheme:       weka.clusterers.SimpleKMeans -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -S 10
Relation:     audidealership2
Instances:    100
Attributes:   8
              Dealership
              Showroom
              InternetSearch
              RS7
              A4
              TT
              Financing
              Purchase
Test mode:    evaluate on training data

=== Model and evaluation on training set ===

kMeans
======

Number of iterations: 6
Within cluster sum of squared errors: 160.2980769230769
Missing values globally replaced with mean/mode

Cluster centroids:
                             Cluster#
Attribute         Full Data         0         1
                      (100)      (48)      (52)
=================================================
Dealership             0.54    0.3333    0.7308
Showroom               0.64    0.6667    0.6154
InternetSearch         0.39    0.4375    0.3462
RS7                    0.53    0.2917    0.75
A4                     0.55    0.8125    0.3077
TT                     0.5     0.5833    0.4231
Financing              0.6     0.3333    0.8462
Purchase               0.38    0.0417    0.6923

Time taken to build model (full training data) : 0.02 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       48 ( 48%)
1       52 ( 52%)
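The k-means procedure behind the Weka run above can be sketched in a few lines. This is a minimal pure-Python version of Lloyd's algorithm with Euclidean distance and k = 2, applied to six hypothetical visitors with six of the dealership's 0/1 attributes; it is not a reimplementation of Weka. For reproducibility it starts from a deterministic farthest-first choice of centres, whereas Weka's SimpleKMeans by default picks random initial centres controlled by the seed (-S 10 in the run), so results on real data can differ. The last lines also show why the Full Data column is the overall share of people reaching each step: it equals the size-weighted average of the cluster centroids.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vector(points):
    """Column-wise mean; for 0/1 attributes this is the share of people with a 1."""
    return [sum(column) / len(points) for column in zip(*points)]

def farthest_first(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from the chosen centres."""
    centres = [list(points[0])]
    while len(centres) < k:
        centres.append(list(max(points, key=lambda p: min(euclidean(p, c) for c in centres))))
    return centres

def k_means(points, k=2, max_iter=500):
    """Lloyd's k-means: assign to the nearest centroid, recompute centroids, repeat until stable."""
    centroids = farthest_first(points, k)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = mean_vector(members)
    # Within-cluster sum of squared errors, as reported by Weka.
    sse = sum(euclidean(p, centroids[a]) ** 2 for p, a in zip(points, assignment))
    return assignment, centroids, sse

# Hypothetical visitors; columns: Dealership, Showroom, InternetSearch, RS7, A4, Purchase.
visitors = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
assignment, centroids, sse = k_means(visitors, k=2)
sizes = [assignment.count(c) for c in range(2)]
full_data = mean_vector(visitors)  # the "Full Data" column
print(assignment, sizes, round(sse, 4))  # [0, 0, 0, 1, 1, 1] [3, 3] 4.0
```

With the Weka output, the same weighting holds: for Dealership, 0.48 × 0.3333 + 0.52 × 0.7308 ≈ 0.54.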
The audidealership data set records 100 people, where "0" indicates that a person did not reach a given step and "1" indicates that the person reached it. Cluster "0" contains 48% of the instances and cluster "1" contains 52%. The Full Data column gives the proportion of all 100 people who reached each step: 54 visited the dealership, 64 visited the showroom and only 38 finally made a purchase. Purchases are concentrated in cluster 1, whose Purchase centroid is 0.6923 compared with 0.0417 in cluster 0.
2.2 Explain the most common methods of data mining which can be used by organization
Data mining is a method for extracting useful information from a broad range of raw data (Bordeleau, Mosconi and Santa-Eulalia, 2018). It involves analysing patterns in large quantities of information using one or more tools, and it is used for research and analysis in many fields. With data mining, organisations can learn more about their clients, establish efficient strategies for specific business functions and use resources more effectively and insightfully. This helps organisations get closer to their targets and take better decisions. Data mining is used to enhance current systems by utilising historical data: facts are collected in large databases so that similarities or trends can be found across thousands of fields (Data Mining Methods, 2020). The most common data mining methods are as follows:
Tracking patterns: Recognising patterns in databases is one of the most important techniques in data mining. This generally means identifying aberrations that occur in the data at regular intervals, or fluctuations in a particular variable over time. For instance, a business may find that sales of a certain product tend to increase just before holidays, or that warm weather brings more people to its homepage. A real business example of this method is the use of artificial intelligence to identify relevant patterns, which helps an organisation find the most demanded products in its store and how frequently they are bought.
Classification: This is a more complex data mining technique that requires a business to group different attributes into distinguishable classes, from which specific conclusions can be drawn or actions taken (Llave, 2018). For example, by analysing individual customers' financial records and purchase histories, a business may be able to classify them as "low", "medium" or "high" credit risks, and then use these classes to learn more about those customers. In banking, classification helps to identify the factors that influence clients' banking choices, and identifying clients with similar behaviour makes targeted marketing simpler. A financial institution analyses client payment histories with classification methods and selects important factors such as income ratio, transactions, credit history and loan duration. The results help the bank set its lending policy and offer customers loans according to the evaluation of these factors.
Association: This technique is related to tracking patterns but is more specific to dependently linked variables. Here a business looks for events or attributes that are strongly correlated with one another. For example, supermarket and grocery business owners use it to learn customers' preferences (Mitrovic, 2020).
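Association is usually measured with support (how often items appear together) and confidence (how often the second item appears when the first does). A brute-force Python sketch over hypothetical supermarket baskets, restricted to pairs of items; the thresholds are assumptions, and Weka's own Apriori associator handles larger itemsets more efficiently.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return rules lhs -> rhs as (lhs, rhs, support, confidence) for item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(pair for basket in baskets
                          for pair in combinations(sorted(set(basket)), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(support, 2), round(confidence, 2)))
    return sorted(rules)

# Hypothetical purchase histories.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "milk", "eggs"],
]
for rule in pair_rules(baskets):
    print(rule)
```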
By examining customers' purchasing histories with data mining tools, supermarkets can identify customers' purchasing preferences. Using these analyses, supermarkets plan the placement of goods on shelves and promote items through discounts and exclusive offers on other goods. This is often based on RFM segmentation, where RFM stands for recency, frequency and monetary value; promotions and marketing campaigns are then tailored to these segments.
2.3 Discusses the advantage or disadvantage of Weka tool
Weka stands for Waikato Environment for Knowledge Analysis; it is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a data set or called from your own Java code. The Weka workbench contains a number of visualisation tools and algorithms for data analysis and predictive modelling, together with graphical user interfaces for easy access to these functions. It is a Java-based data mining tool that combines machine learning algorithms and data mining, including clustering, classification and association rule mining (WEKA Tool, 2020). Weka offers three main graphical user interfaces: the Explorer, for exploratory data analysis covering pre-processing, attribute selection, learning and visualisation; the Experimenter, a framework for evaluating and comparing machine learning algorithms; and the Knowledge Flow, a data-flow-inspired interface for the visual design of KDD processes. Weka also offers the Simple CLI, a basic command-line interface from which commands can be typed. The tool has the following advantages and disadvantages.
Advantages:
Weka is useful for mining association rules.
It is very effective and well suited to machine learning.
It is also valuable for developing new machine learning schemes.
Weka can load ARFF, CSV, C4.5 and binary data files.
Because it is open source, freely available and extensible, it can be incorporated into other Java packages.
Disadvantages:
Documentation is inadequate, and the tool suffers from "Kitchen Sink Syndrome", a problem that arises when systems are continually extended with new features.
Connectivity to Excel spreadsheets and non-Java-based databases is poor.
Its CSV reader is not as powerful as RapidMiner's.
Its operation is not always smooth.
Weka is much weaker in classical statistics.
It cannot save the parameters used for scaling so that they can be applied to future data sets.
There is no automated facility for optimising the parameters of machine learning or statistical methods.
CONCLUSION
From the above report, it can be concluded that data mining and data warehousing are important concepts that help business organisations gather valuable information, which ultimately leads to effective decision making.
It has also been concluded that the Weka software application is appropriate for small organisations, whereas Microsoft Excel is appropriate for large organisations with big data.
REFERENCES
Books & Journals
Ahsan, U. and Bais, A., 2018. Distributed smart home architecture for data handling in smart grid. Canadian Journal of Electrical and Computer Engineering. 41(1). pp.17-27.
Bordeleau, F. E., Mosconi, E. and Santa-Eulalia, L. A., 2018, January. Business Intelligence in Industry 4.0: State of the art and research opportunities. In Proceedings of the 51st Hawaii International Conference on System Sciences.
Llave, M. R., 2018. Data lakes in business intelligence: reporting from the trenches. Procedia Computer Science. 138. pp.516-524.
Mitrovic, S., 2020. Adapting of international practices of using business-intelligence to the economic analysis in Russia. In Digital Transformation of the Economy: Challenges, Trends and New Opportunities (pp. 129-139). Springer, Cham.
Mohammed, A. A., Naugler, C. and Far, B. H., 2015. Emerging business intelligence framework for a clinical laboratory through big data analytics. Emerging trends in computational biology, bioinformatics, and systems biology: algorithms and software tools. New York: Elsevier/Morgan Kaufmann, pp.577-602.
Imhoff, C. and White, C., 2011. Self-service business intelligence. Empowering Users to Generate Insights, TDWI Best Practices Report, TDWI, Renton, WA.
Macaulay, A., Sekharan, S. and Wang, Y., Business Objects Software Ltd, 2017. Presenting visualizations of business intelligence data. U.S. Patent 9,536,096.
Fuchs, M., Höpken, W. and Lexhagen, M., 2014. Big data analytics for knowledge generation in tourism destinations – A case from Sweden. Journal of Destination Marketing & Management. 3(4). pp.198-209.
Vera-Baquero, A., Colomo-Palacios, R. and Molloy, O., 2013. Business process analytics using a big data approach. IT Professional. 15(6). pp.29-35.
Online
Data Mining Methods. 2020. [Online]. Available through: <https://www.datasciencecentral.com/profiles/blogs/the-7-most-important-data-mining-techniques>
WEKA Tool. 2020. [Online]. Available through: <http://www.e2matrix.com/blog/2017/10/14/data-mining-tools/>