This report discusses the use of Excel for data analysis and visualization, focusing on the decline in sales and profits over the years. It also explores the workings of WEKA using audidealership.csv and analyzes various data mining methods used in business. The advantages and disadvantages of WEKA over Excel are also discussed.
Contribute Materials
Your contribution can guide someone’s learning journey. Share your
documents today.
Data handling and business intelligence
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
Contents INTRODUCTION...........................................................................................................................1 PART 1............................................................................................................................................1 Evaluating the use of Excel for pre-processing, analysing and visualising the data using Superstore data and determining the decline in sales/profits over the years...............................1 PART 2............................................................................................................................................8 2.1 Workings of WEKA using audidealership.csv......................................................................8 2.2 Explaining data mining methods that can be used in business with real world examples. .11 2.3 Discussing the advantages and disadvantages of WEKA over Excel.................................13 CONCLUSION..............................................................................................................................14 REFERENCES..............................................................................................................................15
INTRODUCTION Data handling is a procedure of recording, analysing, interpreting and evaluating the data set.The concept of data handling is related with business intelligence which helps in utilising the tools of data handling in operations of a business organisation(Bordeleau, Mosconi and Santa-Eulalia, 2018).The main aim of this report is to build an understanding about how data systems work and how their benefits can be optimised. This report is divided into two sections. In first section of this report, current trends in data warehousing, business intelligence and data mining are discussed. Along with which, a data set of superstore is used in order to practically use the Microsoft Excel functions. These functions will be used to determine the decline in sales and profit of superstore. In the second section of this report, data of audidealership is used for clustering the data using an analytic application of WEKA. In this section, various data mining methods are also analysed along with their associated real life examples. Finally advantages and disadvantages of WEKA application are discussed over Microsoft Excel. PART 1 Evaluating the use of Excel for pre-processing, analysing and visualising the data using Superstore data and determining the decline in sales/profits over the years Data warehousingis the practice of storing the data so that it can be used whenever it is required. This concept is used by various business organisations to store and organise their business information so that it can protected. The current trends for data warehousing in market are optimisation and performance by creating a balance between disk storage and memory. Another trend in this field is In-memory database management system. This trend is result of bridging the issue of slow query response system(Mitrovic, 2020). Business intelligenceis a system which allows its users to use tools and techniques by which simple data can provide meaningful insights and information which can be used for business operations. Current trends for business intelligence in market are open source BI and software as a service. By using open source BI, small scale organisations can use tools of BI without any license acquisition and free of costs. On the other hand, SaaS is the result of increasing demand of software for forecasting and predictions. 1
Data miningis the procedure of mining the pre stored data by running algorithms and other functions to mine the additional information from the data which is not visible by only looking at the data set. The current trends which are observed in the field of data mining are Visualisation and service oriented architecture(Fuchs, Höpken and Lexhagen, 2014). Visualisation is the trend whichisbeinghighlyusedbybusinessorganisationasitallowspresentingthemined information by using graphs, charts and tables. On the other hand, service oriented architecture helps in mining only that data which can result in providing useful insights to the company. There are various software applications which are used to conduct above procedures of data warehousing, business intelligence and data mining. One of those applications is Microsoft Excel, using this application; its usage for pre processing, analysing and virtualising the data is evaluated. Pre processing the data The data in superstore dataset is raw and is required to be pre processed before analysing. Excel is analytic software which has various options to pre process the data. Among all those tools and functions, the function of identification of missing values, PIVOT table, FILTER and SORT are used. By using the shortcut key of Shift + F4, all the missing values (product base margin) in dataset are identified and then filled by their mean values. Once the dataset is cleansed, it is then transformed by using sorting and filters option so that all values can be shown according to their order date. Lastly, the dataset is reduced to only few values using PIVOT table. The selection of these variables is done on the discretion that only those variables will be selected which can impact organisation’s profit and sales. Analysing the visualisation Once the dataset is processed, it is then essential to analyse and visualise it so that meaningful insights can be gained. This process of analysis is done using the Excel functions of SUM(), LOOKUP(), COUNTIF() and CORRELATION. And the process of visualisation is done by Bar chart and line graph(Ataman, Kulick and Sim, 2011). First all the numerical values are selected from the PIVOT table and using the function of sum, total of all numerical values are divided according to four years (2009, 2010, 2011 and 2012). A visual representation of these total values is given in below table. Using the data of the table, graphs for each variable is also prepared. 2
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Sum of ProfitSum of SalesSum of Discount Sum of Shipping Cost Sum of Unit Price Sum of Order Quantity Sum of product base price 2009434096.024209896.85105.3928481.76232830.98545081113.69 2010364917.333560087.04105.8127354.26162467.59543791099.9 2011380310.53429944.98101.6724939.85159653.11514131019.93 2012342444.133715671.95104.3227055.17195467.55544801071.08 3
From the above analysis and visuaisation, it has been interpreted that each variable which is analysed above is the resaon behind the decline of sales as all the variabes are showing the similar trend that is minimum value in 2011 year. Now the most impactful variable upon sales is determine using the Excel function of CORRELATION an dresults gained from this function are given below: 4
Sum of Sales Sum of Discount Sum of Shipping Cost Sum of Unit Price Sum of Order Quantity Sum of product base price Sum of Sales1 Sum of Discount0.5540200741 Sum of Shipping Cost0.8375090530.916913191 Sum of Unit Price0.9765453470.44458840.7640935441 Sum of Order Quantity0.6078583430.933976430.9167144380.572287361 Sum of product base price0.7282721940.969972860.9760065330.617645760.903226421 This is clear from above results that unit price is the main reason behind the decline in sales. Superstore has reduced the unit price of thei products due to which sales revenue of this organisation has been reduced. The above analysis has only provided the reason behind decline in sales. In order to identify the cause behind decline of profit, non numeric variables are used by using LOOKUP and COUNTIF and the results are attached below: Delivery TruckRegular AirExpress Air 20093071582269 20102981597246 20112631460275 20122911609202 Shipment mode North CarolinaOntarioNorthwest TerritoriesAtlanticWestPrarieQuebecYukon 20091655116283507464188133 20101857516295483412200142 20112651719240460412191133 2012195188262541418202134 Region 5
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
Small BusinessConsumerCorporateHome Office 2009416443790509 2010434433764510 2011389396718495 2012403377804518 Customer segment 2009201020112012 0 100 200 300 400 500 600 700 800 900 Small Business Consumer Corporate Home Office Office SuppliesTechnologyFurniture 20091169541448 20101170531440 20111112468418 20121159525418 Product category 6
2009201020112012 0 200 400 600 800 1000 1200 1400 Office Supplies Technology Furniture Sum of ProfitExpress Air Northwest TerritoriesConsumerFurniture Sum of Profit1 Express Air0.748970561 Northwest Territories0.550053120.958126291 Consumer0.735930260.5802606410.531731671 Furniture0.705463290.3839573970.298999930.965259641 From the above analysis, four reasons behind the decline of profit are identified which are decline in the shipments using Express air, reduction of sales in Northwest territory, reduction in sales to consumer and reduction of sales of furniture. The most influential variable among these four is reduction in shipment through Express air and this result is gained from the correlation co efficient. Determining decline in sales and profit The reason behind the reduction of sales is declining unit price of the superstore’s products and the reason behind low profit is the reduction in usage of “Express air” as a shipment mode. PART 2 2.1 Workings of WEKA using audidealership.csv WEKA is analytic application which helps its users in the process of data mining. There are various functions which can be conducted using WEKA and one of them is clustering (Yuchen, 2016).For the given data set of audidealership, the method of clustering which is used is K means clustering which requires to pre determine the numbers of clusters and as the data has two data points such as 0 and 1, two clusters are determined. The results of WEKA are shown below: 7
=== Run information === Scheme:weka.clusterers.SimpleKMeans -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -S 10 Relation:audidealership2 Instances:100 Attributes:8 Dealership Showroom InternetSearch RS7 A4 TT Financing Purchase Test mode:evaluate on training data === Model and evaluation on training set === kMeans ====== Number of iterations: 6 Within cluster sum of squared errors: 160.2980769230769 Missing values globally replaced with mean/mode Cluster centroids: Cluster# AttributeFull Data01 8
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
(100)(48)(52) ================================================= Dealership0.540.33330.7308 Showroom0.640.66670.6154 InternetSearch0.390.43750.3462 RS70.530.29170.75 A40.550.81250.3077 TT0.50.58330.4231 Financing0.60.33330.8462 Purchase0.380.04170.6923 Time taken to build model (full training data) : 0.02 seconds === Model and evaluation on training set === Clustered Instances 048 ( 48%) 152 ( 52%) 9
From the above results, it is clear that number of instances of 0 cluster is 48% and number of 1 cluster is 52%; this implies that more than half person completes the steps of walking inside the showrrom, looking over three versions of cars, undersatnding financing options and buy a car. From the above results, few patterns are identified on of which is the model A4 is most looked over car of the showroom which must be considered as perefernce of consumers. Another identfied trend is that out of 100 people 60 understand their finnacing options but only 38 really buy the car. This trend provides the opportunity to effectively enahnce the financing options so that sales of this delearship can grow. 2.2 Explaining data mining methods that can be used in business with real world examples As analysed in above sections data mining is the procedure by which a user can mine the data to get more useful insights from the information. There are various methods by which this method is conducted in business organisation. An analysis of few of these methods along with their application with real life examples are explained as follows: 10
Tracking sequential patterns – This method is a technique which is used by focusing on identifying the series of events which happen in data set at regular intervals. These regular intervals are the sequences which help in mining the transactional data. This type of method is considered as ideal for large organisation or for organisation’s having big data. A real instance in which this method can be used is retail organisations. These organisations track sales of their every product to identify that in which month they are sold at maximum level and then they develop their strategies accordingly(Llave, 2018). Decision trees – This method is a machine learning technique which helps its users to mine data by identifying how data inputs affects data outputs. Decision tree acts as a model in which a tree like structure is developed with set of braches. This model is structured in a such a way that each input unit presents its affected output units. There are various issues with this model due to prediction and forecasting of variables using this technique is complex and there is high rate of inaccuracy. A real life practical example of this model is research department of a travel company in which they predict their demand by analysing the impact of their offers upon their sales. Clustering – This is an analytics technique in which entire data set is divided into groups which are considered as clusters. These clusters are represented using visual graphs in which distribution of data is represented. The main of this technique is to identify the trends in each cluster. A real life example of this technique is automobile showrooms in which activities of all their customers are tracked and then divided into clusters so that their preference and buying patterns can be considered(Shollo and Galliers, 2016). Correlation – This is one of the most used techniques of data mining in which relationship between two variables are analysed along with the nature of their relationship. A practical example of this technique is investigator who analyse the impact of various factors on the gross domestic product of a nation. In this example, all the factors are correlated with GDP and then most impacted factor with high correlation is identified(Vagh, 2012). All the techniques which are explained above are used in real life practices. Some of these techniques such as clustering and tracking patterns are even used in audidealership data. 11
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
2.3 Discussing the advantages and disadvantages of WEKA over Excel WEKA or Waikato Environment for Knowledge Analysis is analytical software used for data mining. This is free software which does not require any license and is a tool of practical machine learning. Pros and cons of WEKA over Microsoft Excel are discussed as follows: Pros: Compatibility – WEKA is software which runs on Java due to which it is compatible with all the operating systems which supports Java such as Mac, Windows, Linus and more. On the other hand, Microsoft Excel is only compatible with Windows operating system. This implies that the dataset of audidealership could be analysed using any platform but for superstore data, only Windows usage is required. Open source – Another positive point of WEKA is that it is open source software and uses General Public License. This means this application can be used free of cost for any non specified purpose. On the contrary, in order to use Microsoft Excel, it is important for user to acquire license of entire Microsoft package first(Zubi and Mahmmud, 2013). Easier to use – As the functions and tools which WEKA application offer are limited, it is considered easier to use. On the other hand, to use advance functions of Excel requires specific skills and abilities. This point is even evident with the practical usage of Excel and WEKA which has been done to analyse two datasets. The usage of WEKA for the purpose of clustering was easier from usage of Excel functions such as IF, LOOKUP and PIVOT table. Cons: Like every software, WEKA also has few cons which make Excel more suitable data mining software. Memory sensitive – As it has been discussed above, WEKA runs on Java. This feature of this application makes it memory sensitive due to which it can store information to an extend only. On the other hand, Microsoft Excel can store big data of large companies due to which it is considered ideal for large data analytics. Lack of documentation – WEKA is open source software and lacks in proper documentation due to which large organisations do not trust upon this software and prefer to use applications with a trustworthy documentation. On the other hand, every software application of Excel comes with a license and documents which gains the trust of business organisations which wants to keep their data secure(Phanse and Deorah, 2011). 12
From the above discussion, it has been seen that Microsoft Excel is a software application which is appropriate for big data due to which it was used for superstore data having 8399 queries. On the other hand, WEKA is a quicker and easier application to use which is appropriate for small data due to which this application was used for audidealership data having just 100 queries. CONCLUSION From the above analysis, it has been concluded that data handling and business intelligence are the platformsusing whichbusinessorganisationsuse theirdata for the purposesof predictions and decision making. It has also analysed that every trend in data warehousing, BI and data mining is the result of current issues which organisations are faced and these trends tend to resolve those issues. From the dataset of superstore it has been concluded that decline in sales was the result of decline in “unit price” and decline in profit was the result of “shipment mode”. The dataset of audidealership summarises that this organisation must focus upon promoting their A4 version of car. 13
REFERENCES Books and Journals Shollo,A.andGalliers,R.D.,2016.Towardsanunderstandingoftheroleofbusiness intelligencesystemsinorganisationalknowing.InformationSystemsJournal.26(4). pp.339-367. Bordeleau, F. E., Mosconi, E. and Santa-Eulalia, L. A., 2018, January. Business Intelligence in Industry 4.0: State of the art and research opportunities. InProceedings of the 51st Hawaii International Conference on System Sciences. Mitrovic, S., 2020. Adapting of international practices of using business-intelligence to the economic analysis in Russia. InDigital Transformation of the Economy: Challenges, Trends and New Opportunities(pp. 129-139). Springer, Cham. Fuchs, M., Höpken, W. and Lexhagen, M., 2014. Big data analytics for knowledge generation in tourismdestinations–AcasefromSweden.JournalofDestinationMarketing& Management.3(4). pp.198-209. Llave, M. R., 2018. Data lakes in business intelligence: reporting from the trenches.Procedia computer science.138. pp.516-524. Yuchen, W.U., 2016. The Data Mining Application Based on WEKA: Geographical Original of Music.Management Science and Engineering.10(4). pp.47-57. Vagh, Y., 2012. The application of a visual data mining framework to determine soil, climate and land use relationships.Procedia Engineering.32. pp.299-306. Zubi, Z.S. and Mahmmud, A.A., 2013. Using data mining techniques to analyze crime patterns in the libyan national crime data. InProceedings of the 1st WSEAS Interna-tional Conference on Image Processing and Pattern Recognition(pp. 79-85). Phanse, V. and Deorah, S., 2011, December. Evaluation and Extension to the Duckworth Lewis Method:ADualApplicationofDataMiningTechniques.In2011IEEE11th International Conference on Data Mining Workshops(pp. 763-770). IEEE. Ataman, K., Kulick, G. and Sim, T., 2011. Teaching decision tree classification using Microsoft Excel.INFORMS Transactions on Education.11(3). pp.123-131. 14