Contents
INTRODUCTION
PART 1
    Identify the sales and profit over years, evaluate the use of Excel for pre-processing the data or information
    Demonstrate that how can practical ways to perform operation by using Excel function such as Pivot table, if, Lookup, graph and chart
PART 2
    Audileadership data in the conjunction by using Weka software and perform clustering
    Describe about the data mining methods that can be used within a business
    Advantage and disadvantage of Weka over Excel
CONCLUSION
REFERENCES
INTRODUCTION
Data mining is the process of identifying patterns and interrelationships in large volumes of data. It is mainly used by large organizations that collect large amounts of information in their systems every day, and it supports filtering that data by category. A marketing assistant who takes part in business expansion will use data mining software to gather relevant information, in order to cut costs while improving customer relationships. It can also help to minimise various types of risk and threat in the organization. Data mining is an important means of exploring and analysing large amounts of data, and it provides facilities to discover meaningful patterns, facts and figures. This report describes the sales information and calculates profit and sales over the years. The primary aim is to predict future outcomes through data mining. In addition, data mining is a suitable technique for building machine learning models in the field of artificial intelligence.
PART 1
Identify the sales and profit over years, evaluate the use of Excel for pre-processing the data or information.
Row Labels          Average of Profit    Sum of Sales
Furniture                 68.11660673     5178590.542
    2009                  140.1369955     1469508.194
    2010                  20.65391403     1250043.046
    2011                   115.326226     1258336.514
    2012                 -5.173357143     1200702.788
Office Supplies           112.3690738       3752762.1
    2009                  153.4285381      1031244.56
    2010                  97.14263473       885095.79
    2011                  80.42802855       816902.13
    2012                  117.6447423      1019519.62
Technology                429.2075157     5984248.182
    2009                  337.0125974     1668572.052
    2010                  474.5130402     1416503.546
    2011                  518.2162105     1380213.417
    2012                  398.3725568     1518959.168
Grand Total               181.1844243     14915600.82
Table 1
Figure 1
Figure 2
Figure 3
Calculate the sum of profit and sum of sales through Excel
Row Labels                   Sum of Sales    Sum of Profit
Furniture                     5178590.542        117433.03
    Atlantic                   708726.782         15345.65
    North Carolina              43545.614          3478.88
    Northwest Territories       31451.192          5057.79
    Ontario                   1361004.216         22280.36
    Prarie                     919191.126         30551.22
    Quebec                     605784.144          -760.77
    West                      1172989.394         30924.64
    Yukon                      335898.074         10555.26
Office Supplies                 3752762.1        518021.43
    Atlantic                    478464.42         66970.47
    North Carolina               38615.47         -3124.02
    Northwest Territories        21955.03          1317.97
    Ontario                    1122325.15        188888.85
    Prarie                      720090.43          83259.7
    Quebec                      351822.68         42982.17
    West                        797510.76        116666.85
    Yukon                       221978.16         21059.44
Technology                    5984248.182        886313.52
    Atlantic                  827057.0015        156644.54
    North Carolina             34215.3995          2486.25
    Northwest Territories       30411.524          1931.29
    Ontario                   1296912.697        228045.36
    Prarie                    1198023.046         207349.2
    Quebec                     552588.256         98205.25
    West                      1627049.122        149417.12
    Yukon                      417991.137         42234.51
Grand Total                   14915600.82       1521767.98
Table 2
Figure 4
Demonstrate that how can practical ways to perform operation by using Excel function such as Pivot table, if, Lookup, graph and chart.
Pivot table: a pivot table is a statistical summary of a larger, more extensive table. It may include averages, sums and other statistics. Pivot tables are a common data-processing technique: they condense large amounts of statistical data to draw attention to the useful information (Aufaure et al., 2016). A pivot table summarises data by reorganising, counting, grouping and averaging the data stored in a table or database, and it allows the user to turn columns into rows.
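The grouping and averaging that a pivot table performs can be sketched outside Excel as well. The following is a minimal illustration in plain Python, not the workbook used for this report: the rows are hypothetical sample records, and the function name pivot is an assumption chosen for the example. It groups rows by category and year and reports the average of profit and the sum of sales, the same two measures shown in Table 1.

```python
from collections import defaultdict

# Hypothetical sample rows (category, year, sales, profit); not the report's data.
rows = [
    ("Furniture", 2009, 120.0, 10.0),
    ("Furniture", 2009, 80.0, 30.0),
    ("Furniture", 2010, 200.0, -5.0),
    ("Technology", 2009, 500.0, 90.0),
    ("Technology", 2010, 300.0, 60.0),
]


def pivot(rows):
    """Group rows by (category, year); return {key: (avg_profit, sum_sales)}."""
    groups = defaultdict(list)
    for category, year, sales, profit in rows:
        groups[(category, year)].append((sales, profit))

    summary = {}
    for key, items in groups.items():
        total_sales = sum(sales for sales, _ in items)
        avg_profit = sum(profit for _, profit in items) / len(items)
        summary[key] = (avg_profit, total_sales)
    return summary


summary = pivot(rows)
for (category, year), (avg_profit, total_sales) in sorted(summary.items()):
    print(f"{category:<12}{year}  avg profit={avg_profit:8.2f}  sum sales={total_sales:10.2f}")
```

Keying the groups on the (category, year) pair mirrors placing two fields in the Row Labels area of an Excel pivot table.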
Lookup: this function is listed under Excel's Lookup and Reference functions. It performs an approximate-match lookup in a one-column or one-row range and returns the corresponding value from another one-column or one-row range.
Calculate the sum of shipping cost, sum of product base margin and sum of sales for Furniture.
Row Labels              Sum of Shipping Cost    Sum of Product Base Margin    Sum of Sales
Furniture                           53243.69                       1006.77     5178590.542
    Bookcases                        8646.07                        122.09       822652.04
    Chairs & Chairmats              15512.69                        228.46      1761836.55
    Office Furnishings               8402.72                        414.31       698093.81
    Tables                          20682.21                        241.91     1896008.142
Figure 5
Estimate the actual sum of shipping cost, sum of product base margin and sum of sales for Office Supplies.
Row Labels                            Sum of Shipping Cost    Sum of Product Base Margin
Office Supplies                                   36095.51                       2116.77
    Appliances                                     6854.11                        240.64
    Binders and Binder Accessories                 6633.52                        342.42
    Envelopes                                      1682.77                         91.97
    Labels                                          288.66                        108.61
    Paper                                          7914.41                        458.85
    Pens & Art Supplies                            2041.81                        337.76
    Rubber Bands                                    225.21                         95.14
    Scissors, Rulers and Trimmers                   670.51                         92.23
    Storage & Organization                         9784.51                        349.15
Figure 6
Calculate the sum of shipping cost, sum of product base margin and sum of sales for Technology.
Row Labels                          Sum of Shipping Cost    Sum of Product Base Margin    Sum of Sales
Technology                                      18491.84                       1148.77     5984248.182
    Computer Peripherals                         4067.34                        449.67       795875.94
    Copiers and Fax                              2446.88                         36.37       1130361.3
    Office Machines                              7135.91                        149.75      2168697.14
    Telephones and Communication                 4841.71                        512.98     1889313.802
Figure 7
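The approximate-match behaviour of the Lookup function described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Excel's own implementation: the function name approx_lookup, the sales thresholds and the band labels are hypothetical, while the four sales figures are the Furniture sub-category values reported for Figure 5. As with a lookup over a sorted range, the function returns the result paired with the largest lookup value that does not exceed the key.

```python
from bisect import bisect_right


def approx_lookup(key, lookup_vector, result_vector):
    """Return the result paired with the largest lookup value <= key.

    lookup_vector must be sorted in ascending order. None is returned when
    the key is smaller than every lookup value.
    """
    position = bisect_right(lookup_vector, key)
    if position == 0:
        return None
    return result_vector[position - 1]


# Hypothetical sales bands, chosen only for illustration.
thresholds = [0, 1_000_000, 2_000_000]
bands = ["low", "medium", "high"]

# Sum of Sales per Furniture sub-category, as reported for Figure 5.
furniture_sales = {
    "Bookcases": 822652.04,
    "Chairs & Chairmats": 1761836.55,
    "Office Furnishings": 698093.81,
    "Tables": 1896008.142,
}

banded = {
    name: approx_lookup(sales, thresholds, bands)
    for name, sales in furniture_sales.items()
}
for name, band in banded.items():
    print(f"{name:<20}{band}")
```

Because the match is approximate, the lookup range has to be sorted in ascending order, otherwise the value returned is not reliable.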
Calculate sum of profit by region
Row Labels                 Sum of Profit
Atlantic                       238960.66
North Carolina                   2841.11
Northwest Territories            8307.05
Ontario                        439214.57
Prarie                         321160.12
Quebec                         140426.65
West                           297008.61
Yukon                           73849.21
Grand Total                   1521767.98
Figure 8
Date-wise count of customer segments
Row Labels      Count of Customer Segment
13/01/2009                              4
13/01/2010                              8
13/01/2011                              8
13/01/2012                             12
13/02/2009                              6
13/02/2010                              6
13/02/2011                              5
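A count-style pivot such as the date-wise count above can be reproduced with a simple tally. The sketch below is illustrative only: the order records and segment names are hypothetical and are not taken from the report's workbook.

```python
from collections import Counter
from datetime import date

# Hypothetical order records (order date, customer segment).
orders = [
    (date(2009, 1, 13), "Corporate"),
    (date(2009, 1, 13), "Consumer"),
    (date(2010, 1, 13), "Home Office"),
    (date(2010, 1, 13), "Corporate"),
    (date(2010, 1, 13), "Small Business"),
    (date(2009, 2, 13), "Consumer"),
]

# Count how many customer-segment entries fall on each order date.
counts = Counter(order_date for order_date, _segment in orders)

for order_date, count in sorted(counts.items()):
    print(order_date.strftime("%d/%m/%Y"), count)
```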
Describe about the data mining methods that can be used within a business
Data mining applies refined data analysis tools to discover previously unknown, valid patterns and relationships in huge data sets. These tools can incorporate statistical models, machine learning techniques and mathematical algorithms such as neural networks and decision trees (Bordeleau, Mosconi and Santa-Eulalia, 2018). The following data mining techniques can help a business with its growth and development.
Classification analysis: this method is used to distinguish between different items and assign them to categories, which helps to predict the behaviour of items within a specific group. It is carried out in two steps. First, in the learning step, a training set is analysed to build the classification rules. Second, in the classification step, those rules are evaluated and applied to new data (Jalil and Hwang, 2019). For example, the banking sector uses this method to classify loan applicants as low, medium or high credit risk.
Clustering analysis: clustering resembles classification but differs in that the groups are not defined in advance. Clusters are formed on the basis of the dependence or similarity of data items, so that items in different clusters are unrelated and dissimilar. It is often called data segmentation because it partitions huge data sets into clusters. Organizations choose the clustering method according to their requirements. For example, a bank may cluster its high-credit-risk customers by salary and age in order to handle and control the data properly.
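The bank example above can be made concrete with a small clustering sketch. The report does not say which clustering algorithm would be used, so k-means is assumed here as one common choice. The customer records are hypothetical (age in years, salary in thousands so that both features are on a comparable scale), and the initial centroids are fixed by hand to keep the run deterministic.

```python
import math


def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its members, until nothing changes."""
    centroids = list(initial_centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(values) / len(members) for values in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignments, centroids


# Hypothetical customers: (age in years, salary in thousands).
customers = [(25, 20), (30, 25), (28, 22), (50, 80), (55, 90), (52, 85)]

assignments, centroids = kmeans(customers, [customers[0], customers[3]])
for customer, cluster in zip(customers, assignments):
    print(customer, "-> cluster", cluster)
```

With real data the features would normally be rescaled first, since a salary expressed in full currency units would otherwise dominate the distance calculation.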
Prediction: this method is used by organizations to predict the future on the basis of present and past trends. It is essential for a business and is usually combined with other mining methods such as relation, pattern matching, trend analysis and classification (Mitrovic, 2020). Prediction is commonly used by supermarkets to estimate future growth and development. For example, a supermarket can predict the revenue that each item will generate on the basis of previous sales reports.
Sequential pattern and tracking: this is another common data mining method, which organizations use to identify patterns in events that occur over certain time intervals. Many retail enterprises use it to increase demand for their products and services in the global marketplace, which is possible when data about potential customers can be tracked reliably. For example, retail firms use this method to find the time interval in which sales of a product reach their maximum. Pattern tracking can also recognise the opinions of potential customers about a particular product.
Decision tree: this technique is applied in organizations to classify items so as to improve decisions related to business growth and development (Ogudo and Nestor, 2018). Government enterprises, for instance, use decision trees to resolve eligibility questions: a tree can identify individuals who are under 18 so that licences are issued correctly, and more generally it gives a way to classify citizens into different age groups.
Outlier analysis: this method is used to identify data items that do not comply with the expected patterns and behaviour. Such unexpected data is known as noise. Companies use the technique for purposes such as fraud detection in the banking sector and intrusion detection. It is a common approach for identifying unexpected data items.
Neural network: this method is based on a neural network model, which establishes the relationship between inputs and outputs. Many companies use neural networks to recognise patterns between inputs and outputs, and they are considered an important method for classifying data.
Advantage and disadvantage of Weka over Excel.
Weka is an open source data mining software.
It supports not only machine learning algorithms but also data preparation and meta-learners such as bagging and boosting. The complete suite is written in Java, so it can run on any platform (Park, El Sawy and Fiss, 2017). The package has three different interfaces: a command line interface, an Explorer GUI which allows you to try out different preparation, transformation and modeling algorithms on a dataset, and an Experimenter GUI which allows you to run different algorithms in batch and to compare the results.
The functionalities of Weka more or less boil down to the algorithms described in Witten and Frank's data mining book. An overview of the Weka functionalities:
SVMs: only polynomial kernels are supported, and support vector regression is not supported.
Decision trees: ID3 and C4.5 are implemented, as is M5', a model tree induction algorithm for predicting numeric values (each leaf node has a regression model). PART is a rule learner that makes rules by building different decision trees and each time keeping the leaf with the largest coverage.
Memory based methods: kNN and locally weighted regression.
Neural networks: only backpropagation with momentum is supported.
Simpler methods: naive Bayes (for numeric values a normal distribution is used, but kernel density estimation can be used instead to avoid assuming a normal distribution) and linear regression are useful simple methods.
Advantages: Weka data mining can help an enterprise reach its full potential. It is a way to evaluate how the business is affected by particular factors, and it may help business owners improve their earnings and avoid making business mistakes later on. Fundamentally, through this process a company analyses specific information from different perspectives in order to obtain a fully rounded view of how the business is performing (Villamarín and Diaz Pinzon, 2017). Business owners can get a broad view of matters such as customer trends, where they may be losing money and where they are making money. The knowledge may also reveal ways to lower unnecessary costs and raise overall income. The obvious advantage of a package like Weka is that a whole range of data preparation, feature selection and data mining algorithms are integrated. This means that only one data format is needed, and trying out and comparing different approaches becomes really easy.
The package also comes with a GUI, which should make it easier to use. Another advantage of Weka is that it is constantly under development, and not only by its original designers. People have been using Weka data mining for several years in different forms (Wani and Jabin, 2018). Data mining software has come into use only since the technology became available, but organizations have had numerous techniques in the past to evaluate their data and use it to their advantage. However, it cannot be denied that access to better technology has significantly improved the ability to store and gather information, make predictions about outcomes and use customer trend reports to greater advantage.
Disadvantages: probably the most important disadvantage of a data mining suite like this is that it does not implement the newest techniques. For example, the MLP implemented has a very basic training algorithm (backprop with momentum), and the SVM only uses polynomial kernels and does not support numeric estimation. Therefore, it will be necessary to combine Weka with other tools such as Netlab or SVM_torch. Another important disadvantage arises from the fact that the software is free: the documentation for the GUI is quite limited (Wani and Jabin, 2018). The software is constantly growing, but the documentation is not up to date with everything (the most up-to-date and complete information about algorithm options can be obtained using the -h option in the command line interface). Another possible problem is scaling. For more complex tasks on large datasets the running time can become quite long, and Java sometimes gives an OutOfMemory error. This problem can be reduced by raising the maximum heap size when calling Java (the '-Xmx' option followed by the memory size, for example '-Xmx2g'). For a large database it will always be necessary to reduce the size to be able to work within reasonable time limits. A further disadvantage is that the GUI does not implement all the possible options. Things that could be very useful, like scoring of a test set, are not provided in the GUI but can be called from the command line interface (Wani and Jabin, 2018), so it will sometimes be necessary to switch between the GUI and the command line. Lastly, the data preparation and visualization techniques offered might not be enough.
Most of them are very useful, but most data mining tasks need more than this to get to know the data well and to get it into the right format. Another disadvantage of Weka can be that its performance is often sacrificed in favor of portability, design, transparency, etc.
CONCLUSION
From the above discussion, it is concluded that data mining is the process of determining patterns and interrelationships in large volumes of data. The mining process helps large organizations that collect large amounts of information in their systems every day, and it supports filtering that data by category. Data mining is also an important means of exploring and analysing large amounts of data, and it provides facilities to discover meaningful patterns, facts and figures. This report has summarised the sales information and calculated profit and sales over the years. Future outcomes can therefore be predicted through data mining. In addition, data mining is a suitable technique for building machine learning models in the field of artificial intelligence.
REFERENCES
Books and Journals
Aufaure, M.A. et al., 2016. From Business Intelligence to semantic data stream management. Future Generation Computer Systems. 63. pp.100-107.
Bordeleau, F.E., Mosconi, E. and Santa-Eulalia, L.A., 2018, January. Business Intelligence in Industry 4.0: State of the art and research opportunities. In Proceedings of the 51st Hawaii International Conference on System Sciences.
Jalil, N.A. and Hwang, H.J., 2019. Technological-centric business intelligence: Critical success factors. Int. J. Innov. Creat. Chang.
Mitrovic, S., 2020. Adapting of international practices of using business-intelligence to the economic analysis in Russia. In Digital Transformation of the Economy: Challenges, Trends and New Opportunities (pp. 129-139). Springer, Cham.
Ogudo, K.A. and Nestor, D.M.J., 2018, August. Modeling of an efficient low cost, tree based data service quality management for mobile operators using in-memory big data processing and business intelligence use cases. In 2018 International Conference on Advances in Big Data, Computing and Data Communication Systems (icABCD) (pp. 1-8). IEEE.
Park, Y., El Sawy, O.A. and Fiss, P., 2017. The role of business intelligence and communication technologies in organizational agility: a configurational approach. Journal of the association for information systems. 18(9). p.1.
Villamarín, J.M. and Diaz Pinzon, B., 2017. Key success factors to business intelligence solution implementation. Journal of Intelligence Studies in Business. 7(1). pp.48-69.
Wani, M.A. and Jabin, S., 2018. Big data: issues, challenges, and techniques in business intelligence. In Big data analytics (pp. 613-628). Springer, Singapore.