TEXT MINING

Text mining is the process of deriving high-quality information from text (Coussement Kristof, et al., 2008). Such information is obtained by discovering trends and patterns through means such as statistical pattern learning; in text mining, "high quality" refers to a combination of relevance and novelty. Text mining tasks include sentiment analysis and document summarization. Data mining, by contrast, is the process of discovering patterns in data sets (Han, et al., 2011); its overall goal is to extract relevant information from a dataset and transform it into an understandable structure. Data mining is the more general process and includes text mining: data mining extracts relationships from structured data such as tables, whereas text mining extracts insights from unstructured text.

Text mining processes range from a preparatory information-retrieval step to the final step of compiling the high-quality information. Information retrieval involves identifying and collecting a corpus in a database for analysis (Pang Bo & Lee Lilian, 2002). Text analytic systems then apply natural language processing, e.g. syntactic parsing and other forms of linguistic analysis. Named entity recognition uses statistical techniques, gazetteers and similar resources to identify named text features such as abbreviations and stock ticker symbols (Quinlan, 1996), while disambiguation relies on contextual clues. Pattern entity recognition covers features such as email addresses and telephone numbers, which are discerned through regular expressions or other kinds of pattern matching (see the sketch below). Text mining also involves co-reference resolution, the identification of terms that refer to the same object. This is followed by fact, relationship or event extraction, through which associations among the entities in a text are identified. Sentiment analysis encompasses the discernment of subjective material and the extraction of various levels of attitudinal information such as mood and sentiment (M. Kuhn & K. Johnson, 2013). Quantitative text analysis draws on techniques from the social sciences, where either a computer or a human analyst extracts semantic relationships from text data in order to find stylistic patterns or the meaning of an informal, possibly personal, text for the purpose of psychological profiling (Quinlan, 1996).
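As an illustration of the pattern-entity step mentioned above, the following is a minimal Python sketch. The sample text, the regular expressions and the entity labels are illustrative assumptions, not part of any system described in this report.

```python
# Minimal pattern-entity recognition with regular expressions (illustrative only).
import re

TEXT = "Contact jane.doe@example.com or +1-555-0142; ticker AAPL closed higher."

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\- ]{7,}\d"),
    "ticker": re.compile(r"\b[A-Z]{2,5}\b"),  # crude heuristic, will over-match
}

def extract_entities(text):
    """Return {entity_type: [matches]} found by simple pattern matching."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}

if __name__ == "__main__":
    print(extract_entities(TEXT))
```

In a real pipeline these patterns would be combined with gazetteers and disambiguation rules, since naive regular expressions produce many false positives.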
Text mining is applied for a variety of research, business or government needs (S. B. Kotsiantis, 2007). Application categories include e-discovery and enterprise business management.
ARTIFICIAL INTELLIGENCE

1. Why artificial intelligence is important for supporting businesses in building smart systems

Artificial intelligence is slowly changing the course of engineering in this century. The accuracy and precision of AI-based systems have enabled companies to make substantial returns on investment (Ford Martin & Colvin Geoff, 2018). Artificial systems are reliable and not subject to human bias, so the facts and figures these systems present can be trusted.

2. How artificial intelligence helps to transform companies

SAP HANA analyses transaction data for trends. When used with supporting applications, decision making is faster and operational costs fall, because anomalies in sales that could cause losses are identified and appropriate resolutions are suggested. AI systems make data-driven decisions that are better informed. Walmart uses this software across its more than 11,000 stores to monitor transactions in real time (Mena Jesus, 2011). A report sponsored by SAP claims that companies incorporating its cloud platform, HANA, expect a return on investment higher than 575%. Domo, another cloud-based platform, collects data from third-party services such as Facebook and Square and uses that data to deliver insights and provide context to business intelligence. By monitoring the performance of a product at points of sale, it spots rising trends in real time, generates reports on them and offers suggestions for improvement. Apptus learns customers' behaviour patterns and helps them purchase products by automatically suggesting items as they search; it also recommends actions to boost sales after analysing consumers' preferences for related products. Working on the same principle of predictive analytics is Avanade, a joint venture between Accenture and Microsoft that leverages the Cortana Intelligence Suite for data-based insights (Witten Ian H & Frank Eibe, 2011). General Electric's Predix, aimed at industrial applications, processes historical equipment performance data and uses it to predict operational outcomes, i.e. when equipment may fail; by estimating how long a piece of equipment can run before maintenance is needed, it issues maintenance alerts (Pang Bo & Lee Lilian, 2002). Siemens' MindSphere, like Predix, monitors fleets of machines for service needs using drive-train analytics; unlike Predix, it can work with machines from any manufacturer.
3. The threat of artificial intelligence and the limitations of smart systems

Research companies are currently working on AIs that think like human beings, meaning they could improve their own hardware or design hardware for other AIs. This could mark the beginning of the extinction of the human race. Furthermore, if data collected by systems such as Apptus is accessed by other parties, it could sabotage a company's operations and violate consumers' privacy.
BANK DATA ANALYSIS USING WEKA (WAIKATO ENVIRONMENT FOR KNOWLEDGE ANALYSIS) SOFTWARE

Data mining is an interdisciplinary subfield of computer science (Masys & R, 2001); it is a computational process for discovering patterns in datasets. The analysis applied to this dataset is the J48 decision-tree classifier, the Weka implementation of the C4.5 algorithm, successor of ID3. Weka gives a visual representation of the decision tree generated by J48. The basic decision-tree algorithm is:

1. Begin with the whole training dataset, which contains all the information available for the analysis.
2. Select the attribute or value along whose dimension the best split is obtained.
3. Create child nodes for the split.
4. Recurse on every child node using its subset of the data until a stopping criterion is reached: the amount of data at a node is too small, all instances at the node belong to the same class, or the tree has become too large.

A decision tree is a recursive partition of the instance space (M. Kuhn & K. Johnson, 2013). It consists of nodes forming a rooted tree, i.e. a directed tree with a node called the root that has no incoming edges; every other node has exactly one incoming edge. Nodes with outgoing edges are called internal nodes; all other nodes are leaves, usually referred to as terminal or decision nodes. Each internal node splits the instance space into two or more sub-spaces according to a discrete function of the input attribute values. The main challenge when building a decision tree is deciding which attribute to split on at a given stage so as to obtain the best split. Information gain is used: the difference between the entropy before and after the split, where entropy measures the uncertainty in a decision or a set of class labels. The attribute with the highest information gain is picked; in our case it is the number of children, as illustrated in the sketch below. The algorithm then builds a fairly accurate decision tree by evaluating the attributes that remain beneath the initial attribute (number of children). In the Weka output, a leaf such as YES (124.0/3.0) means that 124 instances reach that leaf, of which 3 are misclassified.
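To make the entropy and information-gain calculation concrete, here is a minimal Python sketch. The toy rows, the attribute names and the "pep" class label are assumptions for illustration only; they are not the actual bank dataset analysed in Weka.

```python
# Entropy and information gain for choosing the best split attribute.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted
    entropy of the subsets produced by splitting on `attribute`."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

# Toy data: the attribute with the highest gain would be chosen as the split.
rows = [
    {"children": 0, "married": "YES", "pep": "NO"},
    {"children": 0, "married": "NO",  "pep": "NO"},
    {"children": 1, "married": "YES", "pep": "YES"},
    {"children": 2, "married": "NO",  "pep": "YES"},
]
for attr in ("children", "married"):
    print(attr, round(information_gain(rows, attr, "pep"), 3))
```

On this toy data the "children" attribute has the higher gain, mirroring the report's observation that the number of children is selected as the root split.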
Ten-fold cross-validation with a minimum of 15 objects per leaf gives a fairly accurate and realistic tree (M. Kuhn & K. Johnson, 2013). Evaluating on the full training set gives a more accurate-looking analysis, as it analyses the data across all of the attributes; however, the tree in that case is too long and hard to follow.

Figure 1: Attributes and decision tree
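For readers without Weka, the same setup can be approximated as follows. This is a minimal sketch that uses scikit-learn's decision tree as a stand-in for J48 (min_samples_leaf playing a role analogous to J48's minimum number of objects); the file name bank.csv and the pep class column are assumptions about how the dataset might be exported, not the report's actual files.

```python
# Entropy-based decision tree with a minimum leaf size of 15,
# evaluated by ten-fold cross-validation (scikit-learn stand-in for J48).
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("bank.csv")                      # hypothetical CSV export of the bank data
X = pd.get_dummies(df.drop(columns=["pep"]))      # one-hot encode nominal attributes
y = df["pep"]                                     # assumed class column

clf = DecisionTreeClassifier(criterion="entropy",   # C4.5-style information gain
                             min_samples_leaf=15)   # analogous to J48's -M 15
scores = cross_val_score(clf, X, y, cv=10)          # ten-fold cross-validation
print(f"mean accuracy: {scores.mean():.3f}")
```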
Figure 2: Continuation of the decision tree

Figure 3: Confusion matrix, continuation of the decision tree

The confusion matrix gives the number of instances evaluated correctly, in this case 245, against 29 instances on which an error is made.
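As a quick sanity check, the accuracy implied by these confusion-matrix counts can be computed directly; the two figures below are the ones reported above.

```python
# Accuracy implied by the reported confusion matrix: 245 correct, 29 errors.
correct, errors = 245, 29
total = correct + errors
print(f"instances: {total}, accuracy: {correct / total:.1%}")  # ~89.4%
```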
Figure 4: Decision tree visualization

The decision tree generated above has the number of children as the initial attribute (Ramiro H Galvez & Agustin Gravano, 2017). If the number of children is less than or equal to 1, the tree splits further on two attributes chosen for their low entropy: married when the number of children is 0, and income when the number of children is exactly 1. The income branch stops at a comparison of the amount: income less than or equal to 15538.8 reaches a leaf of 24.0/4.0, and income greater than 15538.8 reaches a leaf of 111.0/5.0. Under the married attribute, a YES leads to the save_act attribute and a NO leads to the mortgage attribute. The save_act attribute splits on whether one has savings: a YES gives a leaf of 119.0/12.0, while a NO leads to the mortgage attribute, which splits on whether one has a mortgage, giving leaves of 36.0/5.0 for NO and 25.0/3.0 for YES. Under the mortgage attribute that results from the split of the married attribute, NO gives a leaf of 48.0/3.0; if one has a mortgage (YES), the lowest-entropy attribute chosen next is age, where age less than or equal to 40 gives a leaf of 19.0/8.0 and age greater than 40 gives a leaf of 16.0/1.0. Back at the initial attribute, for those with more than one child the next attribute is income. Income less than or equal to 30404.3 gives a leaf of 124.0/12.0. Income greater than 30404.3 is split further on whether the number of children is at most 2 or greater than 2: the latter gives a leaf of 51.0/5.0, while the former gives a leaf of 27.0/10.0. In each leaf, the first number is the count of instances reaching it and the second is the count of those misclassified.

CONCLUSION

Individuals with no children are more likely to have an income above 15538.8. Some married individuals have savings, while those without savings tend not to have a mortgage. Those who are not married are less likely to have a mortgage; when the split at age 40 is considered, it is those aged 40 or under who have mortgages. Individuals with children tend to have an income above 30404.3, and most of them have at most two children.
BUSINESS PERFORMANCE DASHBOARD

Figure 5: Financial dashboard

A dashboard is a visual representation of the important information required to achieve an objective, arranged and consolidated on a single screen so that it can be monitored at a glance (Dona M Wong, 2006). The dashboards above represent the performance of the business in terms of production, projected sales and actual sales. Production and sales statistics are recorded quarterly for the five products using bar graphs, estimated sales are compared with actual sales using sales curves, and the regional performance of the products is captured in a pie chart. The estimated and actual sales curves illustrate the good performance of the business, as most sales targets are met or exceeded; only product 2 falls short of its projected sales target. All the graphs show the estimated and actual production (first graph) and sales (second and third graphs) (Stephen Few, 2006).

The dashboard above is intended for analytical purposes (Stephen Few, 2006). It offers rich context in terms of subtle performance indicators, detailed comparisons and extensive history. It is simple in design, easy to understand, and can also be used by technically savvy users such as researchers and data analysts. However, a preoccupation with functionally superficial and visually distracting features undermines its usefulness (Tufte, 2003). Colour is grossly misused on the production dashboard: it is too bright. There is also excessive precision in the data presented; showing exact values slows the viewer down with little benefit, for example using figures like $1,670,824 instead of $1.6M (Dona M Wong, 2006). For comparison purposes it would be better to show the percentage deviation between the projected and actual sales (Tufte, 2003); this is not provided in the dashboard above, and a minimal calculation is sketched below.
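The following is a minimal sketch of the two suggested improvements: rounding displayed figures and reporting actual sales as a percentage deviation from the projection. The sample numbers are illustrative and not taken from the dashboard.

```python
# Rounded display values and percentage deviation of actual vs projected sales.
def short_dollars(value):
    """Format 1_620_000 as '$1.6M' rather than showing full precision."""
    if value >= 1_000_000:
        return f"${value / 1_000_000:.1f}M"
    return f"${value / 1_000:.0f}K"

def percent_deviation(actual, projected):
    """Signed deviation of actual sales from the projected figure, in percent."""
    return (actual - projected) / projected * 100

projected, actual = 1_500_000, 1_620_000          # illustrative quarterly figures
print(short_dollars(actual))                       # $1.6M
print(f"{percent_deviation(actual, projected):+.1f}%")  # +8.0%
```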
References

Eckerson, W. W., 2010. Performance Dashboards: Measuring, Monitoring and Managing Your Business. s.l.: Wiley.
Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P., 1996. From Data Mining to Knowledge Discovery in Databases. AI Magazine, 17(3), pp. 37-54.
Few, S., 2006. Information Dashboard Design: The Effective Visual Communication of Data. s.l.: O'Reilly.
Ford, M. & Colvin, G., 2018. Will robots create more jobs than they destroy? s.l.: s.n.
Galvez, R. H. & Gravano, A., 2017. Assessing the usefulness of online message board mining in automatic stock prediction systems. Journal of Computational Science (ISSN 1877-7503).
Han, J., Kamber, M. & Pei, J., 2011. Data Mining: Concepts and Techniques. s.l.: Morgan Kaufmann.
Kotsiantis, S. B., 2007. Supervised machine learning: A review of classification techniques. Informatica, pp. 249-268.
Kuhn, M. & Johnson, K., 2013. Applied Predictive Modeling. s.l.: Springer.
Masys, D. R., 2001. Linking microarray data to the literature. Nature Genetics, 28(1), pp. 9-10.
Mena, J., 2011. Machine Learning Forensics for Law Enforcement, Security and Intelligence. Boca Raton: CRC Press (Taylor & Francis Group).
Pang, B. & Lee, L., 2002. Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, pp. 79-86.
Quinlan, J. R., 1993. C4.5: Programs for Machine Learning. s.l.: Morgan Kaufmann.
Quinlan, J. R., 1996. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4, pp. 77-90.
Tufte, E., 2003. Envisioning Information, The Visual Display of Quantitative Information, and Beautiful Evidence. 2nd ed. New York: s.n.
Witten, I. H. & Frank, E., 2011. Data Mining: Practical Machine Learning Tools and Techniques. s.l.: Elsevier.
Wong, D. M., 2006. The Wall Street Journal Guide to Information Graphics. 1st ed. Arkansas: s.n.