Big Data Analysis Report for Tree of Life - Data Analytics


Added on 2022/08/26

AI Summary

This report provides a comprehensive analysis of how the Tree of Life company, a UK-based healthier food producer and supplier, can leverage big data analytics to enhance its competitiveness in a rapidly evolving market. The study emphasizes the importance of understanding consumer insights through text mining of social media data using tools like SAS Text Miner. The report details the feasibility of text mining, outlining the process from data acquisition to implementation, including preprocessing, word segmentation, feature selection, and algorithm-based mining. It also highlights the challenges and key decisions involved in implementing a big data strategy, drawing parallels with successful implementations by companies like MaspexWadowice Group. The analysis covers metrics, variables, and key questions related to text data analysis, providing a detailed overview of how unstructured data can be transformed into actionable insights. Furthermore, it explores various analysis methods, including natural language processing, named entity recognition, and sentiment analysis, to derive meaningful business intelligence and improve customer relationship management, predict market trends, and refine product strategies. The report concludes by emphasizing the potential of big data analytics to revolutionize Tree of Life's operations and improve its market position.

Big Data Analysis Report for Tree of Life
Student’s Name
Professor’s Name
Course Name
Date

Table of Contents
Analysis
Analysis Section
Metrics, variables and key questions
Analysis methods review
Use Cases Discussion
Implementation Section
    Steps to follow
        Step 1: Get text
        Step 2: Preprocessing text
        Step 3: Word Segmentation
        Step 4: Feature selection
        Step 5: Mining with algorithms
Addressing key challenges
Key decisions to be made
Summary
References

Executive Summary
Tree of Life is a company that produces and distributes food. It operates in a highly
competitive environment, and to remain competitive it needs to understand its customers
clearly. To do so, Tree of Life should collect, store and analyze its data, and social media is
the best source of that data. This paper recommends text mining as the data analytics approach
for Tree of Life and justifies its feasibility.

Overview of Assignment 1
Tree of Life is a UK firm that produces and supplies healthier food products. It operates
in a very competitive market, and many companies are entering the healthier food industry. It is
up to Tree of Life to find more competitive ways of working. One way of doing this is to apply
the principles of big data analytics. Tree of Life ought to anticipate and plan based on
predictions. The ability to anticipate depends on interpreting and correlating many
interconnected data sets and establishing patterns that allow new knowledge to be
"discovered". In this way it becomes possible to predict emerging food risks and even to detect
possible food crises early. If such crises do occur, the company can coordinate its response
appropriately, thanks to the collection and analysis of this volume of data from many
different sources.
Currently, Tree of Life has a narrow everyday-low-price strategy and uses a pricing model
it adopted several years ago. Its promotional system is very labor intensive, and it lacks a
system that can execute more sophisticated pricing strategies. In the past, many rules were
kept on paper and others only in people's heads. The company also has no reliable forecasting
tools. By adopting big data analytics, Tree of Life is likely to revolutionize its operations.
In addition, Tree of Life has a problem retaining its customers. The company does not
understand consumer insights very well. Fortunately, there are many opportunities that Tree of
Life can explore. This paper aims to identify the key elements of big data analytics that Tree
of Life should exploit.

Analysis Section
The first area that Tree of Life should exploit is text mining. Text mining suits Tree of
Life because the company uses social media as part of its marketing strategy. Intuitively,
"text" is the written expression of a language: words, sentences, paragraphs, or chapters. It
is stored differently from ordinary numeric data. According to how it is stored, data can
generally be divided into structured data, unstructured data and semi-structured data
(Mahgoub, et al 2008). Structured data is information that can be organized and analyzed in a
two-dimensional table. It can be compared to a well-organized folder in which every file is
clearly labeled and easily identified. Unstructured data is a mix of multiple types of
information whose internal structure is usually not directly known; its value can be realized
only after identification, methodical storage and analysis. Pictures, sounds, videos, etc. are
unstructured data (Berry, Kogan, & SIAM International Conference on Data Mining 2010). Under
ideal conditions, all unstructured data can be structured. In practice, some unstructured
data, such as plain text without structured fields (for example, article abstracts), is
difficult to segment and categorize. Data that falls between structured and unstructured data
is called semi-structured data (Zhong, Li and Wu, 2012). Most text contains both structural
fields, such as title, author, and category, and unstructured text content. Such texts are
semi-structured text data.
Although text has no numeric values, big data analytics offers ways to analyze text and use it
to predict consumer insights. One suitable application is SAS Text Miner, which is part of the
SAS Business Analytics capabilities and is used to tap social media data sources. By applying
SAS Text Miner, Tree of Life will be able to analyze the emotional attitudes and evaluations of
individual users and the product aspects they discuss, which involves analyzing each user's
emotional tendency. It will also be able to analyze consumer feedback and sentiments.
SAS Text Miner will enable Tree of Life to acquire thousands of text samples. In addition,
with the rapid development of network information technology, the speed of data acquisition,
transmission, and storage has increased significantly. People increasingly rely on the network
to communicate, for example by publishing opinions, discussing issues, and exchanging emotions
online, so the volume of text data is expanding rapidly. Because text data has a large base and
a high growth rate, special methods are needed to analyze and process it efficiently (Narayana
and Kumar 2015).
Metrics, variables and key questions
It is often stated that about "80% of business information comes from unstructured
data, mainly text data". This statement may exaggerate the proportion of text data in business
data, but the information value of text data is beyond doubt. The sheer amount of text data
makes manual processing inefficient, so computers must be used to complete the related tasks.
However, text data contains complex semantic relationships and emotional tendencies, and
computers cannot identify and process it directly; it must first be converted. In a narrow
sense, text data analysis is the process of converting text data into structured data that a
computer can identify and process (Cohen and Hersh, 2005). Its primary goal is to use natural
language processing and analysis methods to convert "text" into "data", which specifically
involves word frequency distribution research, pattern recognition, association analysis,
information extraction, visualization, predictive analysis, etc. Through text data analysis,
the main meaning of the text and the intention of the text provider can be inferred
(Al-Hashemi, 2010). In a broad sense, text data analysis covers not only the structuring
pre-processing described above but also the extraction of deep, high-quality information
contained in text data, which is usually called text data mining or simply text mining.
Traditional text data mining includes text classification, text clustering, named entity
recognition, sentiment analysis, and the establishment of entity relationship models. This
report takes the broad view of text data analysis: in addition to structured processing of
text data, it also covers text mining content such as text clustering and named entity
recognition (Gupta and Lehal, 2009).
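As a rough illustration of turning "text" into "data", the word frequency distribution
mentioned above can be sketched in a few lines of Python. The sample posts and the function
name word_frequencies are hypothetical, and the tokenizer is a deliberately simple regular
expression rather than a full natural language processing pipeline.

```python
import re
from collections import Counter

# Hypothetical sample posts; real input would come from social media.
posts = [
    "Love the new oat bars, great taste!",
    "The oat bars are too sweet for me.",
    "Great price, great taste.",
]

def word_frequencies(texts):
    """Count lowercase word tokens across a list of texts."""
    counts = Counter()
    for text in texts:
        # Keep runs of letters and apostrophes as tokens.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

freq = word_frequencies(posts)
print(freq.most_common(3))
```

The resulting counts form a simple structured table (word, frequency) that later steps such as
feature selection can work with.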
Analysis methods review
Text analysis entails a series of steps:
(1) Obtaining text data: This is the preparation stage of text data analysis. The data can be
downloaded from public data sources, taken from the company's own data set, or collected from
the web according to the analysis requirements (Ayesha, et al 2010).
(2) Natural language processing: Although some text data analysis relies on higher-level
statistical methods, much of it involves natural language processing such as word
segmentation, part-of-speech tagging, and syntax analysis.
(3) Named entity recognition: The use of dictionaries or statistical methods to identify named
text features such as person names, place names, organizations, and specific abbreviations.
(4) Pattern recognition: The text may contain entities with regular representations, such as
phone numbers and email addresses (Samsudin, et al 2013). Identifying these entities through
their special representations or other patterns is pattern recognition.
(5) Relation identification: Identifying different words that refer to the same object.
(6) Text clustering: The use of unsupervised machine learning to group texts, which is
applicable to the analysis of massive text data. It is widely used for finding text topics and
filtering out abnormal text material.
(7) Text classification: Within a given classification system, a supervised machine learning
model is built on text features to identify the type or content of a text.
(8) Text association: The direct application of traditional association rule mining methods to
text features, including document type association, lexical association, and entity
association.
(9) Sentiment analysis: Identifying the implicit subjective content of the text and mining
different forms of viewpoint information, such as emotions, moods, and opinions. Current text
data analysis technology can refine sentiment analysis to the entity, concept, and topic
levels.
(10) Quantitative text analysis: Mining the semantic and grammatical relationships between
words, either manually or through machine learning, and then identifying the meaning and style
of a passage of text (Gupta and Lehal, 2009).
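Pattern recognition, item (4) above, is the easiest of these steps to show concretely. The
following is a minimal sketch using Python regular expressions. The two patterns and the
function name find_patterns are assumptions made for illustration; they are intentionally
loose, and production-grade matching of email addresses and phone numbers needs stricter rules.

```python
import re

# Deliberately simple patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def find_patterns(text):
    """Return the email addresses and phone-like numbers found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }

# Hypothetical customer message with a made-up address and number.
sample = "Contact support@example.com or call +44 20 7946 0000 for a refund."
found = find_patterns(sample)
print(found)
```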
Use Cases Discussion
A good example of a company that has successfully implemented text mining is
MaspexWadowice Group. This company used SAS Text Miner to analyze its brand online (SAS
2018). Using this platform, the company gained a competitive advantage through better
consumer insights, resulting in more effective and efficient marketing efforts. Like Tree of
Life, MaspexWadowice Group deals in healthier food. The two companies share similar goals, so
it is a good case for Tree of Life to emulate.
Implementation Section
Text data analysis covers many research directions, and these directions can be applied
to different fields, in each of which the technology plays an important role (Zhao, 2013). In
business practice, companies can improve their competitiveness by analyzing textual data about
customers and competitors. For customer analysis, companies can obtain relevant text data from
customer relationship data, social media, e-commerce platforms, etc., and use natural language
processing and related analysis to reveal the business information behind the text. They can
then conduct product analysis, manage customer relationships, predict customer churn, analyze
enterprise risks and opportunities, summarize the advantages and disadvantages of products,
grasp customer sentiments and needs, and understand the direction of public opinion, all of
which provides strong support for business decisions and industry trend research (Rajendra and
Saransh, 2013). For example, knowing whether a customer who has not yet purchased a product
feels positively or negatively about it indicates, to some extent, how difficult it will be to
persuade that customer to buy; analyzing reviews of a movie trailer can predict the popularity
of the movie so that the promotion strategy can be adjusted quickly; for a newly released
product, text analysis of the first complaints can quickly identify the product's problems so
that the same problems are avoided in future products; and analyzing competitors' product
information and reviews gives a timely understanding of market demand and trends and of the
competition (Calvillo, et al 2013).
Steps to follow
Step 1: Get text
In general, web text is acquired mainly in the form of web pages, and the aim is to load
text from the network into a text database (data set). Crawler technology is used to capture
the information on the network. Crawlers can be divided into topic crawlers and general
crawlers. A topic crawler mainly crawls text on related topics at relevant sites, whereas a
general crawler is usually not restricted in this way (Joby and Korra, 2015).
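The difference between the two crawler types can be sketched without touching the network. In
the example below, the PAGES dictionary is a hypothetical in-memory stand-in for the web, and
the crawl function and its keywords parameter are illustrative names, not part of any library.
A real crawler would fetch pages over HTTP, respect robots.txt, and would usually also use the
topic to decide which links to follow; this sketch follows every link and filters only what it
keeps.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "site/home": ("Welcome to our food forum", ["site/snacks", "site/cars"]),
    "site/snacks": ("Healthy snack reviews and organic food tips", ["site/organic"]),
    "site/cars": ("Used cars for sale", ["site/home"]),
    "site/organic": ("Organic granola taste test", []),
}

def crawl(start, keywords=None):
    """Breadth-first crawl from start.

    With keywords=None this acts as a general crawler and keeps every page.
    With keywords it acts as a topic crawler and keeps only pages whose
    text mentions at least one keyword.
    """
    seen, queue, kept = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        if keywords is None or any(k in text.lower() for k in keywords):
            kept.append(url)
        for link in links:
            # Skip unknown pages and pages already scheduled.
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return kept

general = crawl("site/home")
topical = crawl("site/home", keywords=["food", "organic"])
print(general)
print(topical)
```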
Step 2: Preprocessing text
Because web pages contain a lot of unnecessary information, such as advertisements,
navigation bars, HTML and JavaScript code, and comments, the next step is to filter the
information in the text, that is, to preprocess the text (Sharda and Henry, 2009).
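A minimal sketch of this filtering step, using only Python's standard html.parser module, is
shown below. The class name TextExtractor, the helper clean_html, and the choice of tags to
skip are assumptions for illustration; real pages usually need site-specific rules to remove
advertisements reliably.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script, style, and nav content."""

    SKIP_TAGS = {"script", "style", "nav"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # HTML comments never reach this method, so they are dropped.
        if self._skip_depth == 0:
            self.parts.append(data)

def clean_html(html):
    """Return the visible text of an HTML string with whitespace normalized."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())

# Hypothetical page fragment with script, navigation, and a comment.
raw = (
    "<html><head><script>var x = 1;</script></head><body>"
    "<nav>Home | Shop</nav><!-- ad slot -->"
    "<p>These oat bars are <b>great</b>.</p></body></html>"
)
cleaned = clean_html(raw)
print(cleaned)
```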
Step 3: Word Segmentation
After the above steps, relatively clean and usable text is available. Keywords play a key
role in a text because they determine its orientation. To extract them, a word segmentation
system or tool is used. Common word segmentation algorithms include maximum matching, optimal
matching, mechanical matching, reverse matching, and two-way matching (Alonso and Contreras,
2016). The segmentation tool considered here has been identified by many researchers as one of
the best for Chinese word segmentation today; it supports user-defined and additional
dictionaries and also performs well in discovering new words, person names, place names, etc.
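Of the algorithms listed, forward maximum matching is the simplest to sketch: at each position
it greedily takes the longest word found in the dictionary. The function name
forward_max_match and the four-word dictionary below are hypothetical, and a real segmenter
would rely on a large lexicon. This is not the tool the paragraph refers to, only an
illustration of the matching idea.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text by greedily taking the longest dictionary word at each position.

    Characters that start no dictionary word become single-character tokens.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Hypothetical toy dictionary.
dictionary = {"研究", "研究生", "生命", "起源"}
tokens = forward_max_match("研究生命起源", dictionary)
print(tokens)  # ['研究生', '命', '起源']
```

The greedy choice of "研究生" leaves "命" stranded, whereas the intended reading is
研究 / 生命 / 起源. Errors of this kind are one reason reverse and two-way matching are
used alongside forward matching.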
Step 4: Feature selection
After the above steps, we have a set of candidate words. But are all of these words
meaningful? Obviously not. Some words appear in the text in large numbers, while others appear
only a few times. So what weight should be assigned to each of these words? There are many
methods for feature selection, but an improved TF*IDF often works best. The main idea of the
TF-IDF model is that if a word w occurs frequently in a document d and rarely appears in other
documents, then w is considered to have good discriminating power and is suitable for
distinguishing d from other documents.
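The idea can be made concrete with a small sketch. The code below implements the plain
textbook weighting, tf(w, d) * log(N / df(w)), not the improved variant mentioned above, whose
details the report does not give. The function name tf_idf and the three tokenized documents
are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf is the term count divided by the document length; idf is
    log(N / df), where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized product comments.
docs = [
    ["oat", "bar", "tasty", "oat"],
    ["protein", "bar", "pricey"],
    ["granola", "bar", "tasty"],
]
weights = tf_idf(docs)
print(weights[0])
```

In this sample, "bar" occurs in every document and therefore receives weight zero, while "oat"
is frequent in the first document and absent elsewhere, so it receives the highest weight
there.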
Step 5: Mining with algorithms
After the above steps, the text set can be converted into a matrix, which can then be
mined with various algorithms. For example, to classify the text set we can use the KNN
algorithm, the Bayesian algorithm, the decision tree algorithm, and so on. The results of text
mining can be presented as meaningful charts through various visualization techniques, such as
character diagrams, force-directed layout diagrams, and chord diagrams. These information-rich,
expressive charts help people understand the structure of the mined text.
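As one example of such an algorithm, a k-nearest-neighbour (KNN) classifier over term vectors
can be sketched as follows. The training vectors, their labels, and the function names cosine
and knn_predict are hypothetical; cosine similarity is one common choice of distance for text,
assumed here rather than taken from the report.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, query, k=3):
    """Label query by majority vote among its k most similar training vectors."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical term-count vectors with hand-assigned labels.
train = [
    ({"great": 2, "taste": 1}, "positive"),
    ({"love": 1, "taste": 1}, "positive"),
    ({"tasty": 1, "great": 1}, "positive"),
    ({"stale": 1, "pricey": 1}, "negative"),
    ({"bad": 1, "taste": 1}, "negative"),
    ({"stale": 2, "bad": 1}, "negative"),
]
first = knn_predict(train, {"great": 1, "taste": 1})
second = knn_predict(train, {"stale": 1, "bad": 1})
print(first, second)
```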
Addressing key challenges
In the real world, knowledge appears not only as structured data in traditional
databases but also in various forms such as books, research papers, news articles, web pages,
and email. Faced with such a vast array of information sources, human reading ability, time
and energy are often insufficient. Intelligent computer processing is needed to help people
obtain the useful information hidden in these data sources in a timely and convenient manner.
Text mining technology was born and developed in this context.
In the era of big data, companies still generally guess users' next reactions based on
their accumulated historical data and the subjective experience of front-line operations
staff, and use these guesses as the basis for subsequent marketing and operation plans. With
the help of text analysis based on big data, Tree of Life can analyze user behaviors and ideas
scientifically and transform user insight from subjective "guessing" into data-driven, more
accurate prediction. Before a new product is launched, or after it is launched on a small
scale, the company can collect the comments of fans and potential users on social media,
analyze the text, and learn which aspects of the product they like, which they are not
satisfied with, and what else they expect from the product, so that it can respond positively,
quickly, and accurately to user feedback (Sumathy and Chidambaram, 2013).
Key decisions to be made
Through text analysis, Tree of Life would be able to make decisions on various aspects,
such as consumers' overall attitude towards Tree of Life products. It will also gain insight
into consumers' emotional attitudes, because text mining will help Tree of Life analyze the
emotional attitudes and evaluations of individual users and the product aspects they discuss,
which involves analyzing each user's emotional tendency. It will also be able to analyze
consumer feedback and sentiments (Samsudin, et al 2013).
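To show how sentiment by product aspect might feed such decisions, the sketch below scores
comments with a tiny word lexicon and averages the scores per aspect. The lexicon, the aspect
keywords, the comments, and the function names are all hypothetical. The method is also crude:
it assigns a comment's whole score to every aspect the comment mentions, which a tool such as
SAS Text Miner would handle with far more sophistication.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon and aspect keywords, chosen only for illustration.
LEXICON = {"love": 1, "great": 1, "tasty": 1, "bad": -1, "stale": -1, "pricey": -1}
ASPECTS = {
    "taste": {"taste", "flavour", "tasty", "stale"},
    "price": {"price", "pricey", "cost"},
}

def score(text):
    """Sum the lexicon polarities of the words in text."""
    return sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z]+", text.lower()))

def aspect_sentiment(comments):
    """Average sentiment score of the comments that mention each aspect."""
    totals = defaultdict(list)
    for comment in comments:
        words = set(re.findall(r"[a-z]+", comment.lower()))
        for aspect, keywords in ASPECTS.items():
            if words & keywords:
                totals[aspect].append(score(comment))
    return {aspect: sum(s) / len(s) for aspect, s in totals.items()}

comments = [
    "Love the taste, great snack",
    "Too pricey for what you get",
    "The flavour was stale",
]
summary = aspect_sentiment(comments)
print(summary)
```

A summary of this kind would suggest, for the sample comments, that price draws more negative
reactions than taste, which is the sort of signal the decisions above depend on.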
Summary
The paper proposed text mining as a key data analytics approach that Tree of Life should
adopt. However, to be able to use SAS Text Miner, Tree of Life must first find ways of
collecting information from social media. The report took text data as its entry point,
defining what text data is and summarizing its characteristics. On this basis, it introduced
the meaning, basic content, and development trend of text data analysis, so that readers can
quickly and comprehensively understand the subject.