ENGT 5214: Data Engineering Approaches for Social Media Analysis

Added on  2022/08/31

AI Summary
This report, prepared for ENGT 5214, investigates data engineering methodologies applied to social media content. The research focuses on understanding the effectiveness of different data mining approaches, including keyword-based techniques, statistical and programming-based methods, and computational linguistics with NLP. The study aims to evaluate the usability of these approaches through a qualitative research design involving surveys and focus group discussions with data analysts. The research examines the generation of vast amounts of data on social media platforms like Facebook, Instagram, and Snapchat and explores how data engineering extracts valuable information for organizations to better serve their clients. The report also includes a literature review on social media, data mining, and related concepts. The methodology involves surveying professionals from these fields and assessing their preferences to determine the most suitable strategy for data engineering of social media sites. Ethical considerations and a detailed work plan are also included in this research report.
Running head: ENGT 5214
ENGT 5214 STUDY SKILLS AND RESEARCH METHODS
Name of the Student
Name of the University
Author note
Contents
Introduction
Research Strategy, aims and objectives
Literature Review
    Social Media
    Data Engineering / Data Mining
    Specific Data Mining Approaches
Research Methodology and Justification
    Participants
    Materials
    Method
Ethical Issues and risk Management
Work Plan
Conclusion
References
Data engineering of social media sites.
Introduction.
The one thing common to the vast number of social media sites in existence today is the
sheer amount of data generated on a daily basis. The most common and popular social media sites
include Facebook, Instagram and Snapchat. These sites generate over 500 million units of data
every day in the form of images, videos and text, as well as through searches on their engines
(Kaplan & Haenlein 2010). Apart from the visible data on the user’s side, a large amount of
metadata is generated as well. Furthermore, similar data is generated and transmitted over the
web by thousands of applications, such as online shopping portals, resale and back page websites
and dating applications, that inherently use a person’s social media account for authentication.
Data engineering, or more specifically Social Media Mining, is used to extract this
information in the form of big data (Zafarani, Abbasi & Liu 2014) and analyse it to obtain an
overview of how individual people differ in their choices and preferences. Social media sites
and all associated sites and applications then use the data to generate customised content for
their user base. The current paper is a research methodology paper designed with the objective
of identifying, selecting, processing and analysing information about how data engineering of
social media sites works to extract valuable information that helps organisations better serve
their clients. The research focuses more specifically on understanding which aspect of data
mining is found to be beneficial for both the server side and the client side. Several types of
data are generated, in the form of either text or images. The research will look at three
specific data mining / engineering approaches, namely keyword based technologies, statistical
and programming based mining, and computational linguistics based mining with NLP.
Research Strategy, aims and objectives.
The current research is designed to understand the viability of each of the three
aforementioned approaches as a suitable strategy for data engineering of social media sites. Each
of the three strategies has dedicated professionals working towards the specific objective of
using these technologies to attain better results in terms of data engineering.
The primary objective of this research is to understand which approach, across text mining
and data mining, is the more suitable option for professionals to work with.
The aim of this research is to evaluate the validity of the following three approaches in
terms of usability in data engineering of social media sites:
Keyword based data engineering.
Statistical approaches and programming based data engineering.
Computational Linguistics and NLP based Data engineering.
The research, following a qualitative approach, will engage professionals from all three
domains in two different studies: first a survey on their own fields and second a survey on
the other fields. The results of the research will help determine the most preferred strategy for
data engineering of social media sites.
Literature Review.
This section is concerned with understanding the key themes of the research proposal
based on prior publications and literature available in the current field. The key themes of the
proposed research are social media, data mining, keyword based mining and Search Engine
Optimisation, statistical programming based mining and Natural Language Processing based
data mining.
Social Media.
Kaplan and Haenlein (2010) describe social media as the set of internet-based applications
and programmes that are built upon the foundational philosophy and principles of Web 2.0 and
allow the user the freedom and flexibility to generate and share created content. In that respect
there are several subcategories of social media. The category with the largest user base is
social networking (e.g. Facebook and LinkedIn) (Van Dijck 2013), followed, in the current
context, by image sharing (e.g. Instagram, Snapchat etc.) (Alhabash & Ma 2017). Video sharing
sites like MetaCafe and YouTube have gained immense popularity in the last decade, with the user
base of the latter running into the billions (Cool et al. 2017). Simple messaging applications
like WhatsApp, Google Hangouts and Telegram have also gained popularity (Bouhnik, Deshen & Gan
2014; Church & De Oliveira 2013). Finally, one last category that has gained popularity in the
live content sharing and gaming community is livecasting applications like Twitch or Ustream
(Edge 2013). The core idea behind the commercialisation of social media is to help individual
people connect with the outside world while existing in a virtual environment (Kassner et al.
2017). Over time, social media has made global communication more feasible; however, as the user
base continues to grow, so does the volume of generated data (Adedoyin-Olowe, Gaber & Stahl
2013). This is where data mining and engineering become both important and relevant.
Data Engineering / Data Mining.
As mentioned above, data engineering, data mining or, in the current context, Social
Media Mining refers to the process by which big data is obtained from user generated content
on the internet. Data mining helps discover patterns and links between the user and the content
and then, through specifically designed algorithms, provides the user with tailored
advertisements and content suggestions; it also supports specific research and is particularly
useful in forensic analysis (Tang, Chang & Liu 2014). Analogous to the term mining, data mining
requires human data analysis professionals to critically observe and analyse web based data to
extract the patterns and trends in social media use, content sharing, overall online behaviour
etc. This information, commonly termed metadata, becomes useful for companies and for government
and non-government organisations alike in customising and designing new products, processes and
strategies. As Zafarani, Abbasi and Liu (2014) have highlighted, social media mining draws on a
wide range of concepts from computer science, machine learning and statistics, merged with
theories and methods from social network analysis, network science, mathematics, sociology and
the human cognitive and behavioural sciences. Social media mining has also gained immense
importance in marketing research, where the online shopping behaviour and patterns of
individuals are evaluated on the basis of their preferences. Tang et al. (2016) have identified
that personal preferences on the internet can not only drive market strategies but also help
identify people with similar choices, something that most dating and matchmaking applications
prefer to use. Furthermore, since social media is a massive platform for sharing opinions (e.g.
Twitter), social media mining can also help identify the sentiment of a particular target
population through analysis of their posts. This process, known as sentiment analysis
(Adedoyin-Olowe, Gaber & Stahl 2013), analyses the specific emotions that users display in their
posts through specific words and contextual phrases (Laeeq, Nafis & Beg 2017).
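To make the idea of sentiment analysis concrete, the following is a minimal lexicon-based sketch in Python. It is an illustration only and not a method taken from the cited works: the word lists, the negation rule and the function names are all assumptions chosen for brevity, and a real study would rely on a published sentiment lexicon or a trained model.

```python
import re

# Hypothetical toy lexicon; these word lists are assumptions for illustration only.
POSITIVE = {"love", "great", "happy", "good", "excellent"}
NEGATIVE = {"hate", "bad", "sad", "terrible", "awful"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(post):
    """Count positive minus negative words, flipping a word directly preceded by a negator."""
    tokens = re.findall(r"[a-z']+", post.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)  # +1, -1 or 0
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def label(post):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(post)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Under these assumptions, a post such as "This is not good" is labelled negative because the positive word follows a negator, which is a very rough stand-in for the contextual phrases mentioned above.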
Specific Data Mining Approaches.
Most content generated online is in text format. Keyword based data mining is one of the
simplest methods of data mining and information extraction (Chen et al. 2013). The analyst
inputs specific keywords into the system, and content that uses those words to a given degree
is segregated from the rest. In contrast, programming based data engineering approaches use
statistical tools and machine learning for data mining. Most big data professionals and students
of data science work with languages like Python, with its flexible ecosystem of libraries for
data mining (Dua & Du 2016). However, this approach has further entailments in the form of
statistical analysis and data visualisation. This is where other languages like R, Octave and
MATLAB become important. Professionals utilise machine learning and artificial neural networks
to sort and extract significant data from the whole reservoir.
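The contrast between the first two approaches can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions: `keyword_filter`, `top_terms`, the `min_hits` threshold, the stop-word list and the sample posts are all hypothetical, and raw term frequency stands in for the much richer statistical and machine learning tooling described above.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOPWORDS = {"the", "a", "is", "and", "of", "to", "in", "on", "my", "this"}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def keyword_filter(posts, keywords, min_hits=1):
    """Keyword approach: keep posts where the keywords occur at least `min_hits` times."""
    wanted = {k.lower() for k in keywords}
    selected = []
    for post in posts:
        hits = sum(1 for tok in tokenize(post) if tok in wanted)
        if hits >= min_hits:
            selected.append(post)
    return selected


def top_terms(posts, n=3):
    """Statistical approach (simplest form): rank non-stop-word terms by raw frequency."""
    counts = Counter(
        tok for post in posts for tok in tokenize(post) if tok not in STOPWORDS
    )
    return counts.most_common(n)


# Illustrative posts, not real social media data.
posts = [
    "New phone arrived, the camera is great",
    "Coffee and a book on a rainy day",
    "Camera test: the phone camera beats my old phone",
]
```

With these sample posts, `keyword_filter(posts, ["camera"])` keeps the first and third posts, and raising `min_hits` to 2 keeps only the third. The keyword approach needs the analyst to know the terms in advance, whereas `top_terms` surfaces frequent terms from the data itself.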
The final approach, which has seen development in recent years, is the use of Natural
Language Processing to attain high accuracy and feasibility in data mining and extraction
(Agrawal & Batra 2013). Natural Language Processing (NLP) is a specific component of text mining
that looks at how linguistic analysis techniques, in collaboration with computer mediated
methods, can help a machine read a text. It is used to resolve a variety of ambiguities and
discrepancies in human language and is concerned with tasks like summarisation, part-of-speech
tagging, entity and relation extraction etc. (Virmani, Pillai & Juneja 2017). The use of NLP has
seen significant development in different engineering fields and, in the current context, its
use for text mining is seeing vital development in terms of usability, reliability and
feasibility.
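As a small illustration of one of the NLP tasks named above, the sketch below performs frequency-based extractive summarisation using only the Python standard library. It is a deliberately naive stand-in, assuming a hypothetical stop-word list and a simple punctuation-based sentence splitter; production NLP pipelines would use a dedicated library with proper tokenisation and tagging.

```python
import re
from collections import Counter

# Hypothetical, minimal stop-word list.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "it", "on", "for"}


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (abbreviations are not handled)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarise(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Sentences whose words recur across the whole text score highest, so an off-topic sentence is the first to be dropped. This captures only surface statistics; it does not resolve the ambiguities of human language that fuller NLP methods target.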
Research Methodology and Justification.
The current research utilises a qualitative methodology to understand the user friendliness
of the selected approaches in data engineering of social media content. The justification for
using qualitative research is that it is appropriate for small samples, like the one to be used
in the current research. Although the results are not usually empirically measurable,
qualitative research is useful for understanding the participants’ and, in turn, the sample’s
perceptions and opinions regarding the topic being researched. Collis and Hussey (2003)
elaborate that the benefit of qualitative research is that it allows for a more thorough
understanding and analysis of the research subject without putting any form of constraint or
restriction on the scope of the research or the responses of the participants. Besides,
qualitative research is also appropriate for a study like this as it highlights how each of the
three elements selected for the study is perceived by the participants for their specific
professional purposes, as well as whether those aspects require any change (Taylor, Bogdan &
DeVault 2015).
Participants.
For this research, 45 professional data analysts will be recruited from professional firms,
on the criterion that they must be working in one of the following three fields with respect
to data engineering of social media:
Keyword based data engineering.
Statistical approaches and programming based data engineering.
Computational Linguistics and NLP based Data engineering.
15 participants will be selected from each category. Age is not being considered as a
factor in this research, as it is designed to understand the overall perception of the methods
within the profession.
Materials.
Two specific sets of questionnaires will be designed for each of the categories. The first
set will evaluate the generic responses in terms of work feasibility and the complications that
the participants face in their own fields. The second set will evaluate similar responses, but
after the participants have gained hands-on experience of working in the other two fields. The
responses will be recorded on a 5-point Likert scale.
Method.
The participants will be provided with the first set of questionnaires according to their
own field of work. After they have responded to the first set, the participants will be given a
week to experience data mining and engineering of social media sites alongside participants from
the other two fields. After this week of working with the other domains, the second set of
questionnaires will be provided to them to fill in.
At the end of the survey, the participants will be engaged in a focus group discussion and
interview process to gather more detailed and personal information regarding the research topic.
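One possible way to compare the two questionnaire rounds is sketched below. It assumes that each 5-point Likert response is coded as an integer from 1 to 5 and summarises each field and round by its median, a common choice for ordinal data. The field labels, function names and response values are hypothetical placeholders, not study data, and the proposal itself does not prescribe this analysis.

```python
from statistics import median

# Hypothetical responses (1 = strongly disagree, 5 = strongly agree).
# Keys are (field, questionnaire round); values are illustrative only.
responses = {
    ("keyword", 1): [4, 5, 3, 4],
    ("keyword", 2): [3, 3, 4, 2],
    ("statistical", 1): [4, 4, 5, 3],
    ("statistical", 2): [4, 5, 5, 4],
}


def validate(scores):
    """Reject anything that is not a valid 5-point Likert code."""
    if any(s not in {1, 2, 3, 4, 5} for s in scores):
        raise ValueError("Likert responses must be integers from 1 to 5")
    return scores


def summarise_rounds(all_responses):
    """Median response for every (field, round) pair."""
    return {key: median(validate(scores)) for key, scores in all_responses.items()}


def shift(summary, field):
    """Change in the median between the first and second questionnaire for one field."""
    return summary[(field, 2)] - summary[(field, 1)]
```

A negative shift for a field would suggest that its practitioners rated it less favourably after a week of exposure to the other two domains, which could then be explored further in the focus group discussion.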
Ethical Issues and risk Management.
Prior to the commencement of the research, the participants will be given a consent form
explaining that they are taking part in the research of their own volition and that they have
the right to withdraw from the research at any point if they feel uncomfortable. Besides that,
the anonymity and confidentiality of the participants will be ensured appropriately. A further
ethical issue that needs to be taken care of in this research relates to data privacy. Data
mining involves data security risks that might leave a particular user’s information vulnerable
and open to access by unwanted third parties. This aspect will also be taken care of.
In terms of risk management, the current research entails risks such as dirty data. Proper
cleaning of data therefore needs to be ensured for all mining activities connected to the
research. Another important risk associated with data mining of social media is, as highlighted
above, the accidental revelation of personal data and unethical hacking made possible by
advanced programming. Any such behaviour on anybody’s part needs to be curbed at the outset.
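The two risks above, dirty data and accidental exposure of personal data, could be partly addressed at the point of collection. The sketch below is one hedged illustration: it assumes each record is a dictionary with hypothetical `user` and `text` keys, removes blank and duplicate posts, and replaces user identifiers with a salted SHA-256 digest. Note that this is pseudonymisation rather than full anonymisation, and the salt would need to be kept secret by the researchers.

```python
import hashlib
import re


def pseudonymise(user_id, salt):
    """Replace a user identifier with a salted SHA-256 digest (truncated for readability)."""
    digest = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
    return digest[:12]


def clean_posts(records, salt):
    """Drop empty and duplicate posts, normalise whitespace and hide user identifiers."""
    seen = set()
    cleaned = []
    for record in records:
        # Collapse runs of whitespace; treat a missing text field as empty.
        text = re.sub(r"\s+", " ", record.get("text") or "").strip()
        if not text:
            continue  # missing or blank text
        user = pseudonymise(record["user"], salt)
        key = (user, text.lower())
        if key in seen:
            continue  # same user repeating the same post
        seen.add(key)
        cleaned.append({"user": user, "text": text})
    return cleaned
```

Because the same salt always maps a given user to the same digest, posts by one person can still be grouped for analysis without storing the original handle.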
Work Plan.
Sl. | Stage | Description | Time frame | Expected Outcome
1. | Pre-research | In this stage, the participants will be recruited for the research through online posts, flyers and targeted advertisements. | 2 weeks | A sample of 45 participants, 15 each from each category.
2. | Primary Survey | In this stage, the participants will be given the first questionnaire set to fill and submit. | 1 day | 15 x 3 = 45 different filled survey questionnaires.
3. | Exploration period. | Here, the participants will indulge in exploring the other two domains apart from their own and understand the working of those two fields of data mining of social media sites. | 1 week | Participants have gathered experience regarding fields that they are not professionals in.
4. | a. Second survey. b. Interview. | a. The participants will be given the second questionnaire to fill. b. The participant will take part in a discussion and interview session. | 1 – 2 days | a. 45 filled questionnaires of second set. b. Responses from each participant regarding their experience with the other fields.
Conclusion.
The current research proposal aims to understand the feasibility of each of the following
three data mining approaches: keyword based data engineering, statistical approaches and
programming based data engineering, and computational linguistics and NLP based data
engineering. To this end, it will use a qualitative research methodology with a selected sample
of participants to understand how they, as professionals, either view their own field of work
as different from the other two, or find that either of the other two domains provides better
professional support.
References.
Adedoyin-Olowe, M., Gaber, M.M. and Stahl, F., 2013. A survey of data mining techniques for
social media analysis. arXiv preprint arXiv:1312.4617.
Agrawal, R. and Batra, M., 2013. A detailed study on text mining techniques. International
Journal of Soft Computing and Engineering, 2(6), pp.118-121.
Alhabash, S. and Ma, M., 2017. A tale of four platforms: Motivations and uses of Facebook,
Twitter, Instagram, and Snapchat among college students? Social Media + Society, 3(1),
p.2056305117691544.
Bouhnik, D., Deshen, M. and Gan, R., 2014. WhatsApp goes to school: Mobile instant
messaging between teachers and students. Journal of Information Technology Education:
Research, 13(1), pp.217-231.
Chen, J., Chen, Y., Du, X., Li, C., Lu, J., Zhao, S. and Zhou, X., 2013. Big data challenge: a data
management perspective. Frontiers of Computer Science, 7(2), pp.157-164.
Church, K. and De Oliveira, R., 2013, August. What's up with whatsapp?: comparing mobile
instant messaging behaviors with traditional SMS. In Proceedings of the 15th international
conference on Human-computer interaction with mobile devices and services (pp. 352-361).
ACM.
Collis, J. and Hussey, R., 2003. Business Research. Palgrave Macmillan.
Cool, K., Seitz, M., Mestrits, J., Bajaria, S. and Yadati, U., 2017. YouTube, google, and the rise
of internet video. Kellogg School of Management Cases, pp.1-25.