Measuring Formality of Texts Using Various Indices
Added on 2019/09/16
AI Summary
The provided assignment content discusses various measures of text complexity, including formality and readability. One method to measure formality is by using an equation that adds the frequencies of context-independent words, subtracts the frequencies of context-dependent (deictic) pronouns, and normalizes the sum. Another approach is Flesch reading ease score, which rates text on a 100-point scale based on sentence structure and word complexity. Additionally, other readability formulas such as Fog index and Flesch-Kincaid grade level score are also discussed. These measures can help analyze the complexity of texts from different sources, genres, or styles.
BCO5010: Business Intelligence Technologies (Assignment 1)
CARDIFF SCHOOL OF MANAGEMENT: ASSIGNMENT FEEDBACK PROFORMA
STUDENT NAME:
PROGRAMME: BSc Business Information Systems / BSc Software Development / BSc Computing
STUDENT NUMBER:
YEAR: 1
GROUP:
Module Number: BCO5010
Term: 1
Module Title: Business Intelligence Technologies
Tutor Responsible For Marking This Assignment: Imtiaz Khan
Module Leader: Imtiaz Khan
Assignment Due Date: 18 Dec 2016 (via Moodle)
Hand In Date:
ASSIGNMENT TITLE: Analysis of locational social media data (Assignment 1), 42.5% of total marks
SECTION A: SELF ASSESSMENT (TO BE COMPLETED BY THE STUDENT)
In relation to each of the set assessment criteria, please identify the areas in which you feel you
have strengths and those in which you need to improve. Provide evidence to support your self-
assessment with reference to the content of your assignment.
STRENGTHS | AREAS FOR IMPROVEMENT
I certify that this assignment is a result of my own work and that all sources have been
acknowledged:
Signed:______________________________________________
Date___________________________
SECTION B: TUTOR FEEDBACK
(based on assignment criteria, key skills and where appropriate, reference to professional
standards)
STRENGTHS | AREAS FOR IMPROVEMENT AND TARGETS FOR FUTURE ASSIGNMENTS
MARK/GRADE AWARDED: | DATE: | SIGNED:
ASSIGNMENT MODERATED BY: DATE
MODERATOR’S COMMENTS:
1. Learning outcomes
By completing this assignment the student will learn to:
• Analyse social media data with Excel and create a dashboard for reporting.
• Apply an OLAP approach in conjunction with a database for reporting.
• Design, program and link social media data with geospatial data.
• Critically analyse and evaluate results.
2. Assignment outline and guidance notes
Analysis of locational social media data
Case study: Monmouthshire County Council
Outline:
Monmouthshire County Council (MCC) would like to increase their organisational understanding and
potential impact of existing and potential online networks. They are pioneering a more relaxed and
informal approach to talking with residents and developing relationships with other stakeholders. As
part of its programme of culture change (outlined here: http://www.yc-yw.co.uk/) the council is
empowering staff and residents to innovate and find new ways of co-creating a better place to live.
MCC have now conducted a pilot study to find out the opinions of council residents and have
gathered a large dataset of social media data. They asked residents to describe how they thought
MCC could best improve their services. MCC were not only interested in the topics broached, but
also the style of the respondents’ narrative. They developed some calculated fields, and now they
have asked you to finish off the analysis of their data and report on your analysis.
A detailed summary of the data can be found in Appendix A; in brief, it contains the
following:
Fields | Explanation
id | Unique ID for resident
formality, flesch, fog, kincaid, percentComplexWords, syllablesPerWords, wordsPerSentence, wordcount | A series of fields all related to the structure and style of the resident's response
sex | Sex of the resident
extraversion, emotional stability, agreeableness, conscientiousness, openness to experience | A series of fields corresponding to the “Big 5” personality traits
longitude/latitude | Geographical location of resident
education, jobs and employment, recycling and waste, buses and public transport, planning and housing, care and support, activities and leisure | The topics the resident mentioned
You have been provided with an Excel spreadsheet containing several thousand rows of data (each row is a
unique resident) in the above format. Additionally, you have access to KML files for the Gwent boundary
and neighbourhoods.
2.1 Assignment tasks:
Statistical analysis [20%]
Using the Excel spreadsheet:
• Create a dashboard to view a selection of the data in a more synthetic and operationally useful way.
• A table showing the mean and standard deviation for each feature
• Any appropriate functions in Excel (e.g. frequency distributions and polygons, overlaying ‘curves of best fit’ etc) to explore:
o M1 versus literacy fields
o M1 versus M2 and/or M3
o Any significant relationships between M4, M5, M6 and M7
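As a cross-check on the spreadsheet, the mean and standard deviation table can also be sketched outside Excel. The following Python sketch uses only the standard library. The `summarise` helper and the three sample rows are hypothetical and are not taken from the MCC dataset. It assumes the sample standard deviation, which corresponds to Excel's STDEV.S rather than STDEV.P.

```python
import statistics


def summarise(rows, fields):
    """Return {field: (mean, sample standard deviation)} for numeric fields.

    rows   -- list of dicts, one per resident (hypothetical layout)
    fields -- names of the numeric columns to summarise
    """
    summary = {}
    for field in fields:
        # Skip rows where the value is missing.
        values = [row[field] for row in rows if row.get(field) is not None]
        summary[field] = (statistics.fmean(values), statistics.stdev(values))
    return summary


# Made-up illustrative rows, not real survey data.
sample_rows = [
    {"flesch": 60.0, "fog": 10.0},
    {"flesch": 70.0, "fog": 12.0},
    {"flesch": 80.0, "fog": 14.0},
]

table = summarise(sample_rows, ["flesch", "fog"])
for name, (mean, sd) in table.items():
    print(f"{name}: mean={mean:.2f}, sd={sd:.2f}")
```

If the population standard deviation is wanted instead, `statistics.pstdev` could be swapped in for `statistics.stdev`.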
OLAP [20%]
• Import the data from the spreadsheet into an Access database.
• Create two appropriate queries and supporting reports.
• Using PowerPivot [1] (or any other tool suitable for the version of Microsoft Office) create any
appropriate OLAP cubes [2] in Excel using the Access database as the data source
Geographical analysis [40%]
Google Earth
• Using the provided MapExcelData.zip, convert your resident Longitude/Latitudes into a Google Earth KML file. Download also the Gwent boundary and neighborhood KML files.
• Using the CrimeStat Spatial Statistics Program [3]:
  o Calculate the following centrographic statistics for both male and female respondents:
    - Mean center
    - Median center
    - Center of minimum distance
    - Standard deviation of X and Y coordinates
    - Standard distance deviation
    - Standard deviational ellipse
    - Convex hull
  o Choose one method of ‘hot spot’ analysis
  o Export all your results in shapefile format
• Using the shp2kml [4] tool convert all of your shapefiles to KML format
• Load all of your KML files into Google Earth
Google Fusion Tables
• Use Google Fusion Tables [5] to combine and explore your Excel and KML data in Google Fusion Table maps. Either use your KML files or your shapefile (e.g. convert using Shape Escape [6])
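Some of the centrographic statistics listed above can be sketched directly, which may help when sanity-checking CrimeStat output. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, coordinates are treated as planar X/Y values (raw longitude/latitude would normally be projected first), the median center is taken as the median of X and the median of Y, and both the coordinate standard deviations and the standard distance divide by N. CrimeStat may apply different corrections, so small differences from its results are to be expected.

```python
import math
import statistics


def mean_center(points):
    """Mean of the X coordinates and mean of the Y coordinates."""
    xs, ys = zip(*points)
    return statistics.fmean(xs), statistics.fmean(ys)


def median_center(points):
    """Assumption: median of X and median of Y, taken independently."""
    xs, ys = zip(*points)
    return statistics.median(xs), statistics.median(ys)


def coordinate_std(points):
    """Population standard deviation of X and of Y (divides by N)."""
    xs, ys = zip(*points)
    return statistics.pstdev(xs), statistics.pstdev(ys)


def standard_distance(points):
    """Root mean squared distance of the points from the mean center.

    Assumption: divides by N; other tools may use a different denominator.
    """
    cx, cy = mean_center(points)
    squared = [(x - cx) ** 2 + (y - cy) ** 2 for x, y in points]
    return math.sqrt(sum(squared) / len(points))


# Hypothetical planar coordinates, not real resident locations.
demo_points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(mean_center(demo_points))
print(median_center(demo_points))
print(coordinate_std(demo_points))
print(standard_distance(demo_points))
```

To compare male and female respondents, the same functions could be called once per group after splitting the rows on the `sex` field.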
Critical evaluation [20%]
• Embed all figures and map screenshots in a Word document.
• Evaluate each result.
• Justify any patterns discovered within the results.
• Suggest ways to improve reporting of the analysis.
3. Submission
The due date for Assignment 1 is Week 12.
[1] http://www.powerpivotpro.com/
[2] http://www.powerpivotpro.com/2010/06/using-excel-cube-functions-with-powerpivot/
[3] http://www.icpsr.umich.edu/CrimeStat/
[4] http://www.zonums.com/shp2kml.html (shp2kml 2.0: Shape-file to Google Earth)
[5] http://www.google.com/drive/apps.html#fusiontables
[6] http://www.shpescape.com/
You must submit via Moodle the following files:
• Word document (max 2000 words) containing the critical evaluation.
• All code with documentation in a zip file.
4. Assessment criteria
Marks | Functionality | Reporting | Evaluation | Understanding
> 70% | All codes fully functional with proper documentation | Elegantly presented figures, maps with proper annotations. | Excellent evaluation of the results. Very good presentation. | Critical understanding of the key issues of BI. Intuitive discovery with proper justifications.
60 – 69% | Most of the codes work. Average documentation | Well presented figures, maps with average annotations. | Very good evaluation of the results. Average presentation. | Very good understanding of the key issues of BI. Discovery with proper justifications.
50 – 59% | Part of the codes work. Basic documentation | Average presentation of figures, maps with basic annotations. | Average evaluation of the results. Average presentation | Basic understanding of the key issues of BI. No discovery.
40 – 49% | Minimal amount of the codes work. No documentation | Basic presentation of figures, maps with no annotations. | Basic evaluation of the results. Poor presentation | Very little understanding of the key issues of BI. No discovery.
< 40% | None of the code work. No documentation. | Poor presentation of figures, maps with no annotations. | Little evidence of evaluation of the results. Poor presentation | No evidence of understanding of the key issues of BI. No discovery.
Appendix A: Data fields explanation
1. id
2. formality
3. flesch
4. fog
5. kincaid
6. percent complex words
7. syllables per words
8. words per sentence
9. word count
10. sex
11. “Big 5” personality traits
12. longitude / latitude
13. mentioned
1. id
ID linking to all datasets.
2. formality
The degree of formality of a text can be measured by adding the frequencies of context-independent
words, subtracting the frequencies of context-dependent (deictic) words, and normalizing the
sum.
Grouping words in the traditional grammatical categories (nouns, verbs, prepositions, etc.), this
produces the following equation for formality (F):
F = (noun freq. + adjective freq. + preposition freq. + article freq. - pronoun freq. - verb freq. - adverb freq. - interjection freq. + 100) / 2
Such a formula provides an easily applicable measure for ordering language from different sources,
genres or styles according to their formality. The calculated formality corresponds generally quite
well with intuitive expectations, e.g. official documents or scientific texts are more formal than
personal letters, speeches are more formal than conversations, etc. For example, data for Dutch
reveal the following ordering:
Context-independent categories: Nouns, Articles, Prep., Adject. Deictic categories: Pron., Verbs, Adv.
Source | Nouns | Articles | Prep. | Adject. | Pron. | Verbs | Adv. | Conj. | Form.
Oral Female | 10.40 | 6.89 | 5.86 | 8.09 | 16.95 | 19.35 | 17.45 | 7.47 | 38.7
Oral N.Acad. | 12.75 | 8.50 | 6.34 | 6.71 | 16.01 | 18.80 | 19.31 | 6.34 | 40.1
Oral Male | 11.48 | 8.16 | 6.69 | 7.63 | 15.84 | 18.45 | 16.53 | 7.05 | 41.6
Oral Acad. | 13.16 | 9.58 | 7.91 | 7.13 | 13.96 | 17.75 | 17.88 | 7.13 | 44.1
Novels | 18.52 | 10.48 | 10.26 | 10.00 | 13.25 | 20.62 | 10.47 | 6.06 | 52.5
Fam. Magaz. | 21.78 | 9.77 | 12.21 | 11.14 | 10.09 | 18.71 | 9.74 | 6.39 | 58.2
Magazines | 24.20 | 11.61 | 13.90 | 10.93 | 8.55 | 17.68 | 8.73 | 4.34 | 62.8
Scientific | 23.10 | 15.00 | 13.75 | 10.75 | 6.71 | 16.58 | 7.98 | 5.98 | 65.7
Newspapers | 25.97 | 14.68 | 14.54 | 10.57 | 5.62 | 16.69 | 7.21 | 4.70 | 68.1
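The formality equation can be checked against the rows of the Dutch data. The sketch below is illustrative only: the `formality` function name and its parameter names are hypothetical, and because the table has no interjection column, the interjection frequency is assumed to be 0. Under that assumption the "Oral Female" and "Newspapers" rows reproduce the values in the Form. column.

```python
def formality(noun, adjective, preposition, article,
              pronoun, verb, adverb, interjection=0.0):
    """Formality score F from part-of-speech frequencies (percentages).

    Context-independent categories are added, deictic categories are
    subtracted, and the result is shifted and halved as in the equation
    for F. interjection defaults to 0.0, which is an assumption made
    here because the Dutch table does not report that category.
    """
    added = noun + adjective + preposition + article
    subtracted = pronoun + verb + adverb + interjection
    return (added - subtracted + 100) / 2


# "Oral Female" row of the Dutch data.
oral_female = formality(noun=10.40, adjective=8.09, preposition=5.86,
                        article=6.89, pronoun=16.95, verb=19.35,
                        adverb=17.45)
print(round(oral_female, 1))  # matches the 38.7 in the Form. column

# "Newspapers" row of the Dutch data.
newspapers = formality(noun=25.97, adjective=10.57, preposition=14.54,
                       article=14.68, pronoun=5.62, verb=16.69,
                       adverb=7.21)
print(round(newspapers, 1))  # matches the 68.1 in the Form. column
```

Conjunctions appear in the table but not in the equation, so they are left out of the calculation.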
3. flesch
Flesch reading ease score.
Equation:
206.835 - (1.015 * words_per_sentence) - (84.6 * syllables_per_word)
This score rates text on a 100-point scale. The higher the score, the easier the text is to
understand. A score of 60 to 70 is considered to be optimal.
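As a worked illustration of the Flesch equation, here is a small Python sketch. The function name and the sample inputs (15 words per sentence, 1.5 syllables per word) are hypothetical values chosen for the example, not figures from the dataset.

```python
def flesch_reading_ease(words_per_sentence, syllables_per_word):
    """Flesch reading ease score, using the equation given for `flesch`."""
    return (206.835
            - (1.015 * words_per_sentence)
            - (84.6 * syllables_per_word))


# Hypothetical text averaging 15 words per sentence, 1.5 syllables per word:
# 206.835 - 15.225 - 126.9 = 64.71, which falls in the 60 to 70 range.
score = flesch_reading_ease(15, 1.5)
print(round(score, 2))
```

Longer sentences or longer words both reduce the score, since each term is subtracted from the constant.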
4. fog
Fog index.
Equation:
( words_per_sentence + percent_complex_words ) * 0.4
The Fog index, developed by Robert Gunning, is a well-known and simple formula for measuring
readability. The index indicates the number of years of formal education a reader of average
intelligence would need to understand the text on the first reading.
18 unreadable
14 difficult
12 ideal
10 acceptable
8 childish
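The Fog equation and the scale above can be combined in a short Python sketch. The function names are hypothetical, and treating each listed value as the lower bound of its band is an assumption made here for illustration; the source lists only the five reference points and does not say how to label scores in between.

```python
def fog_index(words_per_sentence, percent_complex_words):
    """Fog index, using the equation given for `fog`."""
    return (words_per_sentence + percent_complex_words) * 0.4


# Reference points from the scale, highest first.
FOG_SCALE = [
    (18, "unreadable"),
    (14, "difficult"),
    (12, "ideal"),
    (10, "acceptable"),
    (8, "childish"),
]


def fog_band(score):
    """Label a score, assuming each listed value is a band's lower bound.

    Returns None for scores below the lowest listed value.
    """
    for threshold, label in FOG_SCALE:
        if score >= threshold:
            return label
    return None


# Hypothetical text: 15 words per sentence, 12.5% complex words.
example = fog_index(15, 12.5)
print(example, fog_band(example))
```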
5. kincaid
Flesch-Kincaid grade level score
Equation:
(11.8 * syllables_per_word) + (0.39 * words_per_sentence) - 15.59
This score rates text on a U.S. grade-school level, so a score of 8.0 means that the document can be
understood by an eighth grader. A score of 7.0 to 8.0 is considered to be optimal.
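A worked instance of the Flesch-Kincaid equation, as a minimal Python sketch. The function name and the sample inputs are hypothetical; the coefficients are those in the equation given for `kincaid`.

```python
def flesch_kincaid_grade(syllables_per_word, words_per_sentence):
    """Flesch-Kincaid grade level, using the equation given for `kincaid`."""
    return ((11.8 * syllables_per_word)
            + (0.39 * words_per_sentence)
            - 15.59)


# Hypothetical text: 1.5 syllables per word, 15 words per sentence.
# 17.7 + 5.85 - 15.59 = 7.96, inside the 7.0 to 8.0 range.
grade = flesch_kincaid_grade(1.5, 15)
print(round(grade, 2))
```

Note that this score uses the same two inputs as the Flesch reading ease score but with opposite signs, so the two measures move in opposite directions.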
6. percent complex words
Percentage of words in the text that are complex, i.e. have several syllables.
7. syllables per words
Average number of syllables per word.
8. words per sentence
Average number of words per sentence.
9. word count
Count of words in all free text fields.
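How the word count and words-per-sentence fields might be derived from a free-text response can be sketched as follows. This is not the method used to build the dataset, which the source does not describe. The `text_stats` name, the regular expressions, and the rule that a sentence ends at ".", "!" or "?" are all assumptions made for illustration, and the sample sentence is invented.

```python
import re


def text_stats(text):
    """Return (word_count, words_per_sentence) for a piece of text.

    Assumptions: a word is a run of letters, digits or apostrophes, and
    a sentence ends at one or more of '.', '!' or '?'.
    """
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return len(words), 0.0
    return len(words), len(words) / len(sentences)


# Invented example response, not taken from the dataset.
count, per_sentence = text_stats(
    "The bins are full. Please collect them more often!"
)
print(count, per_sentence)
```

Abbreviations and decimal numbers would be split incorrectly by this simple rule, so a real pipeline would likely need a more careful sentence tokenizer.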
10. sex
Male = 1, Female = 2