Tests and Measurements: Scale Construction, Types, and Item Analysis


Added on 2023/06/13

AI Summary

This report provides a detailed overview of tests and measurements, focusing on scale types and item construction. It begins by emphasizing the importance of reliable and valid personality assessments, highlighting key considerations for item syntax, grammar, wording, and answering scales. The report discusses guidelines for item construction, including construct clarity, concise statements, and cautious use of reverse-scored items. It also covers content adequacy assessment, factor analysis, internal consistency assessment using Cronbach's Alpha, and construct validation. Furthermore, the report delves into different scale types, such as Likert scales, Bogardus social distance scales, semantic differential scales, and Thurstone scales, explaining their construction and appropriate usage. It also analyzes the NEO-PI-R extraversion items, examining reference points, general item approaches, construct indicators, and conditionality to evaluate item format. The study concludes by offering insights into the analysis of NEO-PI-R extraversion items, demonstrating the application of item format analysis across various dimensions.

Running head: TESTS AND MEASUREMENTS
Tests and measurements
Name:
Institution:

Introduction
Assessing personality reliably and validly is essential to psychological evaluation.
Considerable effort has therefore gone into building self-report measures. Researchers typically
seek to maximize the content validity of a test, and each item's content is carefully selected
to capture the construct. However, the influence of item syntax, grammar, and wording is often
underestimated or neglected (Allemand et al., 2008, p. 758). Not only the content but also the
answering scales (formats) are crucial, because they affect psychometric properties of a scale
such as criterion and construct validity.
Some basic guidelines should be followed to ensure that items are constructed properly.
It is crucial to define each construct in detail and to avoid mixing items that assess the
construct with items that assess outcomes or reactions (Ziegler & Hagemann, 2015, p. 43).
Statements should also be simple and as short as possible, and the language used should be
familiar to the target respondents (Kline, 2005, p. 30). Reverse-scored or negatively phrased
items should be used with care, as even a few such items scattered through a scale can have a
detrimental effect on its psychometric properties. To obtain meaningful responses, questions
must be understood by respondents as the researchers intended (Kline, 2005, p. 49). Finally,
some content redundancy is necessary when writing multiple items, because it is the basis of
internal consistency.
Content adequacy assessment is also critical to item construction. In many instances,
researchers invest considerable time and resources only to discover that their measure is
flawed. Ensuring that the content is adequate before the final questionnaire is developed
supports construct validity and permits the removal of items that may be conceptually
inconsistent (Ziegler & Hagemann, 2015, p. 45). Researchers should likewise attend to factor
analysis, which helps determine how many factors underlie the set of items. Internal consistency
assessment is another significant element of item construction. The most commonly recommended
measure of a scale's internal consistency is Cronbach's alpha, which indicates how well the
items measure the same construct. Construct validation and replication are also significant
concerns in questionnaire construction (Bayoglu et al., 2013, p. 331). To avoid common source
concerns, it is recommended that data from sources other than the respondents, such as
regular appraisals, be collected where possible. Replication includes factor analysis,
assessment of internal consistency reliability, and construct validation. Together, these
steps should give the researcher confidence that the finished measure is valid and reliable
and is suitable for use in future research.
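
As a minimal sketch of the internal consistency check described above, the following Python function computes Cronbach's alpha with the standard formula: k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores, where k is the number of items. The function name and the response data are hypothetical and serve only as an illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 respondents answering 3 items on a 0-4 scale.
data = [
    [4, 3, 4],
    [2, 2, 3],
    [0, 1, 0],
    [3, 3, 2],
    [1, 0, 1],
]
alpha = cronbach_alpha(data)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together, which is what one would expect if they measure the same construct.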
Types of scales
A scale is a type of composite measure made up of several items that have a logical or
empirical structure among them (Ashton & Lee, 2007, p. 107). Scales thus make use of
differences in intensity among the indicators of a variable. For instance, a question with the
response options always, occasionally, hardly ever, and never constitutes a scale because the
response options are rank-ordered and differ in intensity. Many types of scale exist; this
paper analyzes four scales commonly used in social science research and how they are
constructed (Fishman & Galguera, 2003, p. 61).
Likert scale
The Likert scale is one of the most commonly used scales in social science surveys. It is
named after the psychologist Rensis Likert (Dittrich et al., 2007, p. 4). Respondents give
their view on a statement by indicating the extent to which they agree or disagree with it.
The response options typically read: strongly agree, agree, neither agree nor disagree,
disagree, and strongly disagree (Dittrich et al., 2007, p. 5).
To construct the scale, each response option is assigned a score, for instance from
0 to 4 (Dittrich et al., 2007, p. 6). The scores on the Likert items can then be added together
for each respondent to obtain an overall score. Consider, for instance, a measure of prejudice
against women (Kline, 2005, p. 64). The scores for every statement would be summed for each
respondent to give an overall prejudice score. With five statements, a respondent who strongly
disagreed with every item would have an overall prejudice score of 0, indicating a very low
degree of prejudice against women (Dittrich et al., 2007, p. 9).
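
The summation can be sketched in a few lines of Python. The mapping of response labels to the codes 0 to 4 is an assumption (the text gives only the range), and the names are hypothetical.

```python
# Assumed 0-4 coding of the five response options.
LIKERT_SCORES = {
    "strongly disagree": 0,
    "disagree": 1,
    "neither agree nor disagree": 2,
    "agree": 3,
    "strongly agree": 4,
}


def total_score(answers):
    """Sum the numeric codes of one respondent's answers."""
    return sum(LIKERT_SCORES[answer.lower()] for answer in answers)


# Five statements, answered identically by two hypothetical respondents.
all_disagree = ["strongly disagree"] * 5
all_agree = ["strongly agree"] * 5
print(total_score(all_disagree))  # 0, the lowest possible prejudice score
print(total_score(all_agree))     # 20, the highest possible prejudice score
```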
Bogardus social distance scale
The Bogardus scale is a technique for measuring the willingness of individuals to take part in
social relations with others. In essence, the scale asks individuals to state the extent to
which they are accepting of another group. Each item on the scale is scored to reflect the level
of social distance, from 1.00 as a measure of no social distance to 5.00 as a measure of
maximum social distance (Bayoglu et al., 2013, p. 337). When the ratings for each response are
averaged, a higher score indicates a lower level of acceptance.
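
A rough sketch of the averaging step follows, assuming each respondent gives one rating per target group on the 1.00 to 5.00 range stated above. The function name, the group labels, and the ratings are hypothetical.

```python
from statistics import mean


def mean_social_distance(ratings, low=1.0, high=5.0):
    """Average social-distance ratings after checking they lie on the scale."""
    for rating in ratings:
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside {low}-{high}")
    return mean(ratings)


# Hypothetical ratings of two target groups by the same five respondents.
group_x = [1.0, 2.0, 1.0, 1.0, 2.0]
group_y = [4.0, 5.0, 3.0, 4.0, 4.0]

score_x = mean_social_distance(group_x)
score_y = mean_social_distance(group_y)
# The higher average marks the group that is accepted less.
less_accepted = "group_y" if score_y > score_x else "group_x"
print(score_x, score_y, less_accepted)
```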
Semantic differential scale
A semantic differential combines more than one rating scale. A related technique, maximum
difference (MaxDiff) scaling, is used in trade-off analyses (Revelle et al., 2011, p. 3); MaxDiff
analysis is applied in new product development and in market segmentation studies to rank the
most important features. The semantic differential scale asks respondents to answer a
questionnaire and choose between two opposite positions (Revelle et al., 2011, p. 7). For
example, suppose one wanted respondents' opinions about a new video game show. One would first
decide which dimensions to measure and then find two opposite terms that represent each
dimension, for instance relatable and not relatable, funny and not funny, enjoyable and
unenjoyable. One would then build a rating sheet on which respondents indicate how they feel
about the show on each dimension (Revelle et al., 2011, p. 19).
Thurstone scale
The Thurstone scale, created by Louis Thurstone, is intended to provide a format for generating
groups of indicators of a variable that have an empirical structure among them. In a study of
discrimination, for instance, one would build a list of items (say, ten) and ask respondents to
assign a score from 1 to 10 to each item (Ashton & Lee, 2007, p. 110). In essence, respondents
rank the items from the weakest indicator of discrimination to the strongest. Once the
respondents have scored the items, the researcher examines the scores assigned to each item to
determine which items the respondents agreed upon most. If the scale items were adequately
developed and scored, the economy and effectiveness of data reduction present in the Bogardus
social distance scale would appear.
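
One way to sketch the agreement check is to summarize each item by its median score and by the spread of the scores it received, treating a small spread as high agreement. Using the population standard deviation as the spread measure is an assumption made for illustration; the text does not name a statistic, and the item labels and scores are hypothetical.

```python
from statistics import median, pstdev


def item_agreement(scores_by_item):
    """Return (item, median score, spread) tuples, most-agreed items first."""
    summary = [
        (item, median(scores), pstdev(scores))
        for item, scores in scores_by_item.items()
    ]
    # A smaller spread means the respondents agreed more on the item's score.
    return sorted(summary, key=lambda row: row[2])


# Hypothetical 1-10 scores given to three items by four respondents.
ratings = {
    "item_a": [2, 2, 3, 2],
    "item_b": [1, 9, 5, 4],
    "item_c": [8, 8, 8, 8],
}
ranked = item_agreement(ratings)
for item, mid_score, spread in ranked:
    print(item, mid_score, round(spread, 2))
```

Items at the top of the list are candidates to keep; items with widely scattered scores would be dropped or rewritten.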
Appropriate scale
Likert-type scales are useful for measuring latent constructs, that is, characteristics of
individuals such as opinions, attitudes, or feelings (Dittrich et al., 2007, p. 10). Latent
constructs are usually thought of as unobservable characteristics of individuals. Typically,
the scale uses statements with 3- to 7-point response scales (Dittrich et al., 2007, p. 13).
Each item should be phrased so that it contains only one concept per question, so that it is
clear exactly what the individual is responding to. One should also avoid using words such as
"not" or other negative terms, as it becomes confusing what it means to disagree with a
negative. Likert scales are also useful when one wants to measure the intensity of an opinion.
Likert scales can be scored in a variety of ways. One approach is to recode each item so that
higher scores indicate more of the characteristic and then take the mean of all items (Dittrich
et al., 2007, p. 15). However, the resulting number has no intrinsic meaning. In a measure of
political attitudes, for instance, a score of 4.4 means nothing beyond being the respondent's
average across the items (Ax & Fagan, 2007, p. 69). Finally, one would usually want to check the
consistency of a Likert-type scale using internal consistency reliability (Dittrich et al.,
2007, p. 14). Mathematically, internal consistency can be understood as the mean of all
possible split-half correlations.
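
The recode-and-average scoring can be sketched as follows, assuming a 1 to 5 response scale and a hypothetical set of reverse-keyed item positions. A reverse-keyed answer v is recoded as low + high - v, so that 1 becomes 5, 2 becomes 4, and so on.

```python
from statistics import mean


def score_scale(answers, reverse_items=(), low=1, high=5):
    """Mean item score after recoding reverse-keyed items."""
    recoded = [
        (low + high - value) if index in reverse_items else value
        for index, value in enumerate(answers)
    ]
    return mean(recoded)


# Hypothetical respondent; the third item (index 2) is reverse-keyed.
answers = [5, 4, 2, 5]
score = score_scale(answers, reverse_items={2})
print(score)  # 4.5
```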
Three item constructs of NEO-PI-R extraversion: Warmth, Assertiveness, Positive outcomes
A taxonomy for item formats
According to Rauthmann (2011), items meant to capture trait content are manifold
in their formats and structures, yet they can be organized along specific dimensions: reference
point, general item approach, construct indicator, and contextuality/conditionality (p. 119).
Trait items may use staticity statements such as 'I go out and talk to individuals.' They may
also use frequency descriptions of mental processes and behaviors, for instance 'I regularly go
out and communicate with individuals.' Additionally, they may use descriptions of the valency
of one's feelings towards something, for instance 'I like going out and speaking with
individuals' (Norris & Lecavalier, 2010, p. 10). These three approaches can be regarded as
general item formats into which most items fit. Staticity statements can be combined with
frequency and valency descriptions. It is worth noting that these need not be specified within
the item itself but can be established in the answer scale. Beyond the general item approach,
items also contain various construct-relevant indicators, mainly attributes, behaviors, mental
processes, and contexts or situations (Rauthmann, 2011, p. 120). Moreover, items can be
unconditional or conditional. Conditional items frequently use 'if' or 'when' phrases to
specify the context in which particular mental processes and behaviors occur. The perspective
of an item can likewise be differentiated: a first-person item refers directly to the
respondent's attributes, behaviors, and mental processes, as in 'I am/feel/act/think/do' (Ax &
Fagan, 2007, p. 71), whereas a possessive item refers to them as 'my behavior', 'my feelings',
or 'my thoughts'.
Therefore, most items meant to capture trait constructs can be described by a four-way
combination of the following dimensions. The reference point can be first person, possessive,
or a construct indicator (Cheung et al., 2011, p. 593). The general item approach comprises
staticity, frequency, valency, and valency + frequency. The construct indicator can be an
attribute, a behavior, a mental process, or a context. Conditionality distinguishes conditional
from unconditional items (Rauthmann, 2011, p. 122). However, not all combinations are
conceivable, and although most items take a position on all four dimensions, some items may
take two positions within a single dimension.
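
The four dimensions can be written down as a small data structure. The sketch below is only an illustration: the class and function names are hypothetical, the conditionality check is a crude keyword heuristic based on the 'if' and 'when' phrasing mentioned above, and coding the example item as a behavior indicator is an assumption.

```python
import re
from dataclasses import dataclass
from enum import Enum


class ReferencePoint(Enum):
    FIRST_PERSON = "first person"
    POSSESSIVE = "possessive"
    INDICATOR = "construct indicator"


class Approach(Enum):
    STATICITY = "staticity"
    FREQUENCY = "frequency"
    VALENCY = "valency"
    FREQUENCY_VALENCY = "frequency + valency"


class Indicator(Enum):
    ATTRIBUTE = "attribute"
    BEHAVIOR = "behavior"
    MENTAL = "mental"
    CONTEXTUAL = "contextual"


@dataclass(frozen=True)
class ItemFormat:
    text: str
    reference_point: ReferencePoint
    approach: Approach
    indicator: Indicator
    conditional: bool


def looks_conditional(text):
    """Crude heuristic: flag items containing the words 'if' or 'when'."""
    return re.search(r"\b(if|when)\b", text, flags=re.IGNORECASE) is not None


text = "I go out and talk to individuals."
example = ItemFormat(
    text=text,
    reference_point=ReferencePoint.FIRST_PERSON,
    approach=Approach.STATICITY,
    indicator=Indicator.BEHAVIOR,  # assumed coding, not given in the text
    conditional=looks_conditional(text),
)
print(example)
```

A single value per dimension keeps the sketch simple; items that take two positions within one dimension would need a set of values instead.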

Analysis of NEO-PI-R extraversion items
All analyses were carried out on the NEO-PI-R. The instrument was selected because of its broad
applicability and comprehensiveness. Extraversion is an essential aspect of human
nature (Cheung et al., 2011, p. 593). The item format analyses were carried out along the four
dimensions of item format: reference point, general item approach, construct indicator, and
conditionality. Each item was assessed on every dimension and its subcategories, and the
results were tallied for all four dimensions. For instance, an item such as 'I am assertive and
dominant' would be categorized as a staticity statement that is unconditional, uses an
attribute indicator, and takes a first-person reference point (Ax & Fagan, 2007, p. 80). The
table below shows the results of the formal item analysis for three NEO-PI-R facets of
extraversion (assertiveness, positive emotions, and warmth).
                  | % Reference point     | % General item approach | % Construct indicator | % Conditionality
Facet             | 1     2    3     4    | S     F     V     F+V   | A     B     M     C   | No    Yes
Warmth            | 87.5  0    12.5  0    | 75    0     25    0     | 25    37.5  37.5  0   | 100   0
Assertiveness     | 87.5  0    12.5  0    | 25    50    25    0     | 12.5  87.5  0     0   | 100   0
Positive Outcomes | 100   0    0     0    | 50    50    0     0     | 37.5  37.5  25    0   | 100   0

S = staticity approach, F = frequency approach, V = valency approach, F+V = frequency and valency combined
A = attribute, B = behavioral, M = mental, C = contextual indicator
1 = oneself (first person)
2 = one's own characteristics
3 = oneself and one's traits, behaviors, and mental processes
4 = construct-relevant indicator
Reference point: the first-person perspective was found in most items. Only
two items had other individuals as the reference point, and only two had construct indicators
as the reference point. General item approach: the approach used most in the NEO-PI-R
extraversion facets was staticity, followed by frequency and finally valency. Most of the
extraversion items were thus in a staticity format, which makes no reference to frequencies or
valences. Construct indicators: most indicators in the extraversion facets were behavioral,
although attribute and mental indicators also appeared. Finally, conditional items were scarce
in the NEO-PI-R.
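
The percentages in the table can be reproduced from item-level codes with a simple tally. The sketch below assumes eight items per facet, which is consistent with the 12.5-point steps in the table; the individual codes are hypothetical stand-ins, since the paper reports only the resulting percentages.

```python
from collections import Counter


def percentages(codes, categories):
    """Percentage of items falling into each category of one dimension."""
    counts = Counter(codes)
    return {cat: 100 * counts[cat] / len(codes) for cat in categories}


# Hypothetical reference-point codes for the eight items of one facet:
# seven items coded 1 (first person) and one item coded 3.
reference_codes = ["1"] * 7 + ["3"]
result = percentages(reference_codes, ["1", "2", "3", "4"])
print(result)  # {'1': 87.5, '2': 0.0, '3': 12.5, '4': 0.0}
```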

References
Allemand, M., Zimprich, D., & Hendriks, A. A. (2008). Age differences in five personality
domains across the life span. Developmental Psychology, 44(3), 758.
Ashton, M.C., & Lee, K. (2007). Empirical, theoretical, and practical advantages of the
HEXACO model of personality structure. Personality and Social Psychology Review,
11, 150–166.
Ax, R. K., & Fagan, T. J. (2007). Corrections, mental health, and social policy: International
perspectives. Charles C Thomas Publisher, pp.61-80.
Bayoglu, B., Unal, O., Elibol, F., Karabulut, E., & Innocenti, M.S. (2013). Turkish Validation of
the PICCOLO (Parenting Interactions with Children: Checklist of Observations Linked to
Outcomes). Infant Mental Health Journal, 34(4), 330–338.
Cheung, F. M., van de Vijver, F. J., & Leong, F. T. (2011). Toward a new approach to the study
of personality in culture. American Psychologist, 66(7), 593.
Dittrich, R., Francis, B., Hatzinger, R., & Katzenbeisser, W. (2007). A paired comparison
approach for the analysis of sets of Likert-scale responses. Statistical Modelling, 7(1), 3-
28.
Fishman, J. A., & Galguera, T. (2003). Introduction to test construction in the social and
behavioral sciences: A practical guide. Rowman & Littlefield Publishers, pp.53-169
Kline, T. (2005). Psychological testing: A practical approach to design and evaluation. Sage,
pp. 29-75.

Norris, M., & Lecavalier, L. (2010). Evaluating the use of exploratory factor analysis in
developmental disability psychological research. Journal of Autism and Developmental
Disorders, 40(1), 8-20.
Rauthmann, J. F. (2011). Not only item content but also item format is important: Taxonomizing
item format approaches. Social Behavior and Personality: An International
Journal, 39(1), 119-128.
Revelle, W., Wilt, J., & Condon, D. M. (2011). Individual differences and differential
psychology. The Wiley-Blackwell handbook of individual differences, 1-38.
Ziegler, M., & Hagemann, D. (2015). Testing the unidimensionality of items, pp. 42-49.
