MERLOT Journal of Online Learning and Teaching Vol. 9, No. 1, March 2013
Face-to-Face versus Online Course Evaluations:
A "Consumer's Guide" to Seven Strategies
Ronald A. Berk
Professor Emeritus of Biostatistics and Measurement
Schools of Education and Nursing
The Johns Hopkins University
Baltimore, MD 21218 USA
rberk1@jhu.edu
Abstract
The research on student rating scales and other measures of teaching effectiveness in
face-to-face (F2F) courses has been accumulating for 90 years. With the burgeoning
international development of online and blended/hybrid courses over the past decade,
the question of what measures to use has challenged directors of distance education
programs. Can the traditional F2F scales already in operation be applied to online
courses or do all new scales have to be designed? Despite the increasing number of
online courses, attention to their evaluation lags far behind that of F2F courses in terms
of available measures, quality of measures, and delivery systems. The salient
characteristics of F2F and online courses are compared to determine whether they are
really different enough to justify separate scales and evaluation systems. Based on a
review of the research and current practices, seven concrete measurement options
were generated. They are proffered and critiqued as a state-of-the-art "consumer's
guide" to the evaluation of online and blended courses and the faculty who teach them.
Keywords: student rating scales, student evaluation of teaching (SET), teaching
effectiveness, blended courses, hybrid courses, web-based courses, distance learning,
student–instructor interaction, content delivery, evaluation rubrics, technology tools
Introduction
There are nearly 2,000 references on student rating scales used in face-to-face (F2F) courses (Benton &
Cashin, 2012), with the first journal article published 90 years ago (Freyd, 1923). In higher education
there is more research on and experience with student ratings than with all of the other 14 measures of
teaching effectiveness combined, including peer, self, administrator, learning outcomes, and teaching
portfolio (Berk, 2006, 2013). With all that has been written about student ratings (Arreola, 2007; Berk,
2006; Seldin, 2006), there are three up-to-date reviews (Benton & Cashin, 2012; Gravestock & Gregor-
Greenleaf, 2008; Kite, 2012) that furnish a research perspective from the world of F2F faculty evaluation.
Unfortunately, there has not been nearly the same level of attention given to the rating scales and other
measures used for summative decisions about faculty who teach blended/hybrid and online courses and
the evaluation of those courses. Given the sizable commitment by colleges and universities to the F2F
scales already being used, can they be applied to online courses? Are online courses structured and
delivered that differently from F2F courses? Is the use of technology a big factor that should be
measured? Do faculty and administrators now need to develop all new measures for the online courses?
What are directors of distance education supposed to use?
The purpose of this paper is to clarify the measurement options available to evaluate teaching
effectiveness in online courses primarily for faculty employment decisions of contract renewal, merit pay,
teaching awards, promotion, and tenure. That information can also be used for course and program
evaluation. The first two sections briefly review the status of online courses and the major characteristics
of F2F and online courses to determine whether they are really different enough to justify separate
measures and evaluation systems. Finally, based on a review of the research and current practices,
seven concrete measurement options are described. They are proffered and critiqued as a state-of-the-
art "consumer's guide" to the evaluation of online and blended courses. Selecting the correct options can
potentially move formative, summative, and program decisions to a higher level of evaluation practice.
Status of Online Courses
The Pew Research Center's survey of U.S. colleges and universities found that more than 75% offer
online courses (Taylor, Parker, Lenhart, & Moore, 2011). More than 30% of all college enrollments in Fall
2010 were in online courses (Allen & Seaman, 2011) and nearly 9% of all graduate degrees in 2008 were
earned online (Wei et al., 2009).
The conversion of traditional F2F courses into either blended/hybrid combinations of F2F and online or
into fully online courses is increasing at a rapid pace along with enrollments in those courses. Further,
there is no sign that these trends are abating nationally (McCarthy & Samors, 2009) or internationally
(Higher Education Strategy Group, 2011). Distance education in all of its forms is the "course tsunami" of
the future. Everyone needs to be prepared.
Unfortunately, evaluation of these online courses and the faculty who teach them lags far behind in terms
of available measures, quality of measures, and delivery systems (Hathorn & Hathorn, 2010; Rothman,
Romeo, Brennan, & Mitchell, 2011). Although formative decisions based on student data for course
improvement can be conducted by the professor during the course using learning analytics, especially for
massive open online courses (MOOCs) (Bienkowski, Feng, & Means, 2012; Ferguson, 2012; van
Barneveld, Arnold, & Campbell, 2012), the overall commitment to online evaluation is lacking. A recent
survey of distance learning programs in higher education (Primary Research Group, 2012) in the U.S.,
Canada, and U.K. found that fewer than 20% of the colleges (15% U.S. and 37.5% Canada and U.K.)
have at least one full-time staff person devoted to evaluating the online distance-learning program.
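As a concrete illustration of the kind of in-course learning analytics mentioned above, consider the following minimal Python sketch. It is not drawn from any of the cited studies; the activity columns, weights, and threshold are all hypothetical assumptions, and a real program would read an export from its own learning management system.

    import pandas as pd

    # Hypothetical LMS activity export: one row per student per week.
    # Column names are illustrative assumptions, not from any specific LMS.
    activity = pd.DataFrame({
        "student": ["ana", "ana", "ben", "ben", "cai", "cai"],
        "week":    [1, 2, 1, 2, 1, 2],
        "logins":  [5, 4, 6, 1, 3, 0],
        "posts":   [2, 3, 4, 0, 1, 0],
    })

    # Simple engagement score per student-week (weights are arbitrary).
    activity["engagement"] = activity["logins"] + 2 * activity["posts"]

    # Flag students whose latest week fell below half their prior average,
    # so the instructor can intervene while the course is still running.
    latest = activity["week"].max()
    prior = (activity[activity["week"] < latest]
             .groupby("student")["engagement"].mean())
    current = (activity[activity["week"] == latest]
               .set_index("student")["engagement"])
    at_risk = current[current < 0.5 * prior].index.tolist()
    print("Students to check in with:", at_risk)  # ['ben', 'cai']

A formative snapshot of this kind supports in-course adjustments; it is not a substitute for the summative measures discussed in the remainder of this paper.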
Comparison of Face-to-Face and Online Courses
A brief review of the research on student ratings by Benton and Cashin (2012) and a more extensive
review on the evaluation of online courses by Drouin (2012) both came to the same conclusion: F2F and
online courses are more similar than they are different. They share several key "teaching" factors in
common. Lists of some of the common characteristics and the unique characteristics of online courses
are given next. Details can be found in the sources cited.
Common Characteristics
Drouin (2012) identified five criteria of "best practices" in online courses that, she argues, could serve as "best practices" in F2F courses as well for student, peer, and self-ratings. The categories of those criteria are: (1) student–student and student–instructor interactions; (2) instructor support and mentoring; (3) lecture/content delivery quality; (4) course content; and (5) course structure (p. 69). The differences lie in the use of technology for content delivery and social networking tools.
Unique Characteristics of Online Courses
In contrast to Drouin's criteria, Creasman (2012) extracted seven key differences in online courses (p. 2):
1) Asynchronous activity, where students can interact with each other and course materials
anytime, 24/7;
2) Non-linear discussions on message boards and forums, where students can participate in
multiple conversations simultaneously;
3) Communication primarily via written text;
4) Slower communication between instructor and students, primarily via e-mail;
5) Greater social contact and time spent by instructor with students on website;
6) Greater volume of information and resources available;
7) Instructor's roles as a facilitator, "guide on the side," and also co-learner.
So, what do these differences mean in terms of the scales used to measure teaching in online courses?
Can these differences be covered on new scales, or should current F2F scales be administered in online
courses? This is the problem with which the next section is concerned.
Seven Strategies to Evaluate Teaching Effectiveness in Online Courses
As online courses were developed, following different models of teaching (Anderson & Dron, 2011; Creasman, 2012; Peltier, Schibrowsky, & Drago, 2007), existing traditional F2F rating scales were challenged with regard to their applicability to these courses (Harrington & Reasons, 2005; Loveland,
2007). The F2F approach seemed efficient since many of those student rating scales were increasingly
being administered online at hundreds of institutions. However, these student ratings were just the
beginning; they are a necessary, but not sufficient, source of evidence to evaluate teaching effectiveness
in F2F courses (Berk, 2006, 2013). Other sources must also be used for employment decisions and
formative decisions of teaching and course improvement (Berk, 2005).
This online administration was being executed either by an in-house information technology system or by
an outside vendor specializing in online administration, analysis, and score reporting, such as
CollegeNET (What Do You Think?), ConnectEDU (courseval), EvaluationKIT (Online Course Evaluation and Survey System), and IOTA Solutions (MyClassEvaluation). The choice of the course management system was crucial in providing anonymity for students to respond, which could boost response rates
(Oliver & Sautter, 2005). Most of the vendors' programs are compatible with Blackboard, Moodle, Sakai,
and other learning management systems.
Despite these online capabilities being in place at many colleges and universities, it became apparent that F2F measures might not address all of the essential components of online teaching (Loveland, 2007). This raises a validity issue: do the existing scales actually cover the instructor behaviors and course characteristics found in online courses? Perhaps new measures are needed that are tailored to the specific features of those courses.
A review of the research and current practices in evaluating F2F and online courses suggests there are at
least seven options for measuring teaching effectiveness. Whatever options are chosen for formative,
summative, and program decisions, they must meet the design and technical standards of the Standards
for Educational and Psychological Testing (American Educational Research Association, American
Psychological Association, and National Council on Measurement in Education Joint Committee on
Standards, 1999), Personnel Evaluation Standards (Joint Committee on Standards for Educational
Evaluation, 2009), and Program Evaluation Standards (Yarbrough, Shulha, Hopson, & Caruthers, 2011).
A critique of these options follows:
1) Instructor-developed scale. Encourage instructor-developed scales to evaluate online teaching
and courses. Some institutions have placed the responsibility for evaluating the online course on
the individual instructor or simply neglect the evaluation (Compora, 2003). Unless instructors are
trained in the process of scale construction and score analysis and interpretation for formative or
summative decisions, this should not even be considered as a viable option.
Although free online survey providers such as Zoomerang and several others (see Wright, 2005) make it easy to administer course scales of up to 30 items via e-mail (Lip, 2008), this option is not recommended. Further, after all that has been learned in the
evaluation of F2F courses, the complexity of multiple measures, such as student, self, peer,
administrator, and mentor rating scales, for formative and summative decisions cannot be
handled by each instructor. Online course assessment should not be the sole responsibility of the
instructor. There are much better ways to do it.
2) Traditional F2F student rating scale. Use the traditional student rating scale and other
measures that are currently in operation for the F2F courses. This is not an uncommon practice
for student scales (Beattie, Spooner, Jordan, Algozzine, & Spooner, 2002; Compora, 2003), but
may not be generalizable to self, peer, and other measures.
Studies using the same student rating scale in both types of courses have yielded comparable ratings on several items, including course and instructor global items on the IDEA Student Ratings of Instruction form (Benton, Webster, Gross, & Pallett, 2010), as well as similar item means, internal consistency reliabilities, and factor structures (McGhee & Lowell, 2003), and nearly identical overall ratings of the instructor (Wang & Newlin, 2000); see the brief computational sketch below.
Continuing to administer the F2F scale to all courses, however, will not capture the elements unique to each type of course, nor the specific emphases in delivery methods and technology that may be especially useful for course design and improvement.
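To make "similar item means" and "internal consistency reliabilities" concrete, the following minimal Python sketch shows the kind of comparison McGhee and Lowell (2003) reported. The ratings here are randomly generated stand-ins; in practice the two matrices would be the item-level responses exported from the F2F and online administrations of the same scale.

    import numpy as np

    def cronbach_alpha(ratings):
        """Cronbach's alpha for an (n_students x n_items) matrix of ratings."""
        ratings = np.asarray(ratings, dtype=float)
        n_items = ratings.shape[1]
        item_vars = ratings.var(axis=0, ddof=1)      # per-item variance
        total_var = ratings.sum(axis=1).var(ddof=1)  # variance of total scores
        return (n_items / (n_items - 1)) * (1 - item_vars.sum() / total_var)

    # Stand-in 5-point ratings: 40 F2F students and 35 online students
    # answering the same 10-item scale.
    rng = np.random.default_rng(0)
    f2f = rng.integers(3, 6, size=(40, 10))
    online = rng.integers(3, 6, size=(35, 10))

    print("F2F alpha:   ", round(cronbach_alpha(f2f), 2))
    print("Online alpha:", round(cronbach_alpha(online), 2))
    print("Item-mean differences:",
          np.round(f2f.mean(axis=0) - online.mean(axis=0), 2))

With random stand-in data the alphas will be near zero; with real rating data, the pattern those studies found was comparably high alphas and small item-mean differences across the two administrations. Factor structures would be compared with a separate factor analysis.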