Student Evaluations: The Ratings Game

by John V. Adams

from Inquiry, Volume 1, Number 2, Fall 1997, 10-16

Copyright 1997 Virginia Community College System



Abstract

Research reveals widespread misuse of student evaluations. Student evaluations, the author argues, should be used for the improvement of instructors’ teaching, not for personnel decisions. Fifteen variables that correlate with high student evaluations but do not accurately reflect teaching effectiveness are detailed.

In recent years, student evaluations of faculty performance have become increasingly common on college campuses across the nation. In a study that tracked the use of student evaluations of faculty in 600 colleges between 1973 and 1993, Seldin (1993) found that their use increased from 29 percent to 86 percent during that period. However, along with their increased use, student evaluations of faculty have also become increasingly controversial. The reason for the controversy is not the student evaluations themselves, but rather the way they are often used. As Abrami, d'Apollonia, and Cohen (1990) point out, “Student ratings are seldom criticized as measures of student satisfaction with instruction . . . . Student ratings are often criticized as measures of instructional effectiveness” (p. 219). On many college campuses, administrators use the results summatively as a major determiner for making critical retention, promotion, and merit pay decisions about individual faculty members.

For administrators, the attractiveness of student evaluations of faculty is that they provide an easy, seemingly objective assessment of teaching that does not require justification (Stone, 1995). The ease of student evaluations comes in reducing the complexities of teaching performance to a series of numbers, particularly when commercial forms are used. The most common type of commercial student evaluation form utilizes a Likert-type scale on which students rate faculty against a series of statements about the course and instruction. Each point on the scale is assigned a numerical value, which allows the computation of composite scores for individual items, groups of items, or all of the items. Finally, the student ratings are often normed nationally and locally in spite of the near-universal recommendations in the literature against the norming of student ratings (Callahan, 1992; Rifkin, 1995). One of the problems with this type of student evaluation of faculty is that few administrators are trained to interpret these numbers. It is not uncommon for administrators to eyeball the scores and assume that scores below the mean are bad and those above it are good. Their reasoning seems to be based on the impossible assumption that all of their faculty members should be above average in all categories (Klajman, 1997). It is also not uncommon for administrators to pick and choose among the various scores of a particular faculty member in order to justify a previously held opinion.
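To make the arithmetic concrete, the following is a minimal sketch, in Python, of how such a form’s numbers are typically produced and why the “below the mean” reading is untenable. The item wording, ratings, and faculty scores are invented for illustration and do not come from any actual commercial instrument.

```python
# Illustrative sketch only: invented data, not any actual commercial
# evaluation form. Shows how Likert responses become composite scores,
# and why "below the mean" cannot mean "bad" for everyone.

from statistics import mean

# Each student rates each item on a 5-point Likert scale
# (1 = strongly disagree ... 5 = strongly agree).
responses = {
    "explains clearly":        [5, 4, 4, 3, 5],
    "stimulates interest":     [4, 4, 3, 4, 5],
    "available outside class": [5, 5, 4, 4, 4],
}

# Composite score for one item: the mean of its ratings.
item_scores = {item: mean(ratings) for item, ratings in responses.items()}

# Composite score for the whole form: the mean across all items.
overall = mean(item_scores.values())

print(item_scores)        # e.g. {'explains clearly': 4.2, ...}
print(round(overall, 2))  # one number standing in for "teaching"

# The norming fallacy: in any group of instructors, some overall
# scores must fall below the group mean, so "below average" is a
# statement about ranking, not about teaching quality.
faculty_scores = [4.2, 4.4, 3.9, 4.6, 4.1]
group_mean = mean(faculty_scores)
below = [s for s in faculty_scores if s < group_mean]
print(f"{len(below)} of {len(faculty_scores)} fall below the mean "
      f"({group_mean:.2f}) by arithmetic necessity.")
```

However uniformly strong a faculty may be, some composite scores must fall below the group mean; the sketch simply makes visible that “below average” describes a ranking, not a deficiency.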

The original purpose of student evaluations of faculty, however, was not summative but formative; that is, they were intended to improve instructors' teaching (Blunt, 1991). In spite of this original intent, there is no evidence to support the notion that student evaluations of faculty actually improve instruction. The reason for the lack of impact on teaching is simple: in order to improve instruction, the evaluation device has to identify particular difficulties (Neal, 1988). At best, student evaluations of faculty might point to broad areas of concern, such as faculty/student interaction, but they do not identify the cause of the weakness. Without knowledge of the specific cause, it is difficult to suggest what kind of changes should be made. Any changes in teaching performance could just as easily make matters worse.

An additional reason for the lack of a positive impact of student evaluations of faculty on the quality of instruction is that a clear definition of effective teaching has never been developed. Teaching, like art, remains largely a matter of individual judgment. Concerning teaching quality, whose judgment counts? In the case of student judgments, the critical question, of course, is whether students are equipped to judge teaching quality. Are students in their first or second semester of college competent to grade their instructors, especially when college teaching is so different from high school (Lechtreck, 1990)? Are students who are doing poorly in their courses able to objectively judge their instructors? And are students, who are almost universally considered lacking in critical thinking skills (often by the very administrators who rely on student evaluations of faculty), able to critically evaluate their instructors? There is substantial evidence that they are not. For example, a study of the correlation between students' evaluations of faculty and their own self-ratings indicated that ratings of faculty may be a better measure of the rater's schemata than a measure of the instructor's performance (Tang, 1987). In other words, as Spinoza commented, “What Peter tells us about Paul tells us more about Peter than Paul” (Goldman, 1993, p. 59).

As previously stated, the main problem with student evaluations of faculty is in how they are used. When student evaluations of faculty are used summatively to determine retention, promotion, and merit pay, there is the potential for serious consequences in the classroom. This summative use of student evaluations of faculty places students in the position of being the primary determiner of what happens in the classroom. Faculty members are quick to realize that those who elicit satisfaction from their students, by whatever means necessary, are rewarded by the administration, while faculty members who elicit dissatisfaction, no matter what the reason, may suffer the consequences and typically have to defend themselves (Stone, 1995). It does not require a complex economic theory to see how student evaluations of faculty might encourage faculty accommodations to students. “The fact that learning has declined and stagnated during the twenty-five or so years that higher education has relied on student opinion as a measure of 'good' teaching speaks for itself” (Stone, 1995, p. 13).

Whether student evaluations of faculty are used for summative or formative purposes, the fact remains that the reliability, validity, and utility of student evaluations of faculty have yet to be established (Abrami, 1990). A review of the literature reveals conclusions about student evaluations of faculty that range from calls for caution in their use, to questions about their legal basis, to outright rejection. With regard to calling for caution, most researchers advise that, if used, student evaluations of faculty should be only one of many sources of information on teaching performance (Cashin, 1988; Seldin, 1993; Blunt, 1991; Haskell, 1997). “We fail ethically when we permit important personnel decisions to proceed on the basis of such potentially misleading data” (McKeachie & Kaplan, 1996, p. 7).

Concerning questions about the legal basis of student evaluations of faculty, Lechtreck (1990) points out that, “In the past few decades, courts have struck down numerous tests used for hiring, and/or promotions on the grounds that the tests were discriminatory or allowed the evaluator to discriminate. The question, 'How would you rate the teaching ability of this instructor,' is wide open to abuse” (p. 298). In his column, “Courtside,” Zirkel (1996) states, “Courts will not uphold evaluations that are based on subjective criteria or data” (p. 579). Administrative assumptions to the contrary, student evaluations of faculty are not objective, but rather, by their very nature, must be considered subjective.

Finally, in rejecting the use of student evaluations of faculty, Cahn (1987) writes, “To send such inane data to faculty members with the understanding that their scores will play a significant role in the consideration of their reappointment, promotion, or tenure is demeaning, not only to an illustrious institution but also to professors, whose informed evaluations are supposed to lie at the heart of the educational process” (p. B2). Cholakian (1994) is just as emphatic when he states that through student evaluations of faculty, “. . . we are implicitly teaching students to deflect responsibility [for their own learning from themselves to the instructor] ... Are we improving the profession or appeasing the Philistines?” (p. 26).

If student evaluations of faculty are a questionable measure of teaching effectiveness, then what do they measure? That is a difficult question to answer. Most researchers would agree that student evaluations of faculty are a general measure of student satisfaction (Abrami, 1990). Supporters of student evaluations of faculty might argue that student satisfaction is an important aspect of teaching and that any rejection of the concept comes from a lack of desire to meet the needs of the students (Baldwin, 1994). However, as Baldwin (1994) also points out regarding student satisfaction, or what he terms the notion of the student as a customer, “Care for the customer extends only as far as doing whatever is necessary to ensure that they will buy one's product and keep buying it. The motivation is profit; there is no moral dimension involved . . . . In the end, the construction of teaching as an activity which is conducted for profit will result in bad teaching” (p. 136).

Since student evaluations have yet to be shown to be valid measures of teaching quality but are, nevertheless, often used by administrators to make critical decisions concerning the retention, promotion, and merit pay of faculty members, it is only reasonable to expect teachers to do what they can to achieve the highest possible ratings. In response to this, Troy (1995) stated, “Instructors know that in each class some student will say, 'What do I need to do to get an A?' There's no reason the instructor can't say something similar to the students” (p. 1). For faculty members who prefer to be less obvious, the following is a list of fifteen variables related to high student evaluations.

1. Difficulty of first test: “Students tend to evaluate instructors near the end of the course according to their performance on the first examination” (Hewett, 1988, p. 8).

2. Student interest: “...students' self-reported sleepiness in the class was negatively correlated with the ratings of instructor's ability to explain material clearly and understandably” (Tang, 1987, p. 93).

3. Student participation: “Participation in classroom discussions was significantly related to the students' ratings of the professor's ability to stimulate students' interest in the subject matter, willingness to talk with students outside of class, and knowledge in this class” (Tang, 1987, p. 93).

4. Discipline taught: “Finally, for this assembly, perhaps the most interesting evidence of validity is that humanities teachers are rated as being more effective than teachers of science, math, and engineering” (McKeachie, 1996, p. 5).

5. Student satisfaction: “It must be recognized that one or two disgruntled students, or students who do not find the teaching style compatible to their learning styles can produce a significant difference in the final rating score” (Callahan, 1992, p. 101).

6. Purpose of student evaluation: “...the mean rating was higher when the student was told the purpose [of the evaluation] was for promotion of the instructor.” “The results of this study agree with the findings of Driscoll and Goodwin, Aleamoni and Hexner, Sharon and Bartlett, and Taylor and Wherry which show that certain types of information tend to elevate or increase average student ratings” (Douglas & Carroll, 1987, pp. 364-365).

7. Socialization with instructor outside of class: “Socializing with students outside of class improved a female instructor's SRI, but social contact did not affect the ratings given to male instructors” (Kierstead, D'Agostino, & Dill, 1988, p. 344).

8. Perceived friendliness of instructor: “Smiling slightly depressed ratings given to male instructors, but it elevated those given to female instructors” (Kierstead, D'Agostino, & Dill, 1988, p. 344).

9. Grades received: “The biasing effect of grades on evaluations has been clearly demonstrated in a variety of experimental studies” (Cahn, 1987, p. B2). “Generally class evaluations are higher in classes with higher student grades” (Shapiro, 1990, p. 137). “In other words, students satisfied with their grades took credit for it; students dissatisfied with their grades blamed the instructor” (Benz & Blatt, 1996, p. 428).

10. Story telling: “If it is the case that story-telling is what it means to 'be interesting,' as we concluded in this study, then faculty may want to apply this understanding to their teaching, i.e., tell stories.” (Benz & Blatt, 1996, p. 429).

11. Required course: “In general, as the data show, required courses hold less interest and receive lower evaluations than elective courses” (Haskell, 1997, p. 7).

12. Use of textbook: “The present study showed that reading the textbook before coming to the class was positively associated with their ratings of the instructor's knowledge of the subject matter and students' knowledge in this class. Reading textbook after the class was associated with students' ratings of the instructors' clarity of grading criteria and knowledge gained in this class” (Tang, 1987, p. 93).

13. Teacher behaviors: “Our findings also concur with those of Bennett (1982), who found that women are more negatively evaluated than men if they fail to meet gender-appropriate expectations with regard to student contact and support, and that students do not necessarily appreciate men who give them greater time and attention.” “In particular, our results indicate that male and female instructors will earn equal SRIs for equal professional work only if the women also display stereotypically feminine behavior” (Kierstead, D'Agostino, & Dill, 1988, p. 344).

14. Term paper requirement: “In this study, the only factor concerning class assignments or grading criteria that affected students' overall evaluations was whether or not a term paper was required. Students' evaluations were higher when a term paper was required” (Shapiro, 1990, p. 147).

15. Teacher delivery: “For two semesters at Cornell University, Stephen J. Ceci taught his developmental psychology course in exactly the same way—with one exception. The second time, he spoke more enthusiastically, varied his pitch, and used more gestures. The result was a major improvement in student ratings of his course.” Researchers “were stunned to find that Dr. Ceci had earned much better scores in the second semester for his level of knowledge, organization, fairness, and even the quality of the textbook. Yet student performance on the tests was about the same in both courses” (Chronicle, 1997, p. A10).

The implication of the above research is clear. While student evaluations of faculty performance are a valid measure of student satisfaction with instruction, they are not by themselves a valid measure of teaching effectiveness. If student evaluations of faculty are included in the evaluation process of faculty members, then they should represent only one of many measures that are used.


References

Abrami, P.C., d’Apollonia, S., & Cohen, P.A. (1990). Validity of student ratings of instruction: What we know and what we do not. Journal of Educational Psychology, 82(2), 219-231.

Baldwin, G. (1994). The student as customer: The discourse of “quality” in higher education. Journal for Higher Education Management, 9(2), 131-139.

Benz, C.R., & Blatt, S.J. (1996). Meanings underlying student ratings of faculty. Review of Higher Education, 19(4), 411-433.

Blunt, A. (1991). The effects of anonymity and manipulated grades on student ratings of instructors. Community College Review, 18 (Summer), 48-53.

Cahn, S. (1987). Faculty members should be evaluated by their peers, not by their students. Chronicle of Higher Education, (October 14), B2.

Callahan, J.P. (1992). Faculty attitude towards student evaluation. College Student Journal, 25 (March), 98-102.

Cashin, W.E. (1988). Student ratings of teaching: A summary of the research. IDEA Paper No. 20. ERIC No. ED302567.

Cholakian, R. (1994). The value of evaluation. Academe, 80(5), 24-26.

Chronicle of Higher Education. (1997, March 14). A10.

Douglas, P.D., & Carroll, S.R. (1987). Faculty evaluations: Are college students influenced by differential purposes. College Student Journal, 21 (Winter), 360-365.

Goldman, L. (1993). On the erosion of education and the eroding foundations of teacher education (or Why we should not take student evaluations of faculty seriously). Teacher Education Quarterly, (Spring), 57-64.

Haskell, R.E. (1997). Academic freedom, tenure, and student evaluations of faculty: Galloping polls in the 21st century. Education Policy Analysis Archives, 5(6), 1-36. Available: http://olam.ed.asu.edu/epaa/v5n6.html

Hewett, L., et al. (1988). Course evaluation: Are students’ ratings dictated by first impressions? ERIC No. ED296664, 3-13.

Kierstead, D., D’Agostino, P., & Dill, H. (1988). Sex role stereotyping of college professors: Bias in students’ ratings of instructors. Journal of Educational Psychology, 80, 342-344.

Klajman, G. (1997, February). Nightmares of academic assessment. ASSESS - Assessment in Higher Education. Available: ASSESS@LSV.UKY.EDU.

Lechtreck, R. (1990). College faculty evaluation by students - An opportunity for bias. College Student Journal, 24 (September), 297-299.

McKeachie, W.J., & Kaplan, M. (1996). Persistent problems in evaluating college teaching. AAHE Bulletin, (February), 5-9.

McKeachie, W.J. (1996). Student ratings of teaching. The Professional Evaluation of Teaching (ACLS Occasional Paper No. 33), 2-7. Available: http://www.acls.org/op33.html.

Neal, J.E. (1988). Faculty evaluation: Its purposes and effectiveness. ERIC Digest. Available: http://vmsgopher.cua.edu:70/0gopher_root_eric_ae%3a[_tessay]faceval.TXT.

Rifkin, T. (1995). ERIC review: Faculty evaluation in community colleges. Community College Review, 23 (Summer), 63-72.

Seldin, P. (1993, July 21). The use and abuse of student ratings of instruction. The Chronicle of Higher Education, A-40.

Shapiro, G.E. (1990). Effect of instructor and class characteristics on students’ class evaluations. Research in Higher Education, 31, 135-148.

Stone, J.E. (1995, June). Inflated grades, inflated enrollment, and inflated budgets: An analysis and call for review at the state level. Education Policy Analysis Archives, 3(11), 1-33. Available: http://olam.ed.asu.edu/epaa/v3n11.html.

Tang, T.L., & Tang, T.L. (1987). A correlation study of students’ evaluations of faculty performance and their self-ratings in an instructional setting. College Student Journal, 21 (Spring), 90-97.

Troy, M. (1995). Changing the evaluation culture. Available: http://marsquadra.tamu.edu./TIG/FacultyEvalArticles.ChangingtheEvaluationCulture.ht.

Zirkel, P.A. (1996). The law or the lore. Phi Delta Kappan, (April), 579.


John V. Adams is Associate Professor of English at Southside Virginia Community College, Christanna Campus. He earned his doctorate in English education at the University of Illinois at Urbana-Champaign. His teaching career began in Ethiopia while he was serving in the Peace Corps. He has since taught at colleges and universities in Canada, Spain, and Puerto Rico as well as in the continental United States.