Tools for Interpreting SRTI Results in Context

Section Report Comparison Means

Whatever your approach to summative evaluation, student ratings of instruction are imprecise and subject to positive response bias. To provide context for interpreting an individual instructor's results, the SRTI Section Report includes comparison group means (Figure 1). Item means are provided at the program/department, school/college, and campus level for all courses at the same level (undergraduate or graduate) and in the same enrollment category as the rated section (e.g., undergraduate sections with 120 or more enrolled). Comparison group means are calculated from combined SRTI results for the previous two academic years and are reported only for groups with 10 or more sections.
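As a rough illustration of how comparison means of this kind could be assembled, the sketch below pools two academic years of section-level item means, averages within course level and enrollment category, and suppresses groups with fewer than 10 sections. The DataFrame, column names, and function are hypothetical and are not the SRTI production code.

```python
# Illustrative sketch only: the "sections" DataFrame, its column names, and the
# year labels passed in are hypothetical, not the actual SRTI calculation.
import pandas as pd

def comparison_group_means(sections: pd.DataFrame, years, min_sections: int = 10) -> pd.DataFrame:
    """Pool section-level item means from the given academic years, then average
    within course level (undergraduate/graduate) and enrollment category,
    keeping only groups with at least `min_sections` sections."""
    pooled = sections[sections["academic_year"].isin(years)]
    grouped = (
        pooled.groupby(["department", "course_level", "enrollment_category"])
              .agg(n_sections=("item_mean", "size"),
                   group_mean=("item_mean", "mean"))
              .reset_index()
    )
    # Report a comparison mean only for groups with enough sections.
    return grouped[grouped["n_sections"] >= min_sections]
```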

We also report a 90% "credible interval" around each instructor mean rating. Conceptually similar to a confidence interval, the credible interval is based on a set of statistical assumptions better suited to SRTI data than traditional confidence interval construction.
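For intuition only, the sketch below shows one generic way a 90% credible interval can be computed, using a simple normal-normal Bayesian model with hypothetical prior values and a known rating standard deviation. The actual SRTI interval is constructed under different assumptions, as described in the report Interpreting Ratings in Context: The Credible Interval.

```python
# Generic illustration only: a normal-normal conjugate model with hypothetical
# prior values. The actual SRTI credible interval uses different assumptions.
import numpy as np
from scipy import stats

def credible_interval_90(ratings, prior_mean=3.9, prior_sd=0.5, rating_sd=0.9):
    """Central 90% credible interval for a section's underlying mean rating."""
    ratings = np.asarray(ratings, dtype=float)
    n = ratings.size
    # Posterior precision is the sum of the prior and data precisions.
    post_var = 1.0 / (1.0 / prior_sd**2 + n / rating_sd**2)
    post_mean = post_var * (prior_mean / prior_sd**2 + ratings.sum() / rating_sd**2)
    # The 90% credible interval spans the 5th to 95th posterior percentiles.
    return stats.norm.interval(0.90, loc=post_mean, scale=np.sqrt(post_var))

low, high = credible_interval_90([4, 5, 4, 3, 5, 4, 4, 5])  # illustrative data
```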

Figure 1: SRTI Section Report Page Two (Excerpt) - Mean Comparisons Within Class Size

What to Note

  • Do not treat comparison group results as an absolute standard or line of demarcation between “passing” and “failing” instructor performance.
     
  • Also avoid using trivial differences in mean scores to rank individual instructors or to compare an individual instructor's results with those of a comparison group. The credible interval around each instructor mean provides a plausible range for the instructor's underlying rating and helps determine whether an observed difference in SRTI means is large enough to warrant attention. Additional details on the construction and interpretation of the credible interval can be found in the report Interpreting Ratings in Context: The Credible Interval.

Student ratings of instruction are best at distinguishing the extremes: instructors whose ratings are well below or well above average. They lack the precision for ranking or making fine distinctions among the majority of instructors whose ratings fall somewhere in the middle. Figure 2 shows the distribution of means for all undergraduate sections with five or more respondents on global item 12, "What is your overall rating for this course?" Notice that the middle 40 percent of sections have means between 3.7 and 4.3.

Figure 2 also demonstrates the positive response bias typical of student ratings. Although the midpoint of the 5-point response scale is 3.0, "About average," this should not be interpreted as an "average" rating: 70% of sections had a mean rating of 3.4 or higher, and the average section mean was 3.9, almost a full point above the item midpoint.
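To make these summary figures concrete, the short sketch below shows how statistics like the middle 40 percent range and the share of sections at or above a given mean could be recomputed from an array of per-section means; the function name and input are illustrative, not part of SRTI reporting.

```python
# Hypothetical recomputation of the Figure 2 summary statistics from an array
# of per-section means on item 12; the function name and input are illustrative.
import numpy as np

def summarize_item_means(section_means):
    m = np.asarray(section_means, dtype=float)
    return {
        # The "middle 40 percent" of sections spans the 30th to 70th percentiles.
        "middle_40pct_range": tuple(np.percentile(m, [30, 70])),
        # Share of sections with a mean of 3.4 or higher on the 5-point scale.
        "share_at_or_above_3_4": float(np.mean(m >= 3.4)),
        # Overall average of section means (about 3.9 in Figure 2).
        "average_of_means": float(m.mean()),
    }
```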