Evaluation examples | Buros Center for Testing

Community Setting.

A school district is considering whether to adopt the performance assessments that come as supplemental tests to a standardized test. Here are some of the factors that the district must seriously consider as it reviews different options:

In this community, 28% of the students do not have English as a first language. The district must look carefully at the requirements of the different performance assessments to be sure that the language of the directions to students and of the performance assessment tasks does not place its limited English-speaking students at a disadvantage.
The teachers in this district have not had any in-service training on the use of alternative or performance assessments. They are likely to be unprepared to administer and reliably mark the performance assessment results. The district will need to educate its teachers about the advantages of performance assessment on a rather large scale.
The district has a strong teachers' union. The marking of performance assessments by teachers will require extra work outside of the normal time allotted for teaching and class preparation. The district will need to discuss the extra marking time with the union representatives to be assured that labor equity will be reached.
The current curriculum in the district does not have a performance assessment component; the teachers do not use an activity-based approach to pupil learning. Thus, the performance assessments would not match the local curriculum. Curriculum changes may need to be made so students will be prepared to be assessed by the "new" type of tests. Parents and teachers will have to be educated on this matter.
The current budget in the district allots $5.50 per student for assessment purposes. This is sufficient to cover the district's normal standardized testing program. The performance assessment will add more costs both for materials (which cannot be used the following year) and for the extra compensation teachers will be given to mark the results.
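The budget question in the last item above is simple arithmetic, and a short sketch can make the comparison explicit. Only the $5.50 per-student budget comes from the text; the materials cost, marking time, and hourly rate below are hypothetical placeholders that a district would replace with its own figures.

```python
from dataclasses import dataclass

BUDGET_PER_STUDENT = 5.50  # dollars per student, from the district's current budget


@dataclass
class AssessmentCosts:
    """Per-student cost inputs. Every value supplied here is an assumption."""
    standardized_test: float      # dollars per student for the existing program
    materials: float              # dollars per student, not reusable next year
    marking_minutes: float        # teacher minutes to mark one student's work
    marking_rate_per_hour: float  # extra compensation, dollars per hour

    def per_student_total(self) -> float:
        marking = self.marking_minutes / 60 * self.marking_rate_per_hour
        return self.standardized_test + self.materials + marking


def shortfall(costs: AssessmentCosts, budget: float = BUDGET_PER_STUDENT) -> float:
    """Dollars per student by which costs exceed the budget (0 if within it)."""
    return max(0.0, costs.per_student_total() - budget)


# Hypothetical figures, for illustration only.
example = AssessmentCosts(
    standardized_test=5.50,
    materials=2.00,
    marking_minutes=15,
    marking_rate_per_hour=30.00,
)
print(f"Total per student: ${example.per_student_total():.2f}")
print(f"Shortfall per student: ${shortfall(example):.2f}")
```

Multiplying the shortfall by enrollment gives a rough estimate of the additional funding the district would need to request.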

Specific Decisions, Purposes, and Uses.

A school district is considering whether to adopt a diagnostic reading test to help students who have scored below the acceptable level in the state's reading assessment. Among the issues the district needs to consider are the following:

The district's fifth and sixth grade students have performed especially poorly in reading comprehension on the mandated state reading assessment. The state standards list several specific comprehension standards such as:
- Predicting the content in a text by examining text features (e.g., headings, illustrations) and using prior knowledge of the topic of the text.
- Using reading strategies such as drawing inferences and looking for causes and effects to understand the text.
- Identifying how an author's writing techniques (such as plot and use of figurative language) help one understand the meaning of the text.
There is a long list of these specific skills that the district needs to address. Those selecting a diagnostic test will need to be assured that the selected test assesses students on as many of these target skills as possible and that the test will provide a reliable report about each fifth and sixth grade student's mastery of these specific skills.
The district wants the results of the diagnostic test to give teachers guidance for developing remedial reading instruction on the specific skills the students have not mastered. This implies one of three things: that the teachers already know how to provide this instruction as part of their normal classroom reading instruction, that there will be some teacher education and curriculum changes to implement the remediation, or that a special reading teacher will be hired to implement the remediation outside of the normal classroom.
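One way to compare candidate tests against the target skills is a simple coverage check, sketched below. The skill labels paraphrase the three example standards listed above; the candidate tests and the skills each one reports on are hypothetical. Coverage is only the first screen: it says nothing about whether each reported skill score is reliable, which must be judged from the publisher's technical documentation.

```python
# Target skills, paraphrased from the example state standards in the text.
TARGET_SKILLS = {
    "predict content from text features and prior knowledge",
    "draw inferences",
    "identify causes and effects",
    "identify author's writing techniques",
}

# Hypothetical candidate tests and the skills each reports a score on.
CANDIDATE_TESTS = {
    "Test A": {"draw inferences", "identify causes and effects"},
    "Test B": {
        "predict content from text features and prior knowledge",
        "draw inferences",
        "identify author's writing techniques",
    },
}


def coverage(reported, targets=TARGET_SKILLS):
    """Return (fraction of target skills covered, sorted list of missing skills)."""
    covered = reported & targets
    missing = targets - reported
    return len(covered) / len(targets), sorted(missing)


for name, skills in CANDIDATE_TESTS.items():
    fraction, missing = coverage(skills)
    print(f"{name}: covers {fraction:.0%} of target skills; missing: {missing}")
```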

How the Test Scores Will Improve the Current Situation.

A school district wants to use a diagnostic reading test to help teachers focus on the reading deficits of elementary school children, especially those reading skills listed in the state standards. The skills are measured by the state's mandated reading assessment. Students' performance on this assessment will be one of the criteria used by the state's department of education to judge whether the school is performing adequately. The district expects the following to happen once an appropriate test is adopted and in use:

Each year the district's teachers will have the information they need to teach students who currently are falling behind in reading according to the state mandated test.
All teachers will have the necessary information in an understandable format that is helpful to their teaching and will receive it in time for them to adapt their classroom teaching based on the diagnostic test results.
Once they have been identified and taught, almost all of the district's elementary students will meet or exceed the reading standards set by the state for their grade level.
The school district's scores on the state-mandated reading test would improve from year to year until almost all students score at or above the minimally acceptable level for their respective grade.
On the whole, elementary students will be better readers and will learn more in other subjects, such as science and social studies, than they did in the past.

As the district reviews and evaluates different tests, it should keep these purposes in mind and try to anticipate how well these tests could be used to achieve the above results.

Strengths and Limitations of Different Testing Formats.

A district wants to use performance assessments in elementary school mathematics and science as a way to encourage teachers to refocus their teaching to be more activity-based and student-centered. The district wants to drop the multiple-choice mathematics and science tests it currently uses to annually monitor students.

Strengths of the performance assessment include:

Performance tasks clarify the meaning of complex learning targets. Realistic performance assessment tasks have the potential to match complex learning objectives to a close degree. When the school district presents the tasks to students and shares them with parents, it makes the district's learning goals clear through actual example.
Performance tasks assess the ability "to do." One of the district's important learning outcomes is the students' ability to use their knowledge and skill to solve problems and to lead a useful life, rather than to simply answer questions about mathematics or science.
Performance assessment is consistent with the learning theory endorsed by the district. The district's curriculum leaders subscribe to a learning theory that emphasizes students should use their previous knowledge to build new knowledge structures, be actively involved in exploration and inquiry through task-like activities, and construct meaning for themselves from educational experience. Performance assessments have the potential to engage and actively involve students with complex tasks.
Performance tasks require integration of knowledge, skills, and abilities. Complex performance tasks, especially those that span longer periods of time, usually require students to use combinations of different skills and abilities.
Performance assessments may be linked more closely with the district's teaching activities. The district would like its teachers to require students to be actively involved in inquiry and performance activities. Performance assessments are a meaningful component of this teaching approach.
Performance tasks broaden the approach to student assessment. If the district introduces performance assessment along with traditional multiple-choice formats, it broadens the types of learning objectives it assesses and offers students a variety of ways of expressing their learning. As a result, it would increase the validity of the district's student evaluations.
Performance tasks let teachers assess the processes students use as well as the products they produce. Many performance tasks offer teachers the opportunity to watch the way a student goes about solving a problem or completing a task. Appropriate scoring rubrics help the teachers to collect information about the quality of the processes and strategies students use, as well as to assess the quality of the finished product.

Disadvantages of the performance assessment include:

High-quality performance assessments are difficult to create. Good performance assessments match complex learning targets. When the district examines the performance tasks available from publishers, it may find that the tasks fall short of requiring students to use the complex thinking required by the district's curriculum.
High-quality scoring rubrics are difficult to prepare and use. This is especially true when the district wants to assess complex reasoning ability and will permit students to have multiple correct answers and products. A publisher may provide good-quality performance tasks but poor-quality or very ambiguous scoring rubrics. Such rubrics could easily invalidate the students' scores from the performance assessment. The district will need to study very carefully the rubrics a publisher provides to be sure they permit scoring students' achievement of complex thinking skills. The district should not assume that the rubrics will be appropriate just because the tasks are.
Completing performance tasks takes students a lot of time. Even short on-demand paper-and-pencil tasks take 10 to 20 minutes per task to complete. Most realistic tasks take days or weeks to complete. If the school district's assessments are not part of its instructional procedures, this means either administering fewer tasks (thereby reducing the reliability of the results) or reducing the amount of instructional time.
Scoring students' responses to performance tasks takes a lot of time. The more complex the performance and the product, the more time teachers can expect to spend on scoring. The district can reduce scoring time by being sure to select products having high-quality scoring rubrics. Holistic scoring is also quicker than analytic scoring.
Scores from performance tasks may have lower scorer reliability. With complex tasks, multiple correct answers, and fast-paced performances, scoring depends on the teachers' scoring competence. If two teachers use different frameworks, have different levels of competence, use a different scoring rubric, or use no scoring rubrics at all, they will mark the same student's performance or product quite differently. Inconsistent scoring lowers the reliability and validity of the assessment results. The district can obtain high levels of scorer reliability, however, if it trains teachers to use the same well-defined rubrics and monitors them so they don't drift away from the standards set in the rubrics.
Students' performance on one task provides little information about their performance on other tasks. A serious problem with performance assessments is that a student's performance on a task very much depends on her prior knowledge, the particular wording and phrasing of the task, the context in which it is administered, and the specific subject-matter content embedded in the task. This results in low reliability from the content-sampling point of view. In other words, the district may have to use six or seven performance tasks to reliably evaluate a student in one unit of instruction. The validity of the district's assessment results may also be low if it does not use enough assessment tasks.
Performance tasks do not assess all learning targets well. If some science or mathematics learning objectives focus on memorizing and recalling facts, definitions of special terminology, or the meaning of certain theoretical ideas, then the objective formats of items (short-answer, multiple-choice, matching, and true-false) are better assessment choices than performance assessments. If the district's learning objectives emphasize logical thinking, understanding concepts, or verbal reasoning, objective formats may still be a better choice than performance formats. The objective formats allow a much broader coverage of content and can assess that broader coverage in a shorter period of time. Further, objective formats are easier to score and the results from them are more reliable. A balanced multiple assessment approach, using both performance and objective formats, is usually recommended.
Completing performance tasks may be discouraging to less able students. Complex tasks that require students to sustain their interest and intensity over a long period of time may discourage less able students. They may see the high standards implied by such tasks as beyond their reach. They may have partial knowledge of the learning target but may fail to complete the task because it does not allow them to use or express this partial knowledge effectively.
Performance assessments may underrepresent the learning of some cultural groups. Although performance assessments allow the opportunity for students to use their backgrounds in diverse ways and allow multiple correct solutions, the district may find it difficult to purchase tasks--and especially scoring rubrics--that take advantage of this diversity. If the district's teachers are not knowledgeable about how different cultural groups express their higher thinking skills, they may systematically bias their assessments of students from those groups.
Performance tasks will not wash away differences among cultural groups. They are likely to make such differences more apparent. Multiple assessment formats may improve this situation somewhat because they allow knowledge, skills, and ability to be expressed in different formats and media.
Performance assessments may be corruptible. As the district's teachers use performance assessments, they will teach their students how to do well on them. This amounts to coaching them how to perform well on specific testing formats (often called "teaching to the test"). If this coaching amounts to teaching all aspects of the state's standards and the school district's curriculum learning objectives, the teachers are doing the right thing. However, if the teachers focus primarily on only one aspect of the learning objectives (e.g., a standard way to write answers to constructed-response science items), they will lower the validity of the results. Coaching tends to reduce the novelty of a task or change it from a "higher order thinking" task to a "following-the-solution-strategy-the-teacher-taught-me" task. This reduces the validity of the results because assessments using the coached tasks do not evaluate the main intent of those learning objectives that want students to solve new and ill-structured problems. The quality of a student's education is thereby reduced.
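Two of the reliability points in the list above can be made concrete with standard formulas. The sketch below is illustrative only and is not a procedure the source prescribes. It uses unweighted Cohen's kappa as one common index of agreement between two scorers, and the Spearman-Brown formula to estimate how many tasks are needed for a target reliability. Spearman-Brown assumes the tasks are parallel (equally reliable and measuring the same thing), which is a strong assumption for performance tasks. The rubric scores, the single-task reliability of 0.38, and the target of 0.80 are all hypothetical.

```python
import math
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two scorers rating the same students."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both scorers must rate the same non-empty set of students.")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each scorer's marginal score frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers gave every student the same single score
    return (observed - expected) / (1 - expected)


def spearman_brown(single_task_reliability, n_tasks):
    """Predicted reliability of a score based on n_tasks parallel tasks."""
    r = single_task_reliability
    return n_tasks * r / (1 + (n_tasks - 1) * r)


def tasks_needed(single_task_reliability, target_reliability):
    """Smallest number of parallel tasks predicted to reach the target."""
    r, t = single_task_reliability, target_reliability
    k = t * (1 - r) / (r * (1 - t))
    return math.ceil(k - 1e-9)  # small tolerance guards against float round-off


# Hypothetical rubric scores (1-4) given by two teachers to the same ten students.
teacher_1 = [3, 2, 4, 1, 3, 2, 4, 3, 2, 1]
teacher_2 = [3, 2, 3, 1, 3, 2, 4, 2, 2, 1]
print(f"Scorer agreement (kappa): {cohen_kappa(teacher_1, teacher_2):.2f}")

# Hypothetical single-task reliability and target reliability.
print(f"Tasks needed: {tasks_needed(0.38, 0.80)}")
```

Because rubric scores are ordered, a weighted kappa or an intraclass correlation may be more informative than the unweighted index used here. With these assumed inputs, the task estimate falls in the same range as the six or seven tasks mentioned above, but the result depends entirely on the single-task reliability the district actually observes.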

Performance tasks may have the potential for improving teaching in the district, something that the district clearly had in mind when it decided to search for performance assessments to purchase. Here are some ways that the district's use of performance assessments may help to improve teaching:

When teachers require students to complete a well-crafted performance task, they give them the opportunity to apply their learning to a new situation. This shows students that learning must not be limited to repeating what the teacher said.
If the district chooses the right kind of performance tasks, administering them may help students make connections between the skills and abilities they learned in separate subjects. For example, students must integrate skills and abilities in language arts, mathematics, science, and social studies when they conduct, analyze, and write up a survey of students' opinions in their school.
If the district uses performance tasks frequently, it may help students to realize the connections between "schoolhouse" learning and "real-world" activities. This realization is likely when performance tasks are very similar to the tasks people use in real life (planning a trip using a map, using a bus timetable, and making a travel budget) or tasks that involve current events (comparing local politicians' points of view as expressed in speeches, advertisements, and daily newspapers).
If the district shares the scoring rubrics with students, it helps to clarify the learning objectives for them. The more students understand the skills and abilities they should use, the more able they become in identifying where they should focus their practice and study efforts. It is important, however, that the district do more than simply distribute the rubrics to students. It must ensure that the teachers teach the meaning of the rubrics and give students supervised practice in using them.