Organization of Test Reviews for the Mental Measurements Yearbook Series | Buros Center for Testing

When preparing your review, please use the following sections to organize the review. Using the same structure across reviews makes it easier for readers to follow and allows them to make comparisons among different tests they may be considering for use.

Each review is to include the following five sections:

Description
Development
Technical
Commentary
Summary

Description of Sections

The total length of a typical review should be 1000-1600 words. The following paragraphs describe the kinds of information that reviewers usually include when writing each section of their reviews.

1. Description

This section gives a general description of the test, usually including the purposes of the assessment, the target population, and the intended uses of the test. It should also summarize information about test administration, the scores, and the scoring procedures.

2. Development

This section reviews how the test was developed, what underlying assumptions or theory guided the decisions about how to define the construct, and how items were developed and refined. Discuss results of pilot testing in this section. In addition, the reviewer might comment on any steps undertaken in the selection of the final set of items for the test and any evaluations of the appropriateness of these items for measuring the construct(s) of interest.

3. Technical

This section can be divided into three subsections: standardization, reliability, and validity evidence. Subheadings may be used but are not required. In addition, test fairness should be addressed by presenting and evaluating evidence offered regarding accessibility, a feature of tests that helps ensure all test takers have an unobstructed opportunity to demonstrate their standing on the construct of interest. Fairness considerations reflect efforts to minimize construct-irrelevant variance and apply to all steps in the testing process: test design and development, validation, administration, scoring, and score interpretation. Treatment of test fairness may occur in the Development and/or Commentary sections, instead of or in addition to the Technical section.

In describing standardization, present information about the standardization, development, and/or norm sample(s), including how well the samples match the intended population. Appropriateness of the norms for all individuals in the target population, including individuals from specific subgroups of test takers, should be discussed. These diverse subgroups include those defined by such dimensions as race, ethnicity, culture, age, and gender, as well as socioeconomic and disability status, among other individual characteristics.

In describing reliability, present evidence for score consistency. The types of reliability estimates and their magnitudes should be presented in summary fashion. Brief comments about the acceptability of the levels of reliability, the sample used for determining these estimates, and related issues are pertinent to this section. More extensive treatment of reliability concerns may occur in the Commentary section.
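As an illustration of what a reported reliability estimate represents, the following minimal sketch computes coefficient alpha, one common internal consistency estimate. These guidelines do not prescribe any particular estimate; the choice of alpha, the function name `cronbach_alpha`, and the response data are all assumptions made for this example.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a table of item scores.

    scores: list of rows, one row per test taker, one column per item.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two test takers")
    # Sample variance of each item across test takers
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the total (summed) score
    total_var = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: five test takers, three items (not real test data)
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
    [5, 4, 5],
]
alpha = cronbach_alpha(responses)
print(f"coefficient alpha = {alpha:.3f}")
```

An estimate from so small a sample would itself deserve comment in a review, which is why the sample used to determine reliability estimates is pertinent to this section.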

In discussing validity, present evidence supporting interpretations and potential uses of test results. Studies designed to gather evidence of valid uses of test scores should be summarized. Information about test content and how adequately the test measures the intended construct also should be presented. If the test is intended to be used to make classifications or predictions, evidence supporting these uses should be described in this section. In addition, reviews should examine the differential validity of the test across subgroups that are included in the intended population (e.g., gender and racial subgroups) and should address differential item functioning if not addressed elsewhere in the review. Brief comments about the acceptability of the evidence presented to support test score interpretation and use belong in this section. Extensive commentary concerning validity evidence should occur in the Commentary section. Consistent with current measurement standards, a test itself is not deemed "valid." Rather, validity is appraised in terms of the uses of test scores and how well test results meet the intended purposes of the test.

4. Commentary

This section provides an opportunity for the reviewer to address the overall strengths and weaknesses of the test. The adequacy of the theoretical model supporting test use should be examined, together with the impact of current research evidence on the test's assumptions. The reviewer should indicate the extent to which the evidence presented in the test materials and scholarly research supports the use of the test with underrepresented or marginalized groups. If another test should or might be considered for use, that test may be listed, cited, and referenced.

5. Summary

In no more than six or seven sentences, the reviewer is to offer conclusions about the overall quality of the test and recommendations regarding its use. The summary should be as concise and explicit as possible.
