Sample Reviews from the Mental Measurements Yearbook

Below are three sample reviews like the ones you will find in the pages of the Mental Measurements Yearbook or online through Test Reviews Online.

Most tests are reviewed by two independent reviewers, and each review contains descriptive information and an evaluation of the test's technical properties.

All Mental Measurements Yearbook test reviews are copyrighted by the Buros Center for Testing. Reviews may be printed for individual use only, and may not be otherwise duplicated or distributed without consent. Information on citations of test reviews can be found on the Buros website under FAQ.

Behavioral Assessment of Pain-2 Questionnaire
Purpose: An assessment tool to understand "factors which may be working to exacerbate and/or maintain subacute and chronic nonmalignant pain."
Population: Subacute and chronic pain patients.
Publication Dates: 1990-2009.
Acronyms: BAP-2, Post BAP-2.
Scores, 51: 15 scales, 35 subscales, plus Disability Index: Pain Behavior Scale (Affective/Behavioral, Audible/Visible), Pain Descriptor Scale (Pulling, Tight and Dull [PTD], Sore, Aching, and Tender [SAT], Throbbing and Sharp [TS]), Activity Interference Scale (Domestic/Household Activities, Heavy Activities, Social Activities, Personal Care Activities, Personal Hygiene Activities), Avoidance Scale, Spouse/Partner Influence Scale (Reinforcement of Pain, Discouragement/Criticism of Pain, Reinforcement of Wellness, Discouragement/Criticism of Wellness), Physician Influence Scale (Physician Discouragement/Criticism of Pain, Physician Reinforcement of Wellness, Physician Discouragement/Criticism of Wellness, Physician Reinforcement of Pain), Pain Beliefs Scale (Catastrophizing, Fear of Reinjury, Expectation for Cure, Blaming Self, Entitlement, Future Despair, Social Disbelief, Lack of Medical Comprehensiveness), Perceived Consequences Scale (Social Interference, Physical Harm, Psychological Harm, Pain Exacerbation, Productivity Interference), Mood Scale (Depression, Muscular Discomfort, Anxiety, Change in Weight), 6 validity scales.
Administration: Individual.
Forms, 2: Behavioral Assessment of Pain-2 Questionnaire, Post Behavioral Assessment of Pain-2 Questionnaire.
Price Data, 2012: $21 per computer-generated clinical report from BAPTrax Software (volume discounts available); $25 per prepaid mail-in answer premium full-service reports (volume discounts available).
Foreign Language Edition: Spanish and French editions (unnormed) available.
Time: (30) minutes or less per test.
Comments: Self-report instrument; two options for generating a clinical profile: via BAP-2 software or mail/fax service; Post BAP-2 analyzes the changes over the course of treatment providing outcome data for the pain program.
Authors: Michael J. Lewandowski and Blake H. Tearnan.
Publisher: Pain Assessment Resources.
Review of the Behavioral Assessment of Pain-2 Questionnaire by ASHRAF KAGEE, Professor, Department of Psychology, Stellenbosch University, Stellenbosch, South Africa:

DESCRIPTION. The Behavioral Assessment of Pain-2 Questionnaire (BAP-2) is a self-report assessment instrument that is used, according to the test publisher, to identify the factors that may "exacerbate and/or maintain subacute and chronic non-malignant pain." The BAP-2 contains 223 test items that are presented at an eighth-grade reading level. The scale is an updated version of the Behavioral Assessment of Pain Questionnaire (BAP).

The test authors indicate that the BAP-2 takes less than 30 minutes to complete. It may be administered by an untrained assistant or secretary. There are two ways to generate a clinical profile using the BAP-2. First, responses may be entered manually into the available software package, and a 23-page profile report may then be printed on site. Alternatively, answer sheets may be mailed or faxed to Pain Assessment Resources for scoring, and the clinical profiles are mailed back on the same day. The latter option may save considerable time in a busy clinical practice. A Post BAP-2 also is available so that comparisons may be made between the patient's profile before and after treatment.

The summary page of the profile report provides both raw scores and uniform T-scores for each scale, which assists the clinician in interpreting the test results. This standardization is useful because T-scores allow the scores on each scale to be compared with one another, which would not be possible if only raw scores were used. The BAP-2 has been normed on samples of subacute and chronic pain patients; thus, a T-score of 50 represents the average of patients who experience some level of pain and significant levels of impairment. According to the test materials, T-scores above 50 should be regarded as highly significant. In the clinical profile generated from the scores, the respondent's mean scores may be compared with normative sample means.
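
The idea behind putting scales with different raw-score ranges on a common metric can be sketched with a linear T-score (mean 50, SD 10 in the normative sample). This is a simplification: the BAP-2 reports "uniform" T-scores, whose exact derivation is not described here, and the normative means and standard deviations below are hypothetical values chosen only for illustration.

```python
def linear_t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Convert a raw scale score to a linear T-score (mean 50, SD 10).

    norm_mean and norm_sd stand in for the normative sample's mean and
    standard deviation on that scale (hypothetical values below).
    """
    if norm_sd <= 0:
        raise ValueError("normative SD must be positive")
    z = (raw - norm_mean) / norm_sd  # distance from the normative mean in SD units
    return 50.0 + 10.0 * z


# Two hypothetical scales with very different raw-score ranges.
print(linear_t_score(raw=42, norm_mean=36.0, norm_sd=8.0))  # 57.5
print(linear_t_score(raw=12, norm_mean=15.0, norm_sd=4.0))  # 42.5
```

Under these assumed norms, a raw 42 on one scale and a raw 12 on another become directly comparable: the first lies above the pain-patient average and the second below it.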

DEVELOPMENT. The development of the original version of the BAP was reported by Tearnan and Lewandowski in 1992. Scale development occurred in two phases. In Phase 1 the questionnaire was developed and administered to a large number of patients experiencing chronic pain. Item and factor analysis followed, and items with low internal consistency with other items or low factor loadings were omitted. In Phase 2, the revised questionnaire was administered to a separate sample of 326 chronic pain patients, and the psychometric properties were calculated. The BAP-2 is shorter than the original BAP, which contained 390 items. Most of the original scales (e.g., Activity Interference, Activity Avoidance, Spouse/Partner Influence, Pain Beliefs, Perceived Consequences) have been retained, and six validity scales and a medication scale have been added.

The BAP-2 is rooted in the biopsychosocial model and, to this extent, the scales represent all three domains of the model, with an emphasis on the environmental and psychological dimensions. The model acknowledges that a mechanistic and linear conceptualization of pain is inherently limited and as such emphasizes the interaction and synthesis of a multitude of factors that result in the subjective experience of pain. With patients reporting chronic pain, environmental and psychological factors play a prominent role in the person's subjective experience, and the BAP-2 sets out to assess these various dimensions. The scale also was designed as an extension of the clinical interview so as to increase its utility.

TECHNICAL. The development and norming of the original version of the BAP are described in a scholarly article (Tearnan & Lewandowski, 1992), which appears to be the only published article about the measure. Data from Lewandowski's 1990 dissertation formed the basis of the factor analysis that led to the BAP-2. There appears to be no manual accompanying the instrument; however, an introductory document describes its various subscales, along with the rationale for its use.

Of the sample of 633 participants on which the BAP was validated, 414 (65.4%) indicated low back pain and 219 (34.6%) indicated pain at other body sites, such as leg, head, neck, and so forth. As reported by Tearnan and Lewandowski (1992), the BAP scales demonstrated discriminant and concurrent validity with the West Haven-Yale Multidimensional Pain Inventory and the Sickness Impact Profile among the sample.

In Phase 1 of the validation study, most of the subscale alpha coefficients were above .75, although three subscales demonstrated modest reliabilities, namely Reinforcement of Pain (.62), Personal Care (.55), and Blaming Doctors (.65). Correlations between subscales ranged from .00 to .67, with the majority of coefficients being low to moderate. Of the 34 subscales, eight demonstrated intercorrelations above .50.
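
The alpha coefficients cited throughout these reviews are Cronbach's coefficient alpha, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores), where k is the number of items in the scale. A minimal sketch of that formula follows; the response matrix is invented for illustration and is not BAP data.

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Coefficient alpha for a respondents-by-items matrix of item scores."""
    k = len(responses[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in responses])  # variance of scale totals
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical 2-item scale answered by four respondents.
scores = [[1, 2], [2, 1], [3, 3], [4, 4]]
print(round(cronbach_alpha(scores), 3))  # 0.889
```

Because alpha depends on k, short subscales (like several of those reported above) tend to produce lower coefficients than long ones even when item intercorrelations are similar.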

In the confirmatory factor analysis conducted in Phase 2 of the validation study, the goodness of fit between the a priori model and the observed data was acceptable. Most subscales had alpha reliability coefficients above .80 in the second phase of the study, although three had values below .50, namely, Personal Care for past activities (.44), Personal Hygiene for past activities (.43), and Personal Care for current activities (.37). In a test-retest reliability study with a 10- to 14-day readministration period, the coefficients for the majority of scales exceeded .80.

COMMENTARY. The BAP-2 is less well known and much less extensively researched than other measures of chronic pain such as the McGill Pain Questionnaire or the West Haven-Yale Multidimensional Pain Inventory; its strength lies in its multidimensionality and comprehensiveness. It assesses pain behavior and pain description in considerable detail and thus provides the clinician with finely detailed information about the patient. The Post BAP-2 also offers the opportunity for clinicians to assess the outcome of treatment and its effects on the patient's experience of pain. This instrument is intended for use among a very specific group, namely, persons reporting chronic pain. To this extent its use is likely to be limited to medically related contexts. It does not seem to address other dimensions of the psychological experience of pain such as malingering and phantom pain, although consistency of item endorsement may be inferred by examining the validity scales.

SUMMARY. The BAP-2 is a thoughtfully designed measure of pain behavior and description, intended for use among medical patients for whom chronic pain is a presenting problem. The development of the scale, including exploratory and confirmatory factor analyses, is reported in detail. Its strengths lie in its ease of administration and parsimonious scoring procedures, as well as in the validity scales that assess response style.


Tearnan, B. H., & Lewandowski, M. J. (1992). The behavioral assessment of pain questionnaire: The development and validation of a comprehensive self-report instrument. American Journal of Pain Management, 2, 181–191.

Children's Communication Checklist-2: United States Edition
Purpose: "Designed to assess children's communication skills in the areas of pragmatics, syntax, morphology, semantics, and speech."
Population: Ages 4-16.
Publication Date: 2006.
Acronym: CCC-2.
Scores, 12: 10 communication domains: Speech, Syntax, Semantics, Coherence, Initiation, Scripted Language, Context, Nonverbal Communication, Social Relations, Interests; plus General Communication Composite, Social Interaction Difference Index.
Administration: Individual.
Price Data, 2008: $165 per complete kit including manual (2006, 112 pages), 25 caregiver response forms, 25 scoring worksheets, and scoring CD; $89 per manual; $39 per 25 caregiver response forms; $22 per 25 scoring worksheets.
Time: (5-10) minutes.
Comments: Scaled scores, general communication composite, social interaction difference index, confidence intervals, and percentile ranks are provided for each scale score.
Authors: D. V. M. Bishop.
Publisher: Pearson.

Review of the Children's Communication Checklist-2: United States Edition by REBECCA McCAULEY, Professor of Speech & Hearing Science, The Ohio State University, Columbus, OH:

DESCRIPTION. The Children's Communication Checklist-Second Edition, United States Edition (CCC-2) is intended for use with children between the ages of 4 years and 16 years, 11 months who have normal hearing, speak in sentences, and use English as their primary language. As a caregiver-completed checklist, the CCC-2 provides a source of pre-assessment information for a child scheduled to be evaluated by a speech-language pathologist (SLP), educational diagnostician, or psychologist for speech and language concerns. In particular, the CCC-2 includes content addressing pragmatics, an aspect of language that is infrequently addressed in most language measures but is critical for the diagnosis of autism spectrum disorders (ASD). The purposes of the CCC-2 are the identification of pragmatic language impairment, screening of receptive and expressive language skills, and assistance in screening for ASD.

In addition to general knowledge of standardized test use, test users should become familiar with the CCC-2 and practice its administration, scoring, and interpretation prior to clinical use. If the child's caregiver is considered unable to provide valid responses, an alternative individual (e.g., a teacher) who has at least 3 months of regular contact with the child is enlisted. Alternatively, the checklist can be given as a guided interview. One potentially confusing feature of checklist completion is that the direction of the 4-point scale used by caregivers is reversed for checklist items reflecting difficulties versus those reflecting strengths. Although this design is clearly stated and intentional on the part of the test publishers, test users will want to ensure that respondents follow this change in scoring from one section to the next. Specifically, the same scale, 0-3, is used for difficulties and strengths, but a 0 for a difficulty indicates an absence of the difficulty, whereas a 0 for a strength indicates an absence of the strength.

The checklist takes about 10 to 15 minutes to complete. Once completed, the CCC-2 can be scored using a scoring CD in about 5 minutes or a scoring worksheet in about 15 minutes. Both methods result in (a) derived scaled scores and percentile ranks for each scale; (b) a General Communication Composite (GCC) normalized standard score, confidence interval, and percentile rank, reflecting the child's performance on the first 8 scales; and (c) the Social Interaction Difference Index (SIDI). The SIDI is intended for descriptive use and is the sum of performances on scales E, H, I, and J (scales assessing language and nonlanguage features associated with autism) minus the sum of performances on scales A-D (scales associated with knowledge of language structures). Guidelines for administration, scoring, and interpretation are clearly stated in the test manual.
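
The SIDI arithmetic described above can be sketched as follows. The function name and the scaled scores in the example are hypothetical; in practice the scaled scores come from the manual's norms via the scoring CD or worksheet, and the manual governs interpretation.

```python
STRUCTURAL = "ABCD"  # Speech, Syntax, Semantics, Coherence
SOCIAL = "EHIJ"      # Initiation, Nonverbal Communication, Social Relations, Interests


def sidi(scaled: dict[str, int]) -> int:
    """Social Interaction Difference Index: (E + H + I + J) - (A + B + C + D).

    `scaled` maps scale letters to the child's scaled scores; scales F and G
    are not used in this index.
    """
    missing = [s for s in STRUCTURAL + SOCIAL if s not in scaled]
    if missing:
        raise KeyError(f"missing scaled scores for scales: {missing}")
    social_total = sum(scaled[s] for s in SOCIAL)
    structural_total = sum(scaled[s] for s in STRUCTURAL)
    return social_total - structural_total


# Hypothetical scaled scores for one child.
child = {"A": 8, "B": 9, "C": 7, "D": 8, "E": 4, "F": 6,
         "G": 5, "H": 5, "I": 3, "J": 6}
print(sidi(child))  # -14
```

A negative value simply means the E, H, I, and J total is lower than the A-D total; as the review notes, the index is intended for descriptive use.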

DEVELOPMENT. The United States Edition of the CCC-2 is an adaptation of the CCC-2 (Bishop, 2003), which was developed in the United Kingdom. In addition to making changes in spelling and phrasing, the test developers also made minor additions to rating scale descriptors prior to U.S. standardization. The first predecessors to the British version of the CCC-2 (Bishop, 2003) had been developed to identify patterns of language difficulties (especially pragmatic) in children with diagnosed language impairment. These earlier versions first used respondents who were teachers and speech-language professionals, later shifting to the use of familiar adults as respondents. The British edition of the CCC-2 was modified so that it could be used in diagnosis and has been the focus of ongoing validity studies by a variety of researchers.

The CCC-2, United States Edition, is composed of 70 items within ten 7-item scales (A. Speech, B. Syntax, C. Semantics, D. Coherence, E. Initiation, F. Scripted Language, G. Context, H. Nonverbal Communication, I. Social Relations, and J. Interests). Within each scale, 5 items relate to communication difficulties and 2 to strengths. The checklist is structured so that all communication difficulties are evaluated first, then strengths are evaluated. The first four scales (A-D) address aspects of language frequently affected in specific language impairments. The second four scales (E-H) address aspects affected in pragmatic impairments (which can occur independently or along with other language impairments). The last two scales (I-J) address nonlanguage behaviors often associated with autism spectrum disorder (ASD). Findings on the last two sets of scales (Scales E through J) are used as a basis for recommending further assessment for ASD.



TECHNICAL. Norms were developed, and selected technical studies for the current edition were conducted, on a sample of 950 U.S. children. Sample sizes by age were 100 children for each year between 4 years and 9 years, 11 months; 100 children for each 2-year period between 10 years and 13 years, 11 months; and 150 children for the period from 14 years to 16 years, 11 months. These samples are well matched to U.S. Census data from 2002 for race/ethnicity, geographic region, and parent education level. Twenty-seven percent of these children were receiving special services (including 7% for gifted and talented/advanced placement). Although the sample as a whole was evenly divided between boys and girls, scaled scores were not reported by gender for each age group. Normative data were presented at 3-month intervals for ages 4 years to 6 years, 11 months; at 6-month intervals for ages 7 years to 7 years, 11 months; and at 12-month intervals for subsequent ages.
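
To make the layout of the normative tables concrete, the sketch below maps a child's age to the norm interval it would fall in under the intervals described above. It assumes the intervals begin at exactly 4;0, 7;0, and 8;0 and are contiguous; the function name and the "years;months" label format are hypothetical, and the manual's actual tables are authoritative.

```python
def _label(total_months: int) -> str:
    """Format an age in months as 'years;months'."""
    return f"{total_months // 12};{total_months % 12}"


def norm_band(years: int, months: int) -> str:
    """Return the assumed CCC-2 norm interval for an age of years;months."""
    if not 0 <= months <= 11:
        raise ValueError("months must be between 0 and 11")
    age = 12 * years + months
    if not 48 <= age <= 203:  # 4;0 through 16;11
        raise ValueError("norms cover ages 4;0 to 16;11")
    if age < 84:        # 4;0-6;11: 3-month intervals
        width, origin = 3, 48
    elif age < 96:      # 7;0-7;11: 6-month intervals
        width, origin = 6, 84
    else:               # 8;0 onward: 12-month intervals
        width, origin = 12, 96
    start = origin + width * ((age - origin) // width)
    return f"{_label(start)}-{_label(start + width - 1)}"


print(norm_band(5, 2))   # 5;0-5;2
print(norm_band(7, 8))   # 7;6-7;11
print(norm_band(12, 5))  # 12;0-12;11
```

The finer intervals at younger ages reflect how quickly communication skills change in early childhood, so a 3-month age difference matters more at 4 years than at 14.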


Reliability studies examined test-retest reliability and internal consistency. Test-retest reliability was examined for 98 children from the standardization sample in three age groups (4 years to 6 years, 11 months; 7 years to 9 years, 11 months; and 10 years to 16 years, 11 months) composed of at least 30 children each and retested over intervals from 1 to 28 days. Results were good for the youngest age group (r = .86) and excellent for the older two groups (r = .96 and .93, respectively). Coefficient alpha data based on the U.S. standardization data suggest appropriately strong internal consistency for each of the component scales.


Children with specific language impairment (SLI; n = 54), autism spectrum disorder (ASD; n = 62), and pragmatic language impairment (n = 46) were studied to provide validity evidence. As in the standardization sample, children in these groups were required to have normal hearing, to have English as their primary language, and to speak in sentences. Because many children with ASD do not develop this level of spoken language, the children studied for the CCC-2 seem likely to be more representative of individuals with ASD who are relatively high functioning.

The test manual describes three studies examining the construct validity of the CCC-2 and, in particular, its ability to distinguish among clinical samples. First, the test developer compared each clinical group against a matched sample from the standardization data. Multiple t-tests showed significant differences between each clinical group and its matched group on individual scales and on the GCC overall. In a second study, because the SIDI reflects the individual's relative functioning on language structure versus pragmatics, the author predicted differing outcomes across the three groups as well as when each group was compared with a matched control group taken from the standardization sample. Results appeared consistent with predictions. In the third study, the test developer reports the diagnostic accuracy of the GCC in the form of sensitivity and specificity for each group at three criterion levels (1, 1.5, and 2 SD below the mean of the matched samples), as well as the Positive Predictive Power (PPP) and Negative Predictive Power (NPP) for these criterion levels at five different base rates. Although not widely reported for children's language tests, such measures of diagnostic accuracy are increasingly called for and represent a level of detail that can help clinicians assess the likely utility of these measures within their own context (Dollaghan, 2007). Except at the lowest base rate, these data generally suggest that the CCC-2 can prove quite helpful in identifying children who have the clinical conditions for which the test was developed.
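
Positive and negative predictive power follow from sensitivity, specificity, and base rate by Bayes' rule, which is why the same cutoff can look very different across base rates. The sketch below shows that calculation; the sensitivity, specificity, and base rates used in the example are assumed values for illustration, not figures from the CCC-2 manual.

```python
def predictive_power(sensitivity: float, specificity: float,
                     base_rate: float) -> tuple[float, float]:
    """Return (PPP, NPP) for a cutoff at a given base rate (all values in [0, 1])."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("base_rate", base_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_pos = sensitivity * base_rate               # affected and flagged
    false_pos = (1 - specificity) * (1 - base_rate)  # unaffected but flagged
    true_neg = specificity * (1 - base_rate)         # unaffected and not flagged
    false_neg = (1 - sensitivity) * base_rate        # affected but missed
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ZeroDivisionError("predictive power undefined for these inputs")
    ppp = true_pos / (true_pos + false_pos)
    npp = true_neg / (true_neg + false_neg)
    return ppp, npp


# Assumed cutoff with sensitivity = specificity = .90, at three base rates.
for rate in (0.05, 0.20, 0.50):
    ppp, npp = predictive_power(0.90, 0.90, rate)
    print(f"base rate {rate:.2f}: PPP = {ppp:.2f}, NPP = {npp:.2f}")
```

With these assumed values, PPP falls from .90 at a 50% base rate to roughly .32 at a 5% base rate while NPP stays high, which illustrates the general pattern behind the caveat about the lowest base rate.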


COMMENTARY. Weaknesses in the current evidence base supporting use of the CCC-2 for its intended purposes include a need for interexaminer reliability evidence and a more thorough description of its content relevance and coverage. Nonetheless, evidence concerning test-retest reliability and diagnostic validity is strong, particularly in comparison to competing measures, few of which attempt to address language use (pragmatics) as well as language structure. The minimal time demands this instrument places on respondents and test users are another very attractive feature.


SUMMARY. The intended purposes of the CCC-2 are screening for ASD and SLI, as well as the identification of pragmatic impairments in children between 4 and 16 years, 11 months. Given the evidence supporting it thus far, clinicians who have access to caregivers or other familiar adults to serve as respondents will probably find this a useful addition to their testing protocol. Although it is not suited to children who are lower functioning (i.e., do not speak in sentences), its efficiency, the clarity of its manual, and its focus on both language structure and use (pragmatics) represent considerable strengths. Because the CCC-2 is the product of a long history of research by the test author and other, independent researchers, it seems likely that additional support related to reliability and further validity evidence may be forthcoming.


Bishop, D. V. M. (2003). Children's Communication Checklist-Second Edition. London: The Psychological Corporation.

Dollaghan, C. (2007). The handbook for evidence-based practice in communication disorders. Baltimore: Paul H. Brookes Publishing Company, Inc.

State-Trait Anger Expression Inventory-2: Child and Adolescent
Purpose: Designed to "assess state and trait anger with anger expression and control."
Population: Ages 9 to 18 years.
Publication Date: 2009.
Acronym: STAXI-2 C/A.
Scores, 9: State Anger, State Anger-Feelings, State Anger-Expression, Trait Anger, Trait Anger-Temperament, Trait Anger-Reaction, Anger Expression-Out, Anger Expression-In, Anger Control.
Administration: Group.
Price Data, 2010: $155 per introductory kit including professional manual (87 pages), 25 rating booklets, and 25 profile forms; $52 per professional manual; $80 per 25 rating booklets; $36 per 25 profile forms.
Time: (10-15) minutes.
Authors: Thomas M. Brunner and Charles D. Spielberger.
Publisher: Psychological Assessment Resources, Inc.

Review of State-Trait Anger Expression Inventory-2: Child and Adolescent by STEPHANIE STEIN, Professor and Chair, Department of Psychology, Central Washington University, Ellensburg, WA:

DESCRIPTION. The State-Trait Anger Expression Inventory-2: Child and Adolescent (STAXI-2 C/A) is a self-report rating scale intended for children ages 9 to 18 years. The purpose of the measure is to "assess both state and trait anger along with anger expression and control" (professional manual, p. 11). The STAXI-2 C/A is adapted from the STAXI-2 (Spielberger, 1999; 15:244).

The test authors recommend the STAXI-2 C/A for the purposes of screening, diagnosis, and/or monitoring of anger in children and adolescents. The 35-item rating scale can be administered in either individual or group settings, with an estimated time frame of 15 minutes or less. All items are phrased in the first person (e.g., "I feel..."). Responses are given on a 3-point scale, either "Not at all/Somewhat/Very much" or "Hardly ever/Sometimes/Often." The questionnaire is a 2-page multiple-sheet carbonless form, which provides the questions on the top answer sheet and scoring on the second sheet.

Raw scores are converted to percentile ranks and T scores for five scales: State Anger (S-Ang, 10 items), Trait Anger (T-Ang, 10 items), Anger Expression-Out (AX-O, 5 items), Anger Expression-In (AX-I, 5 items), and Anger Control (AC, 5 items). In addition, scores are computed for four subscales: State Anger-Feelings (S-Ang/F, 5 items), State Anger-Expression (S-Ang/VP, 5 items), Trait Anger-Temperament (T-Ang/T, 5 items), and Trait Anger-Reaction (T-Ang/R, 5 items). Qualitative descriptors (Low, Average, Elevated, or Very High) are also provided for scores on each scale/subscale. A two-sided Profile Form can be used to plot the scores as percentile ranks and T scores. The test manual provides clear guidelines for administering the STAXI-2 C/A and procedures for dealing with missing responses, as well as detailed scoring and interpretation guidelines.

DEVELOPMENT. The STAXI-2 C/A represents an adaptation of the STAXI-2, with the purpose of developing a version of the instrument appropriate for use with children and adolescents. Although the STAXI-2 has been used to assess anger in older adolescents, it has been limited in usefulness with children and younger adolescents because of the challenging readability level of some items. In addition to simplifying items, the test authors' goals in developing the STAXI-2 C/A included "revalidating the basic conceptual components that were first identified with the original STAXI" (professional manual, p. 11). Furthermore, the STAXI-2 C/A was developed to address contemporary societal problems related to school violence and youth suicide by specifically examining the internalization of anger by youth.

The constructs underlying the development of the STAXI-2 C/A are largely unchanged from Spielberger's state-trait theory of anger operationalized in the original STAXI (Spielberger, 1988) and later in the STAXI-2. However, the test authors also explicitly identify a number of dissimilarities between the STAXI-2 and the STAXI-2 C/A. Some of the changes were intended to make the STAXI-2 C/A more appropriate for younger individuals, including simplifying instructions and decreasing the number of items from 57 in the STAXI-2 to 35 in the STAXI-2 C/A. Other differences between the instruments include reducing the number of scales from six to five, reducing the number of State Anger subscales from three to two (based on results of factor analyses), and the development and replacement of a few items to better reflect the concept of anger in juveniles. Finally, the test authors eliminated the total score (Anger Expression Index) from the STAXI-2 to recognize the multidimensional nature of anger delineated by the separate scales and subscales of the STAXI-2 C/A.



TECHNICAL. The weighted normative sample for the STAXI-2 C/A consisted of 838 public school students, ages 9–18, with a mean age of 13.77 years. Fifty-one percent of the normative sample was male. Ethnicity of the normative sample was 60% Caucasian, 18% Hispanic, 15% African American, and 7% Other. No information was provided about geographic location or SES demographics of the normative sample. Given the paucity of data regarding the normative sample, it is difficult to determine how well the sample represents the general population. Scale and subscale T scores (means, standard deviations, and standard errors of measurement) are provided for the normative sample by age/gender subgroups for ages 9–11, 12–14, and 15–18.

In addition, scale and subscale T score means and standard deviations are provided for a small clinical sample consisting of 52 adolescents, ages 11–18 (mean age of 15.21), with "delinquent behavior or chronic anger problems" (professional manual, p. 25) from outpatient and inpatient treatment facilities. Forty percent of the clinical sample was male, and ethnicity representation was 67% Caucasian, 21% African American, 6% Hispanic, and 6% Other. The clinical sample was drawn from six Northeastern, Midwestern, and Southern states.


Internal consistency reliability coefficients of each scale and subscale were provided for the entire unweighted normative sample and were further broken down by age/gender groups. The State Anger (S-Ang) scale demonstrated the highest alpha coefficients, ranging from .83 (age 9–11 females) to .90 (age 12–14 females), with an overall coefficient of .87. Though slightly lower, the Trait Anger (T-Ang) scale demonstrated mostly moderate to high coefficients, ranging from .76 (age 12–14 males) to .83 (age 12–14 females), with an overall coefficient of .80. In contrast, the Anger Expression-Out (AX-O) and Anger Expression-In (AX-I) scales showed the lowest alpha coefficients, ranging from .57 (AX-I, age 9–11 males) to .76 (AX-I, age 12–14 males), with overall coefficients of .70 and .71, respectively. Overall, internal consistency coefficients are acceptable for all scales and subscales on the entire normative sample but are unacceptable (<.70) for eight of the age/gender subgroup scale or subscale scores. In the clinical sample as a whole, internal consistency coefficients for all of the scale and subscale scores were in the moderate (.74 on AX-I) to high range (.94 on S-Ang).

Surprisingly, no evidence for temporal reliability (test-retest) was provided, even though one of the major constructs (Trait Anger) is largely based on the assumption of stability over time. Instead, the test authors attempt to minimize the need for these data, claiming that numerous prior test-retest studies on the earlier instruments (STAXI and STAXI-2) "have consistently found that the Trait scales are relatively stable over time" (professional manual, p. 21). Though that may be true for the earlier instruments, the STAXI-2 C/A is normed on a much younger population and one cannot just assume that similar temporal stability in scores will be found in this age group on this particular instrument.


Several types of validity assertions were provided for the STAXI-2 C/A. The test authors claim that this instrument has content validity, largely because prior research has supported the content validity of the STAXI-2 for use with children and adolescents. However, no research of this sort is provided on the STAXI-2 C/A. Most of the validity data provided in the test manual focus on discriminant and convergent validity, relying on data from the clinical sample of 52 adolescents with disruptive behavior problems. Scores on the scales and subscales of the STAXI-2 C/A were correlated with those on the Youth Self-Report form (YSR; Achenbach & Rescorla, 2001). As expected, the T-Ang scales and subscales and AX-O had significant, moderately strong positive correlations with YSR Syndrome scales of Aggressive Behavior and Externalizing and YSR DSM-oriented scales of Oppositional Defiant and Conduct Problems. Furthermore, these same YSR scales had significant negative correlations with the AX-I and AC scales, suggesting that higher levels of internalized anger and anger control are associated with lower levels of externalizing behavior disorders in youth.

For the small clinical sample of adolescents, these findings provide evidence for convergent and discriminant validity for several of the STAXI-2 C/A scales/subscales in comparison to an established measure of aggressive and disruptive behavior in youth. However, this validity claim cannot automatically be extended to the general population of youth. The test authors did compare the STAXI-2 C/A scores of the clinical sample with those of a same-size matched control sample drawn from the normative sample and found statistically significant differences between the groups for T-Ang and AX-O scales/subscales and, to a lesser degree, the S-Ang scale and S-Ang/F subscale. This suggests that the STAXI-2 C/A has the potential to discriminate between adolescents identified with behavior disorders and those in the general population.

COMMENTARY. The apparent strengths of the STAXI-2 C/A include a solid conceptual state/trait anger framework based on decades of research. The test manual is clear, well organized, and thorough in describing theoretical foundations, development of the instrument, scoring, and interpretation. The test authors have addressed several criticisms of the earlier STAXI-2 in this current instrument through the inclusion of demographic information on ethnicity and information regarding standard error of measurement. The STAXI-2 C/A takes very little time to complete, and the items are simple enough to be read and understood by fourth graders. In addition, the STAXI-2 C/A has acceptable to good internal consistency reliability on most scales, which is impressive, given the brevity of the measure and the relatively few items within each scale/subscale. Finally, the test authors provide data that offer convergent and discriminant evidence of validity of parts of the STAXI-2 C/A, at least when administered to adolescents with documented behavior problems.

On the other hand, the STAXI-2 C/A also has some weaknesses. The normative sample for the instrument is not well-defined, especially with regard to geographical location and SES. Therefore, it is difficult to determine how well the sample represents the general population. In addition, no data exist to support the test-retest reliability of the instrument. This lack of reliability data is especially problematic in the scales/subscales in which temporal stability is a defining feature (i.e., Trait Anger). Furthermore, the evidence for validity is limited to a small, clinical sample of adolescents and cannot be generalized to the overall population of school-age children. This lack of validity evidence is unfortunate, given the stated purpose of screening for anger problems in children and adolescents. We can already conclude that adolescents with externalizing behavior problems are likely to have anger issues, but it would be much more helpful if we knew that the STAXI-2 C/A scores provide a valid measure of screening for anger in nonidentified youth. Finally, one of the most potentially important stated purposes of the STAXI-2 C/A is to identify internalized anger in children and adolescents that could result in violent behavior towards self and others (i.e., school violence). However, the relevant scale for measuring internalized anger (AX-I) does not discriminate between clinical and normal populations, nor does it correlate positively with any measure of adolescent dysfunction.

SUMMARY. The STAXI-2 C/A provides a brief, simple, and easy-to-administer measure of anger in children and adolescents. Though this instrument has potential as a useful screening, diagnostic, and monitoring tool, the test authors do not provide sufficient evidence to support the validity of using scores from the STAXI-2 C/A for these purposes, especially in the general population of school-age youth. Additional validation studies are needed to support the clinical use of this instrument. In addition, studies on the temporal reliability of the STAXI-2 C/A are warranted. In the meantime, clinicians who choose to use the STAXI-2 C/A to assess anger problems in youth should be very cautious in interpreting the results.


Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA school-age forms and profiles. Burlington, VT: University of Vermont, Research Center for Children, Youth, and Families.

Spielberger, C. D. (1988). State-Trait Anger Expression Inventory (STAXI) professional manual. Lutz, FL: Psychological Assessment Resources.

Spielberger, C. D. (1999). State-Trait Anger Expression Inventory-2 (STAXI-2) professional manual. Lutz, FL: Psychological Assessment Resources.