Review Samples

Sample reviews from the Mental Measurements Yearbook

Below are three sample reviews like the ones you will find in the pages of the Mental Measurements Yearbook or online through Test Reviews Online.

Most test entries contain two reviews written by independent reviewers, along with descriptive information and an evaluation of the test's technical properties.

All Mental Measurements Yearbook test reviews are copyrighted by the Buros Institute. Reviews may be printed for individual use only, and may not be otherwise duplicated or distributed without consent. Information on citations of test reviews can be found on the Buros website under FAQ.

Draw-A-Person Intellectual Ability Test for Children, Adolescents, and Adults.
Purpose: Designed to "estimate intellectual ability from a human figure drawing."
Population: Ages 4-0 to 89-11.
Publication Date: 2004.
Acronym: DAP: IQ.
Score: DAP IQ.
Administration: Individual or group.
Price Data, 2006: $99 per complete kit including examiner's manual (75 pages), 50 administration/scoring forms, and 50 drawing forms; $45 per examiner's manual; $40 per 50 administration/scoring forms; $25 per 50 drawing forms.
Time: (8-15) minutes.
Authors: Cecil R. Reynolds and Julia A. Hickman.
Publisher: PRO-ED.

Review of the Draw-A-Person Intellectual Ability Test for Children, Adolescents, and Adults by JONATHAN SANDOVAL, Professor of Education, University of the Pacific, Stockton, CA:

DESCRIPTION. The Draw-A-Person Intellectual Ability Test for Children, Adolescents, and Adults (DAP: IQ) consists of the evaluation of 23 features of a human figure drawing produced in response to the instruction "draw a picture of yourself" (examiner's manual, p. 5). Examinees are asked to draw a full figure from a frontal view. Each of the drawing features (e.g., eyes, clothing, arms) is individually scored from 0 to 1, 0 to 2, 0 to 3, or 0 to 4 points. A maximum score of 49 points is possible. Raw scores may be converted into a single standard score, an IQ with a mean of 100 and standard deviation of 15, a T-score, a z-score, or a stanine. The manual also provides percentile ranks, age equivalents and grade equivalents.
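
For readers less familiar with the score metrics named above, the short sketch below (in Python) illustrates the conventional relationships among a z-score, an IQ-metric standard score (mean 100, SD 15), a T-score, a stanine, and a percentile rank. It is illustrative only and is not taken from the DAP: IQ manual, whose conversions come from its published norm tables.

    # Illustrative only: conventional relationships among the score metrics
    # named in the manual. Actual DAP: IQ conversions come from the norm tables.
    from math import erf, sqrt

    def z_to_iq(z):
        # IQ-metric standard score: mean 100, standard deviation 15
        return 100 + 15 * z

    def z_to_t(z):
        # T-score: mean 50, standard deviation 10
        return 50 + 10 * z

    def z_to_stanine(z):
        # Stanine: mean 5, SD 2, truncated to the 1-9 range
        return max(1, min(9, round(2 * z + 5)))

    def z_to_percentile(z):
        # Percentile rank under a normal distribution
        return 100 * 0.5 * (1 + erf(z / sqrt(2)))

    for z in (-1.0, 0.0, 1.0):
        print(z, z_to_iq(z), z_to_t(z), z_to_stanine(z), round(z_to_percentile(z), 1))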

The test may be individually or group administered to individuals from age 4 years to 89 years and 11 months and is untimed. The time to administer and score the test is from 8 to 15 minutes. Administration requires a pencil with eraser and drawing form. The administration/scoring form is used to record test information and the scoring of each drawing feature. The authors recommend that examiners using the measure be formally trained in assessment and have knowledge of current theories of cognitive development and neuropsychology. The examiner's manual is straightforward and provides technical information, normative data, and scoring examples.

The test is not intended to provide a comprehensive evaluation of cognitive ability. The developers claim it offers a lower-bound estimate that may supplement other intelligence tests and carries less cultural specificity than they do. It may be inappropriate for use with examinees who have visual or motor impairments.

DEVELOPMENT. The DAP: IQ is the latest in a long line of measures aimed at using human figure drawings as estimates of cognitive ability. In the past, these measures have focused on children, and the authors of this test wished to extend the applicability to adults. Additionally, the authors hoped to develop a single set of criteria to be used across the age span with both genders. Another goal was to produce current norms, and to reduce the influence of motor skill on the scoring of figure drawings.

TECHNICAL. Norms for the test were based on 2,295 individuals matched to U.S. Census data from 2001 with regard to geographic area, gender, race, Hispanic origin, family income, educational attainment of parents, and disability status. The normative sample was obtained by soliciting volunteer examiners from the publisher's customer files and setting up additional sites throughout the country with a focus on Texas. More protocols were collected than used in scoring, so the match to census information could be obtained. Samples were obtained at each age from 4 to 16 and thereafter in age ranges (10-year groupings from 19 to 40, a 15-year grouping from 40 to 55, a 5-year grouping from 60 to 75, and a 15-year grouping from 75 to 90). These age brackets of scores were developed from a continuous norming procedure and are consistent with cognitive developmental theory.

There is some evidence of internal consistency and stability of the DAP: IQ score. The coefficient alpha estimates for the age groupings varied from .74 at age 4 to .87 at ages 30-39, with a mean and median value of .82. The standard errors of measurement vary between 4 and 5 points. Alphas calculated by gender, ethnicity, and handedness fall in the same range. The manual reports, as evidence of reliability, correlations between the DAP: IQ and the Koppitz and Goodenough-Harris scoring systems obtained by three scorers; these correlations are .85, .86, and .86. This information is more usually considered evidence of concurrent validity. Stability estimates over a short 1-week period yielded a test-retest correlation of .84 (n = 45). Interscorer reliability was estimated at .95 for protocols selected from across the sample, and at .91 for the more difficult-to-score age group of 6 to 11. In all, the estimates of reliability are acceptable and comparable with those found for other human figure drawing tests.
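
As context for the standard errors of measurement mentioned above, the conventional relationship is SEM = SD x sqrt(1 - reliability). The brief sketch below simply evaluates that formula for an IQ-metric scale (SD = 15) at a few reliability values; the manual's own SEM figures depend on which reliability estimates enter its calculation.

    # Conventional SEM formula: SEM = SD * sqrt(1 - reliability).
    # Illustrative values for an IQ-metric scale (SD = 15); the manual's own
    # figures depend on the reliability estimates it uses.
    from math import sqrt

    def sem(sd, reliability):
        return sd * sqrt(1 - reliability)

    for r in (0.80, 0.85, 0.90):
        print(f"reliability {r:.2f}: SEM = {sem(15, r):.1f} points")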

The test developers provide validity information based on theory, content, response processes, internal structure, and relationships to other constructs. The progression of scores across ages parallels the theoretical course of fluid ability across the lifespan. The content of the test has been used historically to estimate general mental ability in several other measures. The partial point-biserial correlations between each item and the adjusted total are sufficient to suggest that the items tap the same construct, as do the alpha statistics. A rationale for the test is that the human figure is commonly experienced by everyone, regardless of culture or economic background; the content is also less influenced by education, and the task is simple.

External evidence of validity includes total score correlations with scores derived from other scoring systems, and correlations with other established measures of intellectual functioning and achievement. Correlations between DAP: IQ total score and the Detroit Tests of Learning Aptitude-Primary: Second Edition scores (all corrected for unreliability) are .60 for Nonverbal, .42 for Verbal, and .54 for General Mental Ability. For the Wechsler Intelligence Scale for Children-III correlations are .33 with Verbal, .49 with Performance, and .46 with Full Scale. The DAP: IQ correlates .39 with Woodcock Johnson-Revised (WJ-R) Broad reading, and .36 with Wechsler Individual Achievement Test (WIAT) Reading. Correlations with Math are .46 with WJ-R Broad math and .43 with WIAT Math. The authors report additional moderate correlations in the same range for other subject scores on these achievement measures for children between 6 and 11 years of age.

Both convergent and divergent validity data are reported. The DAP: IQ has moderate correlations (around .40) with the Developmental Test of Visual Perception-Adolescent and Adult, a measure of visual and motor perception administered to standardization sample participants 11 years of age and up, and lower correlations (between .36 and .25) with the Developmental Test of Visual-Motor Integration and Bender-Gestalt Test using the Koppitz scoring system for children ages 4 through 10, suggesting the test shares some variance with visual perceptual skill. The low correlation of .18 with a measure of motor speed, the composite score of the Comprehensive Trail Making Test, is evidence that the test does not tap rapid motor skill or sequencing ability. Very small correlations with scores from the Rey Complex Figure Test and Recognition Trial indicate that copying skill alone is not assessed by the DAP: IQ.

The examiner's manual also presents data on subgroup performance. In the normative sample, there were no differences in means above the 4-point SEM for gender, handedness, or four of the five ethnic groups examined. However, the African American mean was 6 points below average. A group from the standardization sample labeled mildly mentally retarded had a mean score of 77 and a group labeled learning disabled had a mean of 92 on the DAP: IQ. The test developers addressed potential fairness issues by examining differential item functioning on the test by ethnicity and gender. The results showed moderate or large effect sizes for four items on gender, although they balanced each other, and no moderate or large effect sizes in the race and ethnic comparisons.

COMMENTARY. Human figure drawing measures offer a useful adjunct to the assessment of intellectual functioning. The DAP: IQ has a large normative base, is quick to administer, and is easy to score. The norms appear to only roughly approximate the geographical distribution of the population, because much of the data came from Texas and many convenience samples were combined to produce them. However, the norms are based on a larger sample than those of other similar measures, have been examined with sophisticated psychometric methods, and are more current. Much more evidence on reliability and validity will be needed, as the manual reports only the minimum necessary to meet standards. For example, studies could examine multiple drawings produced at the same sitting (e.g., draw yourself and someone of the opposite gender) and stability across somewhat longer spans of time. The test developers do not explain why they excluded parallel forms of the kind found in other drawing tests. Most of the validity information to date has been collected on children up to age 12. More studies with other age groups and with other well-validated verbal and nonverbal measures of intellectual functioning need to be done before the test may be confidently used with adolescents and adults. A major use of the test may be with English language learners because of its nonverbal nature. The language status of the Latino and Asian children in the standardization sample and other research samples was not discussed; studies of the performance of the DAP: IQ with English language learners would be welcome to justify this application of the test. The test developers claim that the estimate of intellectual functioning on this test is a lower bound, but this assertion will need to be validated, as some children and adults may have domain-specific skill in drawing that exceeds their general cognitive ability.

SUMMARY. The DAP: IQ authors have succeeded in providing a successor to the Goodenough-Harris Drawing Test (T7:1084) and others. It can be used for rough screening and for verifying other test results, particularly when language is an issue. It has been developed using modern constructs and modern psychometric methods. The reliability and validity information, although somewhat limited, justifies the cautious and judicious use of the test.



Location Learning Test.
Purpose: To measure visuo-spatial learning and recall.
Population: Elderly adults.
Publication Date: 2000.
Acronym: LLT.
Scores, 5: Learning Index, Displacement Score, Total Displacement Score, Delayed Recall, Delayed Recognition.
Administration: Individual.
Forms, 2: A, B.
Price Data, 2006: £113.50 per complete kit including manual (15 pages), 25 scoring sheets, test grids, practice grids, and picture cards; £35.50 per 50 scoring sheets.
Time: (25) minutes.
Authors: Romola S. Bucks, Jonathan R. Willison, and Lucie M. T. Byrne.
Publisher: Harcourt Assessment [England].

Review of the Location Learning Test by ANITA M. HUBLEY, Associate Professor of Measurement, Evaluation, and Research Methodology, University of British Columbia, Vancouver, British Columbia, Canada:

DESCRIPTION. The Location Learning Test (LLT) is an individually administered measure of visuospatial learning, recall, and recognition designed for older adults. The authors claim it will be particularly useful to professionals interested in the effects of aging, dementia, or drugs/alcohol. There are two forms of the test. Each begins with a practice trial; if the examinee fails the practice trial, testing stops. Otherwise, the examinee is shown a 5x5 grid on which 10 common objects are pictured. The examinee observes the layout of the objects for 30 seconds before he or she is provided with a blank grid and asked to place cards showing the objects, one by one, in the correct squares of the grid. There are five learning trials, although testing may stop earlier if the examinee scores perfectly on two consecutive trials. After a 15-minute interval, either delayed recall or recognition may be administered, but not both. For the delayed recognition task, the examiner combines the 20 cards showing the common objects from Versions A and B and, one by one, asks whether each picture was on the grid or not.

The administration instructions are clear and easy to follow. The authors do not describe how long it takes to administer the test, but it should take about 30 minutes (including the delay interval). Recording performance and computing displacement scores (i.e., the total number of squares away from correct placement for objects) for each trial are quick and easy. Four key scores are computed. The Total Displacement Score is the sum of the displacement scores across trials. The Learning Index shows the rate of improvement across the learning trials; a calculator is needed to avoid errors in computing the ratios used to obtain the average improvement. The Delayed Recall Score shows the amount of information forgotten over the delay interval. Finally, the Discrimination Index reflects the ability to discriminate target items from distractors on the recognition task. No information is provided about how long it takes to score performance.
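
Because the review describes the displacement scores and Learning Index only in general terms, the sketch below is a hypothetical reconstruction of that scoring logic, not the published procedure. It assumes that an object's displacement is the number of grid squares (city-block distance) between its placed and correct cells, and that the Learning Index is the average proportional improvement between successive trials; both are assumptions introduced here for illustration, and the example data are invented.

    # Hypothetical reconstruction of the LLT scoring described above; the
    # displacement metric (city-block distance in grid squares) and the form of
    # the Learning Index (mean proportional improvement) are assumptions.

    def trial_displacement(placed, correct):
        # placed, correct: dicts mapping object name -> (row, col) on the 5x5 grid
        return sum(abs(placed[o][0] - correct[o][0]) + abs(placed[o][1] - correct[o][1])
                   for o in correct)

    def total_displacement(trial_scores):
        # Total Displacement Score: sum of the displacement scores across trials
        return sum(trial_scores)

    def learning_index(trial_scores):
        # Assumed form: mean proportional improvement from each trial to the next
        ratios = [(prev - curr) / prev
                  for prev, curr in zip(trial_scores, trial_scores[1:]) if prev > 0]
        return sum(ratios) / len(ratios) if ratios else 0.0

    # Invented displacement scores for five learning trials
    scores = [14, 9, 6, 3, 1]
    print(total_displacement(scores), round(learning_index(scores), 2))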

DEVELOPMENT. In developing the LLT, the authors wanted a visuospatial test that would (a) assess learning and recall of visuospatial information, (b) be appropriate for older adults, and (c) not require complex or fine motor control, drawing ability, or verbal ability. The authors trace the origins of the LLT to "some informal work" (manual, p. 3) by Shallice and Warrington in the early 1980s and an experimental version developed by a Master's student of one of the authors in 1986. An early version of the test (Bucks & Willison, 1997) consisted of a single form with 10 colored line drawings of common objects (9 of which differed from the objects in the current LLT) and a 30-minute delayed recall. A recognition trial was administered after the first learning trial. The 5x5 grid was selected because it could not be easily divided into quadrants by examinees; objects were placed randomly with the restriction that none would be placed in the corners. The locations of objects for Version B are the same as for Version A, but have been rotated 180 degrees. Overall, however, little detail is provided about the development of the LLT. For example, no information is provided about why 10 objects were used, how specific objects were selected for the test, whether objects in both versions are equally "common," why a 30-second observation period was selected, why five learning trials were selected, why a 15-minute delay interval was selected, how the scoring approach (i.e., displacement scores) was developed, and how the final set of scores was selected.

TECHNICAL. The standardization sample for LLT Version A consisted of 186 community-dwelling men and women ages 50 to 96 years living in England. The majority of the sample was from the Bristol area (n = 128). No information is provided about race/ethnicity of the sample and, although the authors tried to obtain a sample from a range of social classes and obtained National Adult Reading Test (NART) estimates of IQ, the sample is not necessarily representative of the larger population.

Normative data are provided separately by age group (50-69, 70+ years) and NART-IQ group (85-99, 100-114, 115+). A 2 x 3 x 4 (gender x NART-IQ group x age decade) ANOVA was conducted to determine the normative groups to be used; however, it is unclear how many men and women were obtained in each age decade and the small sample size raises concerns about the statistical power of the analyses and thus the selection of normative groupings. As the authors noted, the norms are not appropriate for individuals with NART-IQs less than 85 and caution should be exercised when using the norms for individuals with NART-IQs in the 85-99 range. Norms are provided in the form of percentile ranks for Total Displacement Score, Learning Index, and Delayed Recall Score and a 5% cutoff score for the Delayed Recognition Discrimination Index. Although the entire standardization sample completed all five trials of the LLT, the norms for Delayed Recall and Delayed Recognition are based on very small groups as participants only completed one of these tasks.

Meaningful estimates of reliability for memory tests are often difficult to obtain due to features such as item interdependence within and between trials and practice or recall effects. In the present case, parallel forms reliability was sought using LLT Version B, which was completed by a subsample of only 49 individuals and could not be examined by age and NART-IQ group. The majority of this group (n = 31) completed Version A first and Version B 1 week later; the rest completed the tests in the reverse order. The two versions correlated .71 for the Total Displacement Score and .49 for the Learning Index. The correlation of the two versions at delay was not reported.

Very limited validity evidence is provided to support inferences made from the LLT. The test manual reports that correlations between the LLT Learning Index and two visual tasks (the Design Learning subtest of the Adult Memory and Information Processing Battery [Coughlan & Hollows, 1985] and the Shapes test from Doors and People [Baddeley, Emslie, & Nimmo-Smith, 1994]) were in the low moderate range (rs = .49 and .44, respectively) but were higher than the correlation (r = .22) with the Hopkins Verbal Learning Test (HVLT; Brandt, 1991) in a sample of 47 older adults. Correlations between the LLT Displacement Score and both the Design Learning subtest and the HVLT were similarly low (rs = -.24 and -.29), whereas the correlation with the Shapes test was slightly higher (r = -.37). This is fairly weak evidence. Further validation work is needed, including contrasted or known-groups validity showing that LLT performance differs between cognitively intact and impaired samples. For example, promising preliminary work with an earlier version of the LLT showed that performance differed between small groups of cognitively intact elderly adults and dementia patients (Bucks & Willison, 1997), but this needs to be demonstrated with the current version of the test.

COMMENTARY. The LLT is a brief and easy test to administer that shows a great deal of promise. Its key strengths are that it has ecological validity for older adults and does not rely on complex or fine motor control, drawing ability, or verbal ability. Scoring is not too difficult but does require a calculator. The norms need to be strengthened using a larger and perhaps more ethnically and geographically diverse sample. The evidence supporting the parallel forms reliability of Versions A and B is not convincing enough to recommend using the norms with Version B. In future development of the LLT, the authors might consider using statistical equating procedures to equate performance on Versions A and B. Most critically, however, validity evidence is extremely limited. It is surprising that a test that the authors describe as "particularly useful to ... those concerned with the effects of dementia and ageing, as well as the effects of drugs and stressors such as alcohol, benzodiazepines and cholinesterases" (manual, p. 4) is presented without any validity evidence to back up these claims. Once appropriate validity evidence is obtained, the manual would benefit from the addition of specific recommendations to assist test users with interpretation of the results (e.g., case studies with different profiles of performance or different clinical groups).

SUMMARY. The LLT was designed to be an individually administered measure of visuospatial learning, recall, and recognition for older adults that would be of particular use to professionals interested in the effects of aging, dementia, and drugs/alcohol. The test meets its goal of assessing visuospatial learning, recall, and recognition in an ecologically valid manner and stands out from the majority of visuospatial tests in that it does not rely on complex motor control or drawing ability. Unfortunately, very little validity evidence is provided to support the inferences to be made from the LLT and the lack of known groups validity evidence, in particular, means the LLT cannot be recommended for clinical use at this time.

REVIEWER'S REFERENCES

Baddeley, A., Emslie, H., & Nimmo-Smith, I. (1994). Doors and People: A Test of Visual and Verbal Recall and Recognition. Bury St. Edmunds, England: Thames Valley Test Company.

Brandt, J. (1991). The Hopkins Verbal Learning Test: Development of a new memory test with six equivalent forms. The Clinical Neuropsychologist, 5, 125-142.

Bucks, R. S., & Willison, J. R. (1997). Development and validation of the Location Learning Test (LLT): A test of visuo-spatial learning designed for use with older adults and in dementia. The Clinical Neuropsychologist, 11, 273-286.

Coughlan, A. K., & Hollows, S. E. (1985). The Adult Memory and Information Processing Battery (AMIPB). Leeds: AK Coughlan, St. James's University Hospital.



SAQ-Adult Probation III.
Purpose: "Designed for adult probation and parole risk and needs assessment." Population: Adult probationers and parolees.
Publication Dates: 1985-1997.
Acronym: SAQ.
Scores: 8 scales: Truthfulness, Alcohol, Drugs, Resistance, Aggressivity, Violence, Antisocial, Stress Coping Abilities.
Administration: Group.
Price Data: Available from publisher.
Time: (30) minutes.
Comments: Both the computer version and the paper-and-pencil format are scored using IBM-PC compatibles; audio (human voice) administration option available.
Author: Risk & Needs Assessment, Inc.
Publisher: Risk & Needs Assessment, Inc.
Cross Reference: For a review by Tony Toneatto, see 12:338.

Review of the SAQ--Adult Probation III by ROBERT SPIES, Associate Director, Buros Institute of Mental Measurements, University of Nebraska--Lincoln, Lincoln, NE, and MARK COOPER, Training Specialist, Center on Children, Families & the Law, University of Nebraska--Lincoln, Lincoln, NE:

DESCRIPTION. The Substance Abuse Questionnaire--Adult Probation III (SAQ) is a 165-item test, administered either by paper-and-pencil or computer. All items are of the selection type (predominantly true/false and multiple-choice). Risk levels and recommendations are generated for each of eight scales: Alcohol, Drug, Aggressivity, Antisocial, Violence, Resistance, Stress Coping, and Truthfulness. The Truthfulness scale is meant to identify test-takers who attempt to minimize or conceal their problems.

Nonclinical staff can administer, score, and interpret the SAQ. Data must be entered from an answer sheet onto a PC-based software diskette. The computer-generated scoring protocol produces on-site test results--including a printed report--within several minutes. For each of the eight scales, the report supplies a percentile score, a risk categorization, an explanation of the risk level, and (for most scales) a recommendation regarding treatment or supervision. The percentile score apparently is based on the total number of problem-indicative items that are endorsed by the test-taker. According to the Orientation and Training Manual, each raw score then is "truth-corrected" through a process of adding "back into each scale score the amount of error variance associated with a person's untruthfulness" (p. 8). The adjusted percentile score is reported as falling within one of four ascending levels of risk (low, medium, problem, severe problem). The responsible staff person is expected to use information from the report, along with professional judgment, to identify the severity of risk and needs and to develop recommendations for intervention.

DEVELOPMENT. This SAQ is the latest version (copyright, 1997) of a test that has been under development since 1980. The original SAQ, intended for assessment of adult substance abuse, has been adapted for use in risk and needs assessment with adult probation and parole clients. Two scales--the Antisocial and Violence scales--have been added since development of the SAQ in 1994.

Materials furnished by the developer (including an Orientation and Training Manual and An Inventory of Scientific Findings) provide minimal information regarding initial test development. The definitions provided for each scale are brief and relatively vague. The constructs underlying several scales appear to overlap (e.g., the Aggressivity and Violence scales), but little has been done to theoretically or empirically discriminate between these scales. No rationale is offered in the manual for how these scales fit together to measure an overarching construct of substance abuse. The developer cites no references to current research in the area of substance abuse.

TECHNICAL. Information describing the norming process is vague. The Orientation and Training Manual makes reference to local standardization and annual restandardization but does not provide details. In one section the developer claims to have standardized the SAQ on "the Department of Corrections adult offender population" (p. 7). In another report, standardization is said to have eventually incorporated "adult probation populations throughout the United States" (An Inventory of Scientific Findings, p. 5). One might assume, based on the citing of SAQ research studies involving thousands of probationers, that the recency and relevance of the norms are beyond question. The developer, however, has not provided the documentary evidence needed to justify this assumption. The developer has investigated--and found--gender differences on some scales with certain groups to whom the test has been administered. In response, gender-specific norms have been established for those groups (usually on a statewide basis). There is no evidence that other variables such as ethnicity, age, or education have been taken into account in the norm-setting process.

The items selected for use in the test have several commonalities. Most items focus on personal behaviors, perceptions, thoughts, and attitudes and are linked in a direct and very obvious way to the content of associated scales (e.g., "I am concerned about my drinking," from the Alcohol scale). Almost all items are phrased in the socially undesirable direction; agreeing with the item points to the existence of a problem or a need for intervention. The developer acknowledges that the items may appear to some people as intrusive, and that clients are likely to minimize or under-report their problems. In the SAQ, the response to this concern has been the inclusion of the Truthfulness scale and calculation of "truth-corrected" scale scores. Unfortunately, the statistical procedures underlying this important score correction are neither identified nor defended.

Internal consistency for the individual subscales of the SAQ has been well-established by a large number of developer-conducted studies that report Cronbach alpha estimates generally in the .80s to .90s. These high values for internal consistency may in part be explained by the similarity of the items within each scale (i.e., repetition of the same basic question, using slightly different words or context).
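
For reference, Cronbach's alpha can be computed from an item-response matrix with the standard formula alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). The sketch below is a generic illustration with made-up responses, not the developer's scoring software.

    # Generic illustration of Cronbach's alpha (not the developer's software).
    # alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    def cronbach_alpha(item_scores):
        # item_scores: one list of k item scores per respondent
        k = len(item_scores[0])

        def variance(values):
            m = sum(values) / len(values)
            return sum((v - m) ** 2 for v in values) / (len(values) - 1)

        item_vars = [variance([resp[i] for resp in item_scores]) for i in range(k)]
        total_var = variance([sum(resp) for resp in item_scores])
        return k / (k - 1) * (1 - sum(item_vars) / total_var)

    # Made-up true/false style responses (1 = endorsed) for a 5-item scale
    responses = [
        [1, 1, 1, 0, 1],
        [0, 0, 1, 0, 0],
        [1, 1, 1, 1, 1],
        [0, 1, 0, 0, 0],
        [1, 0, 1, 1, 1],
    ]
    print(round(cronbach_alpha(responses), 2))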

Reliability evidence other than internal consistency is generally lacking. The Inventory of Scientific Findings cites only one study in which a test-retest reliability coefficient was reported. In that study, an early (1984) version of the SAQ was administered to a small sample of 30 college students (not substance abusers or legal offenders), and a test-retest correlation of .71 was found across an interval of 1 week.

Evidence to support the validity of the SAQ is limited. Some concurrent validity evidence is presented, in the form of multiple studies showing modest correlations between some SAQ scales and subscales of the Minnesota Multiphasic Personality Inventory (MMPI). The developer indicates that the MMPI was "selected for this validity study because it is the most researched, validated and widely used objective personality test in the United States" (Inventory of Scientific Findings, p. 14). This explanation, however, does not suffice as a rationale for using the MMPI to support concurrent validity, and no theoretical framework is provided to explain how the SAQ subscales relate to the personality constructs underlying the MMPI.

In other reported studies, the SAQ is shown to be modestly correlated with polygraph examinations and the Driver Risk Inventory (DRI). Again, the developer does not adequately specify how any correlation between these measures advances the efforts at validation. The studies cited, and the validation process in general, do not meet accepted psychometric standards for substantiating validity evidence established in the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999). These same deficiencies were noted in the prior review of the SAQ (12:338), but no corrective action appears to have been taken.

COMMENTARY. The value of the SAQ as a measure of substance abuse severity with criminal justice populations seems to be compromised on a number of levels. First, the test lacks a clear focus. Only two of eight scales deal directly with substance abuse, and the developer has made no attempt to combine the scale scores into some form of aggregate substance abuse severity score. Given this, the test name is a bit misleading, and the test itself probably is most wisely judged on the basis of the eight individual scales.

Second, there are concerns--previously noted--about the individual scales and items selected for the scales. Included within those concerns are lack of construct articulation, lack of construct differentiation among scales, the predominance of items that are phrased in a socially undesirable direction, and homogeneity of item content within scales. Item phrasing and the bluntness of the items (e.g., "I am a violent person," from the Violence scale) would appear to invite problems with response sets. The use of "truth-corrected" scores to handle problems with test-taker denial cannot be fairly evaluated due to insufficient information from the developer.

Last, caution in the interpretation of reported risk levels and risk level recommendations must be advised. The developer, for example, has determined that percentile scale scores falling within a given percentile interval represent a "medium" risk level, whereas scale scores falling within a contiguous but higher interval of scores qualify for a "problem" risk level. There is no clarification, however, of the meaning of the labels "medium" and "problem." Further, there are no statements regarding how the two risk levels are to be discriminated from one another, and no identification of outcomes (or probabilities of outcomes) that are tied to the levels. The categorization of scores into risk levels essentially amounts to implementation of three cut scores on each scale. Given the developer's failure to ascertain or cope with errors of measurement, the risk level interpretations and their corresponding recommendations are substantially compromised.
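
To make the cut-score point concrete, the sketch below shows how three cut points partition a percentile score into the four risk levels the report uses. The thresholds here are invented purely for illustration; the developer does not publish the actual cut scores or a rationale for them.

    # Illustration only: three cut points partition percentile scores into the
    # four SAQ risk levels. The thresholds below are invented for illustration.
    HYPOTHETICAL_CUTS = (40, 70, 90)  # boundaries: low | medium | problem | severe problem

    def risk_level(percentile, cuts=HYPOTHETICAL_CUTS):
        low_cut, medium_cut, problem_cut = cuts
        if percentile < low_cut:
            return "low"
        if percentile < medium_cut:
            return "medium"
        if percentile < problem_cut:
            return "problem"
        return "severe problem"

    for p in (25, 55, 80, 95):
        print(p, risk_level(p))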

SUMMARY. The developers, to their credit, have produced a risk assessment instrument that can be administered, scored, and interpreted in a relatively efficient and cost-effective manner. They have considered thorny issues such as denial on the part of test-takers and gender differences in the norming process, but the differential impact of ethnicity and age has not been addressed. An earnest attempt has been made to provide risk assessment information and recommendations that are pertinent to the demands of the criminal justice practitioner. On balance, however, the SAQ falls far short of the mark. Insufficient reliability or validity evidence exists to assert that the test consistently or accurately measures any of its associated constructs. There is continued doubt, in the words of the prior reviewer of the SAQ, that the test "conveys any useful information additional to simply asking the client if they have an alcohol-drug problem, if they are violent, and how they cope with stress" (Toneatto, 1995, p. 891). Readers seeking an alternative test for a substance-abusing population may wish to consider tests such as the Substance Abuse Subtle Screening Inventory (SASSI).

REVIEWERS' REFERENCES

Toneatto, T. (1995). [Review of the SAQ--Adult Probation [Substance Abuse Questionnaire].] In J. C. Conoley & J. C. Impara (Eds.), The twelfth mental measurements yearbook (pp. 889-891). Lincoln, NE: Buros Institute of Mental Measurements.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.