Assessment developer should… | Test user should… | Explanation | What to do if an assessment does not meet this criterion? |
---|---|---|---|
Clearly state the intended interpretation and uses for the assessment score(s) and highlight evidence that justifies using the assessment for those interpretations and uses. | Ensure that the assessment developer's stated interpretations and uses align with local plans for using assessment results and determine if evidence supports those interpretations and uses. | Measures might be developed for screening, formative, interim, and/or summative purposes, and this intent should be specified by the assessment developer and align with local plans for using the data. | If the assessment developer's intended interpretations and uses for an SEL assessment do not align with local plans or are unsupported, find another assessment that does align with plans for use. |
Identify score(s) provided (e.g. overall score, subscores, performance levels) and items/tasks used to generate each score. Clearly state recommendations and limitations for reporting and interpreting those scores. | Determine if the scores provided will guide intended uses or assist in reaching conclusions about students' achievement of SEL competencies. Ensure that local plans for reporting and interpreting assessment results follow the developer's recommendations and limitations. Be alert to possible misinterpretation of scores and take steps to minimize inappropriate interpretation and use. | Do not interpret assessment results for purposes the developer has not recommended and supported with evidence. For example, holistic and analytical scoring are typical for many performance assessments. | If the scores provided by the assessment will not guide intended uses or inform conclusions at the local level, find another assessment. Do not attempt to combine or calculate scores from an assessment without proper psychometric evidence. If the assessment developer's recommendations and cautions for reporting or interpreting SEL assessment results do not align with local plans for reporting and interpretation, find another assessment that aligns with local plans. |
Cite theory, research, or empirical evidence that students/observers/interviewers interpret and respond to items/tasks as intended. | Review the rationale or evidence provided by the assessment developer that respondents respond as intended to determine if it supports the use of the assessment with the local population and setting. | Assessment developers should document that respondents answer items/tasks using the processes and behaviors the developer intended. | If there is insufficient rationale or evidence that respondents are interpreting and responding as intended, use other evidence of SEL competencies to confirm interpretations. |
If the assessment will be used to determine students' strengths and needs,…
Assessment developer should… | Test user should… | Explanation | What to do if an assessment does not meet this criterion? |
---|---|---|---|
Provide empirical evidence of consistency of item results (internal reliability) for all assessment scores reported. | Determine if assessment scores have an acceptable reliability coefficient (.80 or above for coefficient alpha). | Consider reliability evidence for each score to be reported, understanding that scores aggregated at a class, group, grade, or school level will be more reliable than scores for individual students. If validity evidence appears to support assessment at the individual student level, a measure of internal consistency will indicate the extent to which a respondent responds similarly across items. Internal reliability is typically reported as coefficient alpha (a worked sketch of this check appears after this table). NOTE: Sufficient reliability evidence alone is not enough to support the use of scores to make consequential decisions about individual students, such as diagnosis or program placement. | If the internal reliability of any reported score is even slightly below .80, use caution when interpreting and using those scores for decisions about individual students. If the internal reliability of any score is not reported or is considerably below .80, do not report, interpret, or use any scores/subscores that fall short of this minimum, or find an assessment for which all reported scores are sufficiently reliable. |
Provide a standard error of measurement and recommended confidence intervals/bands for all reported assessment scores. | When reporting and interpreting scores, include some reference to the true range of those scores based on the standard error of measurement and confidence intervals or bands. | If an assessment provides evidence that supports reporting individual scores, also report confidence intervals to capture the true potential range of the students' performance. Confidence intervals are particularly important when comparing two different scores (see the second sketch after this table). | If the standard error of measurement and/or confidence intervals or bands are not available, use caution when reporting and interpreting individual scores, particularly when comparing them. |
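To make the .80 guideline concrete, the following is a minimal sketch, in Python, of how a test user with access to item-level responses might compute coefficient alpha. The data, scale, and variable names are hypothetical; in practice the reliability coefficient should come from the developer's technical documentation.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Hypothetical responses to a 5-item Likert scale (1-4) from 8 students.
responses = np.array([
    [3, 4, 3, 4, 3],
    [2, 2, 3, 2, 2],
    [4, 4, 4, 3, 4],
    [1, 2, 1, 2, 1],
    [3, 3, 4, 3, 3],
    [2, 3, 2, 2, 3],
    [4, 3, 4, 4, 4],
    [1, 1, 2, 1, 2],
])

alpha = cronbach_alpha(responses)
print(f"coefficient alpha = {alpha:.2f}")
if alpha < 0.80:
    print("Below the .80 guideline: use caution for individual-level decisions.")
```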
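The second sketch shows the standard relationship between reliability and the standard error of measurement under classical test theory, SEM = SD × √(1 − reliability), and how confidence bands built from it temper comparisons of two scores. The SD, reliability, and scores below are hypothetical.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical-test-theory SEM: the score SD discounted by unreliability."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_band(score: float, sem: float, z: float = 1.96):
    """Approximate 95% band (z = 1.96) around an observed score."""
    return score - z * sem, score + z * sem

# Hypothetical scale statistics, as they might appear in a technical manual.
sd, reliability = 10.0, 0.85
sem = standard_error_of_measurement(sd, reliability)

for student_score in (62.0, 68.0):
    lo, hi = confidence_band(student_score, sem)
    print(f"score {student_score:.0f}: 95% band [{lo:.1f}, {hi:.1f}]")
# If the two bands overlap, treat the difference between the scores as
# within measurement error rather than as a meaningful difference.
```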
See also expectations for “Is the assessment relevant for the students and the setting?”
If the assessment will be used to compare scores over time,…
Assessment developer should… | Test user should… | Explanation | What to do if an assessment does not meet this criterion? |
---|---|---|---|
Provide empirical evidence that scores are sensitive to changes in SEL over time. | Determine if the evidence is applicable to the local setting and program and supports the claim that the assessment will capture changes in SEL that occur over time. | Typically, cross-sectional and longitudinal studies provide evidence that the scores of an assessment given at two different points in time would reflect a change in SEL if such a change did occur (a worked sketch follows this table). | If sensitivity to change over time is unsupported, do not use the assessment to determine if change over time has occurred. |
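As a rough illustration of the kind of longitudinal evidence described above, the sketch below computes a standardized mean change from hypothetical fall and spring scores for the same students; all values are invented, and a real sensitivity-to-change study would be far more extensive.

```python
import numpy as np

# Hypothetical fall and spring scores for the same 10 students.
fall   = np.array([48, 52, 45, 60, 55, 50, 47, 58, 53, 49], dtype=float)
spring = np.array([53, 55, 47, 66, 59, 54, 50, 63, 57, 52], dtype=float)

gains = spring - fall
# Standardized mean change: average gain relative to the SD of the gains.
d = gains.mean() / gains.std(ddof=1)
print(f"mean gain = {gains.mean():.1f}, standardized change = {d:.2f}")
# A near-zero standardized change over a period in which growth was
# expected would suggest the scores are not sensitive to change.
```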
If the assessment will be used to evaluate an SEL Program,…
Assessment developer should… | Test user should… | Explanation | What to do if an assessment does not meet this criterion? |
---|---|---|---|
Provide evidence that assessment score(s) demonstrate change after implementing an SEL program that has been shown to be effective at improving the competencies measured by the assessment. | Determine if the evidence provides information that is applicable to the local setting and program. | Evidence of how sensitive an assessment is to change could come from a field-testing study (a sketch of one common design follows this table). | If there is insufficient evidence that assessment scores can demonstrate change, be cautious about using scores to evaluate the effectiveness of the SEL program and/or instruction. |
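One common field-test design compares score changes in a group receiving the SEL program with changes in a comparison group. The sketch below is a bare-bones difference-in-differences calculation on hypothetical scores; it illustrates the logic only and rests on strong assumptions (e.g. comparable groups) that a real evaluation would need to defend.

```python
import numpy as np

# Hypothetical pre/post scores for a program group and a comparison group.
program_pre     = np.array([50, 52, 48, 55, 51], dtype=float)
program_post    = np.array([56, 57, 52, 61, 55], dtype=float)
comparison_pre  = np.array([49, 53, 47, 54, 52], dtype=float)
comparison_post = np.array([50, 54, 48, 55, 52], dtype=float)

program_gain    = (program_post - program_pre).mean()
comparison_gain = (comparison_post - comparison_pre).mean()

# Difference-in-differences: gain beyond ordinary growth in the comparison group.
did = program_gain - comparison_gain
print(f"program gain {program_gain:.1f}, comparison gain {comparison_gain:.1f}, "
      f"difference-in-differences {did:.1f}")
```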
If the assessment will be used to improve school/program quality,…
Assessment developer should… | Test user should… | Explanation | What to do if an assessment does not meet this criterion? |
---|---|---|---|
Provide evidence that assessment score(s) are moderately related to desirable educational outcomes (e.g. graduation rates, absenteeism). | Determine if the evidence provided is applicable to the local quality improvement goals or outcomes. | Longitudinal, quasi-experimental, or experimental research studies can be used to determine if there is a significant correlation between relevant indicators of quality and the assessment score (a correlation sketch follows this table). | If there is insufficient evidence that score(s) are at least moderately related to quality outcomes of local interest, do not use scores to make decisions about improving school/program quality. |
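The sketch below shows the simplest version of the correlational evidence this row describes: a Pearson correlation between hypothetical school-level SEL scores and attendance rates. The data are invented, and the published studies a developer cites would control for far more than this.

```python
import numpy as np

# Hypothetical school-level data: mean SEL score and attendance rate
# for 8 schools.
sel_scores = np.array([52, 58, 47, 63, 55, 60, 49, 57], dtype=float)
attendance = np.array([0.91, 0.94, 0.89, 0.96, 0.92, 0.95, 0.90, 0.93])

r = np.corrcoef(sel_scores, attendance)[0, 1]
print(f"correlation r = {r:.2f}")
# As a common rule of thumb, |r| around .3-.5 is described as moderate;
# a weak or near-zero r would not support quality-improvement uses.
```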
If the assessment will be used to report separate results for different groups of students,…
Assessment developer should… | Test user should… | Explanation | What to do if an assessment does not meet this criterion? |
---|---|---|---|
Provide rationale or evidence that students from different groups conceptualize, define, and experience the assessed SEL competencies similarly. | Review the rationale or evidence provided to determine applicability to the local setting, SEL program, and demographics of the local student population. | If using an SEL assessment to report separate results for different groups of students, it is important to ensure that the relevant groups of students experience the assessed SEL competencies similarly. If group differences are reported, do so cautiously and only after thorough review. | If there is insufficient rationale or evidence that different groups of students conceptualize, define, and experience SEL competencies similarly, do not report results for groups of students separately. |
Provide evidence that assessment score(s) are equally valid, reliable, and fair for different groups of students. If not, clearly caution against reporting assessment scores separately for groups of students. | Determine if the evidence provided is applicable to the local setting, SEL program, and demographics of the local student population and supports reporting scores separately for different groups of students. | Because of potential issues with the relevance of SEL assessments for different groups of students (e.g. cultural, gender, age), schools interested in comparing or reporting results separately for different groups should first confirm that the evidence supports doing so (a rough first check is sketched after this table). | If there is insufficient empirical evidence that score(s) are valid, reliable, and fair for different groups of students, or the assessment developer cautions against it, do not report and interpret scores for groups of students separately. |
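A full check of whether scores are equally valid, reliable, and fair across groups requires formal measurement-invariance analyses, which are beyond a short sketch. As a rough first screen, the sketch below simply compares coefficient alpha across two hypothetical groups; a large reliability gap is one warning sign that separate group reporting may not be supported.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                            / items.sum(axis=1).var(ddof=1))

# Hypothetical item responses split by a demographic grouping variable.
group_a = np.array([[3, 4, 3, 4], [2, 2, 3, 2], [4, 4, 3, 4],
                    [1, 2, 1, 2], [3, 3, 4, 3], [2, 3, 2, 2]], dtype=float)
group_b = np.array([[4, 3, 4, 4], [2, 1, 2, 2], [3, 3, 3, 4],
                    [1, 2, 2, 1], [4, 4, 3, 3], [2, 2, 3, 2]], dtype=float)

for name, grp in (("group A", group_a), ("group B", group_b)):
    print(f"{name}: alpha = {cronbach_alpha(grp):.2f}")
# Similar alphas are necessary but not sufficient; invariance of loadings
# and intercepts (e.g. via multi-group CFA) should also be examined.
```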