Assessment developer should… | Test user should… | Explanation | What to do if an assessment does not meet this criterion? |
---|---|---|---|
Provide detailed and clear instructions if test users will administer and score the assessment. If applicable, indicate if there are specific qualifications or training experiences needed to administer and score the assessment. | Ensure that all individuals administering and scoring the assessment receive the instructions provided by the assessment developer. If applicable, ensure qualified or trained individuals are available to administer and score. | Logistics and required training time should be considered when making decisions to use a particular assessment. Training might be necessary for the individuals who administer or score the assessment. Some assessments require that those administering and/or scoring an assessment have certain qualifications such as a degree, graduate coursework, or specific formal training. Even if an assessment does not have requirements for administration and scoring, look for guidance that encourages standardized administration and scoring so that scores are comparable. | If requirements for administration and scoring are unaddressed in the assessment documentation, ask the assessment developer for more information. Do not use the assessment if qualified individuals are not available or training individuals to administer and score the assessment would not be possible. |
If the test developer administers or scores the assessment, describe the process for conducting the assessment and/or the procedure used for generating scores. | Ensure that the basis for administering items and/or generating scores aligns with definitions for SEL competencies and supports local plans for interpretation and use. | Some test developers use automated means, often involving algorithms, for administering or scoring assessments. Algorithms for scoring assessments or selecting items can be very technical, but developers should be able to explain conceptually how an algorithm works. This conceptual explanation will help indicate whether the assessment's administration and scoring procedures are appropriate for the local setting and SEL program (a simple scoring illustration follows this table). | If there is insufficient information about how the assessment is administered and scored, ask the developer for more information. If administration and scoring procedures are not appropriate for the local setting, student population, or SEL program, find another assessment. |
Indicate if specific technological devices and software to administer and/or score the assessment are required or recommended. | Ensure that all settings (e.g. schools) administering the assessment have access to required or recommended technological devices and software. | If administering an assessment via a technological device, there likely are requirements for the devices and the type of software available on those devices. Differences in mode (e.g. paper and pencil vs. computer-delivered), device (e.g. desktop computer vs. tablet), or operating system (e.g. Windows vs. Macintosh) could differentially affect how respondents complete assessments and compromise score comparability. | If the required devices or software are not available, find another assessment. If not all settings administering the assessment have access to recommended technological devices and software, consider whether such differences in mode or device could compromise the comparability of scores across settings. |
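For illustration only, here is a minimal sketch in Python of the kind of transparent scoring procedure a developer should be able to describe conceptually: reverse-coding negatively worded Likert-type items before averaging. The item positions and scale values are hypothetical, not taken from any particular instrument.

```python
# Hypothetical example: a transparent scoring rule for a 5-item
# Likert-type SEL scale (responses 1-5), where items 2 and 5 are
# negatively worded and must be reverse-coded before averaging.

REVERSE_CODED = {2, 5}   # assumed item positions; varies by instrument
SCALE_MAX = 5            # highest Likert response option

def score_response(responses: list[int]) -> float:
    """Return the mean item score after reverse-coding."""
    adjusted = [
        (SCALE_MAX + 1 - r) if i + 1 in REVERSE_CODED else r
        for i, r in enumerate(responses)
    ]
    return sum(adjusted) / len(adjusted)

# A student answering [4, 2, 5, 3, 1] has items 2 and 5 flipped
# to 4 and 5 before averaging.
print(score_response([4, 2, 5, 3, 1]))  # -> 4.2
```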
If assessment scores are determined using norms,…
Assessment developer should… | Test user should… | Explanation | What to do if an assessment does not meet this criterion? |
---|---|---|---|
Report norms that are current, based on a sample of sufficient size, and representative of students from the demographic groups relevant to the local population. | Ensure that the norm study and sample are current, of sufficient size, and representative of the students who will be assessed locally. | Norm samples should include, and their documentation should describe, students from the demographic groups relevant to the local population. For example, norms developed using predominately urban high school students would not be relevant for rural middle school students. (A brief illustration of norm-referenced scoring follows this table.) | If the norm sample is not current, is not of sufficient size, or does not represent students from different demographic groups relevant to the local population, interpret norm-referenced scores with caution or find another assessment. |
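As a rough sketch of what norm-referenced scoring involves, the fragment below converts a raw score to a percentile rank against a stored norm sample. The scores here are invented; operational norm tables are far larger and are documented for the demographic groups discussed above.

```python
from bisect import bisect_right

# Hypothetical norm sample of raw scores (real norm samples are
# large, current, and representative of the relevant population).
NORM_SAMPLE = sorted([12, 15, 15, 18, 20, 21, 21, 22, 24, 27])

def percentile_rank(raw_score: int) -> float:
    """Percent of the norm sample scoring at or below raw_score."""
    count = bisect_right(NORM_SAMPLE, raw_score)
    return 100.0 * count / len(NORM_SAMPLE)

print(percentile_rank(21))  # -> 70.0: at/below 70% of the norm sample
```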
If there are multiple forms (different versions) for an assessment (e.g. Forms A & B),…
Assessment developer should… | Test user should… | Explanation | What to do if an assessment does not meet this criterion? |
---|---|---|---|
Provide evidence of score consistency across the different forms. | Determine if the evidence supports that scores from different forms of the assessment are comparable. | Equating is a commonly used technical process that establishes that scores are interchangeable across different versions of a test (a simplified sketch follows this table). Equating samples need to be large and representative of the population under consideration for assessment. | Only use one form of the assessment if there is insufficient evidence that scores from multiple forms would provide consistent results across students. |
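To make the idea concrete, here is a minimal sketch of linear equating under a random-groups design, with invented scores. Operational equating studies use much larger samples and often more sophisticated methods (e.g. equipercentile equating), so this is a toy illustration rather than the procedure any particular developer uses.

```python
from statistics import mean, stdev

def linear_equate(form_a_scores, form_b_scores, b_score):
    """Map a Form B raw score onto the Form A scale by matching the
    means and standard deviations of two randomly equivalent groups
    (a linear equating function)."""
    slope = stdev(form_a_scores) / stdev(form_b_scores)
    return mean(form_a_scores) + slope * (b_score - mean(form_b_scores))

# Invented data: two randomly equivalent groups, one taking each form.
form_a = [14, 16, 18, 20, 22, 24]
form_b = [11, 13, 14, 16, 17, 19]

# A Form B raw score of 15 expressed on the Form A scale.
print(round(linear_equate(form_a, form_b, 15), 1))  # -> 19.0
```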
If the assessment is completed by a student,…
Assessment developer should… | Test user should… | Explanation | What to do if an assessment does not meet this criterion? |
---|---|---|---|
Indicate how development or administration of the SEL assessment addresses common issues such as memory bias, social desirability bias, or reference bias. | Determine if the developer has provided convincing evidence or rationale that the SEL assessment is not susceptible to these biases. | Memory, social desirability, and reference biases are common issues to address in the development or administration of assessments where the student is the respondent. Memory bias arises when students misremember past thoughts or behaviors; social desirability bias arises when students respond in ways they believe will be viewed favorably; reference bias arises when students judge themselves against different standards of comparison. | If there is insufficient evidence or rationale for how potential biases were addressed or mitigated in development or administration, ask the assessment developer for more information or find another assessment. |
If the assessment is a rating or observation scale completed by someone other than the student,…
Assessment developer should… | Test user should… | Explanation | What to do if an assessment does not meet this criterion? |
---|---|---|---|
Provide evidence that the administration and scoring protocol will lead to consistent decisions across different raters/observers (interrater reliability) and avoid or mitigate potentially biased ratings. | Use recommended training and protocols to avoid or mitigate biases. Determine if interrater reliability is acceptable (a Kappa or Intraclass Correlation Coefficient (ICC) statistic of .70 or higher; a brief illustration follows this table). | These types of assessments should provide evidence of interrater reliability because some teachers might rate differently than other teachers across items/tasks or students. Common rating issues include response biases such as leniency, severity, and halo effects. Such disparities would affect consistency across raters. Therefore, these types of assessments should provide instructions on how to help raters/observers overcome these response biases. | If there is insufficient information about how to avoid or mitigate rater response bias, ask the assessment developer for more information. If there is insufficient evidence of interrater reliability, or interrater reliability is considerably below .70, find another assessment. |
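As one way to check the .70 benchmark named above, the sketch below computes Cohen's kappa for two raters assigning categorical ratings. The ratings are invented; operational analyses would typically involve more raters and students, and an ICC for continuous scores.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: observed agreement between two raters,
    corrected for the agreement expected by chance."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    expected = sum((c1[c] / n) * (c2[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Invented ratings of ten students on a 3-point rubric.
r1 = [1, 2, 2, 3, 1, 2, 3, 3, 2, 1]
r2 = [1, 2, 2, 3, 2, 2, 3, 3, 2, 1]

kappa = cohens_kappa(r1, r2)
print(f"kappa = {kappa:.2f}; acceptable (>= .70): {kappa >= 0.70}")
```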