U.S. Department of Education: Promoting Educational Excellence for all Americans
OSEP Ideas that Work - U.S. Office of Special Education Programs
Models for Large-Scale Assessment
Instructional Practices

Standards and Assessment Approaches for Students With Disabilities Using a Validity Argument


We introduce accommodations to remove construct-irrelevant variance by changing the supports provided, not the content domains being tested. For example, the mathematics problem could be read aloud to students who cannot read well, eliminating reading as a construct-irrelevant variable. Likewise, we could allow a calculator to remove the computational requirements of mathematics problems targeting other constructs, or allow more time so the student can finish the item (or test). Tindal and Ketterlin-Geller (2004, p. 8) note the following in their review of research on four major classes of mathematics accommodations (using calculators, reading mathematics problems to students, employing extended time, and using multiple-accommodation packages). Notice, however, that these task (test) features may be problem- and person-specific.

In general, the findings from using calculators and reading mathematics problems to students clearly document that the effect of accommodations depends on the type of items and the populations. For some items, calculators are facilitative (e.g., solving fractions problems) and for others detractive (e.g., on complex calculations as part of mathematical reasoning). Similarly, item-specific findings are beginning to appear in reading mathematics problems: when the problems are wordy (both in count and difficulty) and contain several verb phrases, the accommodations appear effective. Likewise, student characteristics are an important variable. The positive effects of the read-aloud accommodation are more likely with younger students or those with lower reading skills. Finally, the use of extended time appears relatively inert, though it often appears as part of other accommodations. For example, calculators and reading mathematics problems often take more time.

Thus, the research on accommodations shows that changes in the way tests are given or taken (the supports used) can indeed make a difference, sometimes removing construct-irrelevant variance. Furthermore, the effect of an accommodation depends on the characteristics of the population using it. At other times, however, accommodations may actually introduce construct-irrelevant variance (e.g., when teachers systematically provide extra prompts). Accommodations, then, are neither a panacea nor a simple process. Their usefulness depends on the construct of the standard, the assessment approach or format, and the needs of the student.

Most states now have both participation and accommodation policies. These policies, however, focus mostly on who needs to participate and how they should participate, and less on why certain participation options should be recommended or applied. This is particularly true for the use of accommodations. Very few states have policies that explain the reasoning behind an accommodation in terms of the intended construct to be measured and the evidence needed to support its measurement (see Thurlow & Bolt, 2001). We address that kind of evidence through the consequences of assessment, most of which are seriously underreported (cf. the National Center on Educational Outcomes Online Accommodations Bibliography). In the end, states need policies on which accommodations to allow and why; these policies need to give IEP teams guidance in determining how the unique needs of students with disabilities require changes in testing.

Table 3

Types of Accommodations

| Presentation                     | Presentation Equipment        | Response                  | Setting                                | Scheduling                  |
|----------------------------------|-------------------------------|---------------------------|----------------------------------------|-----------------------------|
| Large print                      | Magnification equipment       | Proctor/scribe            | Individual                             | Extended time               |
| Braille                          | Light/acoustics               | Computer or machine       | Small group                            | With breaks                 |
| Read-aloud                       | Calculator                    | Write in test booklets    | Carrel                                 | Multiple sessions           |
| Interpreter for instructions     | Amplification equipment       | Tape recorder             | Separate room                          | Time beneficial to student  |
|                                  |                               | Communication device      | Seat location                          | Over multiple days          |
| Directions                       | Audio/video cassette          | Spell checker/graph paper | Minimize distractions/quiet/reduced noise | Flexible schedule        |
| Visual cues on test/instructions | Noise buffer                  | Braille                   | Student's home                         | Other                       |
| Administration by other          | Adaptive or special furniture | Pointing                  | Special ed. class                      |                             |
| Additional examples              | Abacus                        | Other                     | Other                                  |                             |
| Other                            | Other                         |                           |                                        |                             |
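As a minimal illustration only, the categories in Table 3 can be encoded as a simple lookup structure, which is the kind of data an IEP-team support tool might consult when recording recommended accommodations. This is a hypothetical Python sketch: the names `ACCOMMODATIONS` and `categories_for` are invented here, the entries mirror Table 3 (catch-all "Other" entries omitted), and nothing in it reflects an actual state policy tool.

```python
# Hypothetical encoding of the Table 3 accommodation categories.
# Entries follow the table; "Other" catch-alls are omitted.
ACCOMMODATIONS = {
    "Presentation": [
        "Large print", "Braille", "Read-aloud",
        "Interpreter for instructions", "Directions",
        "Visual cues on test/instructions",
        "Administration by other", "Additional examples",
    ],
    "Presentation Equipment": [
        "Magnification equipment", "Light/acoustics", "Calculator",
        "Amplification equipment", "Audio/video cassette",
        "Noise buffer", "Adaptive or special furniture", "Abacus",
    ],
    "Response": [
        "Proctor/scribe", "Computer or machine", "Write in test booklets",
        "Tape recorder", "Communication device",
        "Spell checker/graph paper", "Braille", "Pointing",
    ],
    "Setting": [
        "Individual", "Small group", "Carrel", "Separate room",
        "Seat location", "Minimize distractions/quiet/reduced noise",
        "Student's home", "Special ed. class",
    ],
    "Scheduling": [
        "Extended time", "With breaks", "Multiple sessions",
        "Time beneficial to student", "Over multiple days",
        "Flexible schedule",
    ],
}

def categories_for(accommodation):
    """Return every Table 3 category that lists a given accommodation."""
    return [cat for cat, items in ACCOMMODATIONS.items()
            if accommodation in items]

# Braille appears in the table both as a presentation format and as a
# response mode, so the lookup returns two categories.
print(categories_for("Braille"))  # ['Presentation', 'Response']
```

Note that a single label (e.g., Braille) can appear under more than one category, which is one reason accommodation decisions cannot be reduced to a flat checklist.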


Alternate Assessments

The general education large-scale assessment (with or without accommodations, and even with multiple administrations) is intended to allow educators to make comparable inferences about proficiency on state standards. At some point, however, changes become significant enough to constrain that inference; this is when states need to consider them part of their alternate assessments. Because of changes in supports (assistive technologies, prompts, or scaffolds) and/or changes in the breadth, depth, and complexity of the material being tested, scores on alternate assessments based on alternate achievement standards cannot be aggregated with scores on regular assessments (and therefore must be reported separately). However, as explained later in this paper, using a validity argument within the context of federal regulations allows for the aggregation of proficiency levels based on grade-level, modified, and alternate achievement standards for purposes of reporting Adequate Yearly Progress.

In the sample mathematics problem presented at the beginning of the paper, changes could be made in the assessment approach in several ways: by observing the student actually making change and noting the correctness of the response on a checklist or rating scale; by assembling into a portfolio materials that document the student making change during an interaction at a local store in the community; or by observing or recording a performance task in which the student is required to add these amounts of money using real bills and make change accordingly. All of these options could become part of an assessment judged against modified achievement standards or an alternate assessment judged against alternate achievement standards. Remember, however, that these "situated" environments may well introduce other sources of construct-irrelevant variance. Therefore, each of these approaches brings with it the need to collect specific kinds of evidence, both procedural and empirical, to ensure that the construct is fully assessed (and not underrepresented).

Validity Argument Using Different Alternate Assessment Approaches

We integrate the validity process, assessment approaches, and populations of students with disabilities by considering two states with considerably different grade-level standards and alternate assessments. For this illustration, we focus on mathematics content standards for grades three to five. Although the selection of states and grade levels was somewhat arbitrary, some related research has been published previously that aids in making this illustration (see Weiner, 2002 for a description of Massachusetts and Tindal et al., 2003 for a description of Oregon).

Each of the assessment strategies used in an alternate assessment (whether judged against grade-level, modified, or alternate achievement standards) needs to be analyzed using the same validity claim: the test reflects the domain of knowledge and skills for the construct and the tasks that have been sampled. The procedural evidence focuses on test development, the quality of the items and tasks, the assemblage of "items" into the total test, and the administration and scoring process. Empirical evidence documents content coverage (alignment between the content standards and the assessment), the stability and consistency in sampling behavior over replications, "item" or task functioning, reliability of judgments and scoring, internal relations among items and tasks, and response processes, as well as external relations with other measures. Just as for establishing the validity of the general education test (with or without accommodations), attention needs to be given to construct-irrelevant variance and construct underrepresentation in alternate assessments; the latter problem is particularly critical as changes are made in the depth, breadth, and/or complexity of the standards.
