U.S. Department of Education: Promoting Educational Excellence for all Americans - Link to ED.gov Home Page
OSEP Ideas that Work - U.S. Office of Special Education Programs
Models for Large-Scale Assessment

Reliability Issues And Evidence

Parallel Forms

A formal concept of error is developed largely around assumptions pertaining to parallel forms. To estimate error scores, it is not advisable to administer the same assessment repeatedly to the same examinee; it is more effective to use parallel forms of the assessment. Parallel forms are assessments comprising different tasks, designed so that the tasks can be assumed to be randomly sampled from the same domain and to be of comparable difficulty. Scores from any two parallel forms, $x_1$ and $x_2$, are highly correlated only if the assessment is highly reliable. The concept of correlated parallel forms lets us continue the definition of psychometric reliability. Equation 2 describes the correlation $r_{x_1 x_2}$ in terms of the observed score variances $v_{x_1}$ and $v_{x_2}$ and their covariance $v_{x_1 x_2}$.

$$r_{x_1 x_2} = \frac{v_{x_1 x_2}}{\sqrt{v_{x_1}}\,\sqrt{v_{x_2}}} \qquad (2)$$
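
Equation 2 can be sketched in a few lines of Python. This is only an illustration: the function name `parallel_forms_correlation` and the two score lists are hypothetical and do not come from any real assessment.

```python
from math import sqrt


def parallel_forms_correlation(x1, x2):
    """Correlation of two score lists: covariance / product of standard deviations."""
    if len(x1) != len(x2) or len(x1) < 2:
        raise ValueError("need two equally long score lists with at least 2 examinees")
    n = len(x1)
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    # Observed score variances and their covariance. The divisor n cancels
    # in the ratio, so dividing by n - 1 would give the same correlation.
    v_x1 = sum((a - mean1) ** 2 for a in x1) / n
    v_x2 = sum((b - mean2) ** 2 for b in x2) / n
    v_x1x2 = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2)) / n
    return v_x1x2 / (sqrt(v_x1) * sqrt(v_x2))


# Hypothetical scores of five examinees on two forms of an assessment.
form_a = [12, 15, 9, 20, 17]
form_b = [11, 16, 10, 19, 18]
r = parallel_forms_correlation(form_a, form_b)
print(round(r, 3))
```

With real data, a value near 1 would suggest that the two forms rank examinees in nearly the same way.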

Equation 2 can be written in terms of true score and observed score variance (Feldt & Brennan, 1989; Chatterji, 2003). Equation 3 shows that the observed correlation of two parallel forms provides information for estimating assessment reliability. Substituting Equation 1 into Equation 3 gives Equation 4, in which observed score variance is composed of true score variance and error score variance. As error score variance diminishes, the ratio of true score variance to observed score variance approaches 1. So, if the correlation of parallel forms, $r_{x_1 x_2}$, approaches 1, the error variance must be small relative to the true score variance. Conversely, if $r_{x_1 x_2}$ is small, the error variance must be large.

$$r_{x_1 x_2} = \frac{v_t}{v_x} \qquad (3)$$

$$r_{x_1 x_2} = \frac{v_t}{v_t + v_e} \qquad (4)$$

where $v_t$ is the true score variance, $v_e$ is the error score variance, and $v_x$ is the observed score variance.
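
The behavior described by Equation 4 can be checked numerically. The following is a minimal sketch; the function name `reliability` and the variance values are hypothetical, chosen only to show the trend.

```python
def reliability(true_var, error_var):
    """Equation 4: true score variance divided by (true + error) score variance."""
    if true_var <= 0 or error_var < 0:
        raise ValueError("true_var must be positive and error_var non-negative")
    return true_var / (true_var + error_var)


# Hypothetical true score variance, held fixed while error variance shrinks.
true_var = 100.0
for error_var in (100.0, 25.0, 5.0, 0.0):
    print(error_var, round(reliability(true_var, error_var), 3))
```

For these illustrative values the ratio rises from 0.5 through 0.8 and 0.952 to 1.0 as the error variance falls to zero.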

While the assumption of parallel forms (items or tests) is generally necessary psychometrically, it is extremely difficult to satisfy in practice. Sources of error are identified below; the use of nonequivalent (nonparallel) forms is one of the most important and the most difficult to control.

Standard Error of Measurement and Information

Another conceptualization of measurement accuracy is developed in terms of the standard error of measurement (SEM). As described above, the concept of random error around the true score results from administering repeated parallel forms. The SEM of a measure is essentially the standard deviation of the error scores around the true score. As with reliability, the SEM can be estimated in terms of the correlated observations $x_1$ and $x_2$. According to Equation 5, as the correlation of parallel forms increases, the standard error of measurement diminishes toward zero.

$$SEM = s_x \sqrt{1 - r_{x_1 x_2}} \qquad (5)$$

where $s_x$ is the standard deviation of the observed scores and $r_{x_1 x_2}$ is the correlation of parallel forms.
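
A short sketch of how the SEM shrinks as the correlation of parallel forms rises. It assumes the classical form, in which the SEM equals the observed score standard deviation times the square root of one minus the parallel-forms correlation. The function name and the standard deviation of 15 are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd_observed, r_parallel):
    """Classical SEM (assumed form): sd_observed * sqrt(1 - r_parallel)."""
    if sd_observed < 0:
        raise ValueError("standard deviation cannot be negative")
    if not 0.0 <= r_parallel <= 1.0:
        raise ValueError("reliability estimate must lie between 0 and 1")
    return sd_observed * sqrt(1.0 - r_parallel)


# Hypothetical observed score standard deviation on an arbitrary scale.
sd = 15.0
for r in (0.50, 0.75, 0.91, 0.99):
    print(r, round(standard_error_of_measurement(sd, r), 2))
```

Under these assumptions the SEM falls from roughly 10.6 score points at a correlation of 0.50 to about 1.5 at 0.99.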

It is important to keep in mind that these measures are estimates. Theoretically, each time the assessment is administered, a different value is likely to be obtained. The size of the difference depends on the reliability of the assessment, that is, on the amount of error in measurement.

The preceding SEM estimation is classical, in contrast to the alternative Item Response Theory (IRT) perspective. In IRT, items are calibrated with respect to difficulty and discrimination, among other possible item characteristics. Using the item calibrations, it is possible to estimate the amount of information provided by a test and by its items. Furthermore, the amount of information depends on the ability of the examinees. For instance, a difficult item administered to someone with low ability will not generate meaningful, informative results, whereas the response to an easy item provides much more information about that person. As a rule, items are most informative when answered by a person whose ability level is comparable to the difficulty level of the item. (Note: In IRT, item difficulty and person ability are on the same scale, which makes it possible to match items to persons.) A test that is neither too easy nor too difficult for the respondent is a highly informative test.
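
The dependence of information on ability can be illustrated with the two-parameter logistic (2PL) model. That model is an assumption made here for the sketch; the text above does not name a specific IRT model. In the 2PL model, the information of an item at ability theta is a² · P(theta) · (1 − P(theta)), where a is the discrimination, b is the difficulty, and P is the probability of a correct response. The function names and parameter values below are hypothetical.

```python
from math import exp


def probability_correct(theta, a, b):
    """2PL probability that a person of ability theta answers the item correctly."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def item_information(theta, a, b):
    """2PL item information at ability theta: a^2 * P * (1 - P)."""
    p = probability_correct(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical low-ability examinee and two items of equal discrimination.
low_ability = -2.0
easy_item = item_information(low_ability, a=1.0, b=-2.0)  # difficulty matches ability
hard_item = item_information(low_ability, a=1.0, b=2.0)   # far too difficult
print(easy_item > hard_item)
```

In this sketch the item whose difficulty matches the examinee's ability yields the maximum information for that discrimination value, while the difficult item contributes very little.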
