
Reliability Issues And Evidence
Parallel Forms
A formal concept of error is developed largely around assumptions pertaining to parallel forms. To estimate error scores, it is not advisable to administer the same assessment repeatedly to the same examinee. It is more effective to use parallel forms of the assessment. Parallel forms are assessments comprising different tasks, but the tasks are designed so that they can be assumed to be randomly sampled from the same domain and to be of comparable difficulty. Scores x_1 and x_2 from any two parallel forms are highly correlated only if the assessment is highly reliable. The concept of correlated parallel forms lets us continue the definition of psychometric reliability. Equation 2 describes their correlation r_{x1x2} in terms of the observed score variances v_{x1} and v_{x2} and their covariance v_{x1x2}.

(2) r_{x1x2} = v_{x1x2} / √(v_{x1} v_{x2})
Equation 2 can be written in terms of true score and observed score variance (Feldt & Brennan, 1989; Chatterji, 2003). Equation 3 shows that the observed correlation of two parallel forms provides information for estimating assessment reliability. Substituting Equation 1 in Equation 3, Equation 4 shows that observed score variance is composed of true score and error score variance. As error score diminishes, the ratio of true score and observed score variance approaches a value of 1. So, if the correlation of parallel forms, r_{x1x2} approaches one, then the error variance must be small. Conversely, if r_{x1x2} is small, the error variance must be large.

(3) r_{x1x2} = v_t / v_x

(4) r_{x1x2} = v_t / (v_t + v_e)
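A minimal simulation (not from the source; the variance values are illustrative assumptions) can make Equations 2 through 4 concrete: if each observed score is a true score plus independent random error, the correlation between two parallel forms should approximate the ratio of true score variance to observed score variance.

```python
import random
import statistics

def correlation(a, b):
    """Pearson correlation of two equal-length score lists (Equation 2)."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / (statistics.stdev(a) * statistics.stdev(b))

random.seed(0)
n = 20000
true_var, err_var = 9.0, 3.0   # illustrative variance choices, not from the source

# Each examinee has one true score; each parallel form adds independent error.
t  = [random.gauss(50, true_var ** 0.5) for _ in range(n)]
x1 = [ti + random.gauss(0, err_var ** 0.5) for ti in t]
x2 = [ti + random.gauss(0, err_var ** 0.5) for ti in t]

r = correlation(x1, x2)                        # observed parallel-forms correlation
theoretical = true_var / (true_var + err_var)  # v_t / (v_t + v_e), Equation 4
print(round(r, 3), round(theoretical, 3))
```

With a large sample, the simulated correlation lands close to the theoretical reliability of v_t / (v_t + v_e) = 0.75, illustrating why a small parallel-forms correlation implies large error variance.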
While the assumption of parallel forms (items or tests) is generally necessary psychometrically, it is extremely difficult to satisfy in practice. Sources of error are identified below; use of nonequivalent (nonparallel) forms is among the most important and the most difficult to control.
Standard Error of Measurement and Information
Another conceptualization of measurement accuracy is developed in terms of the standard error of measurement (SEM). As described above, the concept of random error around the true score results from administering repeated parallel forms. The SEM of a measure is essentially the average deviation of error scores around the true score. As with reliability, the SEM can be estimated in terms of the correlated observations x_1 and x_2. According to Equation 5, as the correlation of parallel forms increases, the standard error of measurement diminishes toward zero.

(5) SEM = √v_x · √(1 − r_{x1x2})
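A short sketch (assumed values; the classical SEM formula is SD_x · √(1 − r)) shows the relationship described by Equation 5: as the parallel-forms correlation rises, the SEM shrinks toward zero.

```python
def sem(sd_x: float, r_parallel: float) -> float:
    """Classical standard error of measurement: SD_x * sqrt(1 - r)."""
    return sd_x * (1.0 - r_parallel) ** 0.5

# Illustrative values: observed SD of 10 with increasing reliability.
for r in (0.50, 0.80, 0.95):
    print(r, round(sem(10.0, r), 2))
```

For an observed standard deviation of 10, reliabilities of .50, .80, and .95 give SEMs of roughly 7.07, 4.47, and 2.24, consistent with the claim that high reliability implies a small band of error around the true score.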
It is important to keep in mind that these measures are estimates. Theoretically, each administration of the assessment is likely to yield a different measure; the degree of difference depends on the reliability, or error, of measurement.
The preceding SEM estimation is classical, in contrast to an alternative Item Response Theory (IRT) perspective. In IRT, items are calibrated with respect to difficulty and discrimination, among other possible item characteristics. Using the item calibrations, it is possible to estimate the amount of information provided by a test and its items. Furthermore, the amount of information depends on the ability of the examinees. For instance, a difficult item administered to someone with low ability will not generate meaningful, informative results, whereas a response to an easy item provides much more information about that examinee. As a rule, items are most informative when responded to by a person with an ability level comparable to the level of the item. (Note: In IRT, item difficulty and person ability are on the same scale, which makes it possible to match items to persons.) A test that is neither too easy nor too difficult for the respondent is a highly informative test.
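The matching rule above can be sketched with a standard two-parameter logistic (2PL) item, whose Fisher information is a² · P · (1 − P); the specific parameter values below are illustrative assumptions, not from the source. Information peaks when the examinee's ability equals the item's difficulty.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response (a = discrimination, b = difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# A hard item (b = 2.0) tells us almost nothing about a low-ability examinee
# (theta = -2.0), and the most about an examinee whose ability matches the item.
a, b = 1.2, 2.0
low     = item_information(-2.0, a, b)
matched = item_information(2.0, a, b)
print(round(low, 4), round(matched, 4))
```

Because difficulty and ability share a scale, the information function makes the matching intuition quantitative: the mismatched case yields near-zero information, while the matched case attains the item's maximum of a²/4.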
