PRESENT RESEARCH AND SOME FUTURE DIRECTIONS

This article begins by exploring recent developments in the use of computers in language testing in four areas: (a) item banking, (b) computer-assisted language testing, (c) computerized-adaptive language testing, and (d) research on the effectiveness of computers in language testing. The article then examines the educational measurement literature in an attempt to forecast the directions future research on computers in language testing might take and suggests addressing the following issues: (a) piloting practices in computer-adaptive language tests (CALTs), (b) standardizing or varying CALT lengths, (c) sampling CALT items, (d) changing the difficulty of CALT items, (e) dealing with CALT item sets, (f) scoring CALTs, (g) dealing with CALT item omissions, (h) making decisions about CALT cut-points, (i) avoiding CALT item exposure, (j) providing CALT item review opportunities, and (k) complying with legal disclosure laws when using CALTs.

The literature on computer-assisted language learning indicates that language learners have generally positive attitudes toward using computers in the classroom (Reid, 1986; Neu & Scarcella, 1991; Phinney, 1991), and a fairly large literature has developed examining the effectiveness of computer-assisted language learning (for a review, see Dunkel, 1991). But less is known about the more specific area of computers in language testing. The purpose of this article is to examine recent developments in language testing that directly involve computer use, including what we have learned in the process. The article will also examine the dominant issue of computer-adaptive testing in the educational measurement literature in an attempt to forecast some of the directions future research on computers in language testing might take.

CURRENT STATE OF KNOWLEDGE ON COMPUTERS IN LANGUAGE TESTING

In reviewing the literature on computers in language testing, I have found four recurring sets of issues: (a) item banking, (b) computer-assisted language testing, (c) computer-adaptive language testing, and (d) the effectiveness of computers in language testing. The discussion in this section will be organized under those four headings.

Item banking covers any procedures that are used to create, pilot, analyze, store, manage, and select test items so that multiple test forms can be created from subsets of the total "bank" of items. With a large item bank available, new forms of tests can be created whenever they are needed. Henning (1986) provides a description of how item banking was set up for the ESL Placement Examination at UCLA. (For further explanation and examples of item banking in educational testing, see Baker, 1989.)

While the underlying aims of item banking can be accomplished by using traditional item analysis procedures (usually item facility and item discrimination indexes; for a detailed description of these traditional item analysis procedures, see Brown, 1996), a problem often occurs because of differences in abilities among the groups of people who are used in piloting the items, especially when they are compared to the population of students with whom the test is ultimately to be used. However, a relatively new branch of test analysis theory, called item response theory (IRT), eliminates the need to have exactly equivalent groups of students when piloting items because IRT analysis yields estimates of item difficulty and item discrimination that are "sample-free." IRT can also provide "item-free" estimates of students' abilities.

Naturally, a full discussion of IRT is beyond the scope of this article. However, Henning (1987) discusses the topic in terms of the steps involved in item banking for language tests and provides recipe-style descriptions of how to calculate the appropriate IRT statistics. Several other references may prove helpful for readers interested in more information on IRT. (For readers who want more technical information on applications of IRT to practical testing problems in general education, see Lord, 1980; Hambleton & Swaminathan, 1985; Andrich, 1988; Suen, 1990; Wainer & Mislevy, 1990; and Hambleton, Swaminathan, & Rogers, 1991.) In language testing, Madsen and Larson (1986) use computers and IRT to study item bias, while de Jong (1986) demonstrates the use of IRT for item selection purposes.
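The item banking procedures discussed in this article (creating and piloting items, storing them with their analysis statistics, and selecting subsets to assemble new test forms) can be sketched in code. The sketch below is a minimal illustration only: the class, function names, and the particular selection thresholds are assumptions for the example, not a description of any system cited here. It uses the traditional statistics mentioned in the text, item facility and item discrimination, as the selection criteria.

```python
from dataclasses import dataclass
import random

@dataclass
class Item:
    """A single piloted test item stored in the bank with its statistics."""
    item_id: str
    text: str
    facility: float        # proportion of pilot examinees answering correctly (0-1)
    discrimination: float  # e.g., point-biserial correlation with total score

def assemble_form(bank, n_items, min_disc=0.3, facility_range=(0.3, 0.7), seed=None):
    """Draw one test form from the bank: keep only items whose piloted
    statistics fall in acceptable ranges, then sample n_items at random so
    that repeated calls yield different, roughly comparable forms."""
    acceptable = [item for item in bank
                  if item.discrimination >= min_disc
                  and facility_range[0] <= item.facility <= facility_range[1]]
    if len(acceptable) < n_items:
        raise ValueError("Item bank too small for the requested form length.")
    rng = random.Random(seed)
    return rng.sample(acceptable, n_items)

# A toy bank of piloted items with invented statistics
bank = [Item(f"Q{i}", f"Prompt {i}",
             facility=0.2 + 0.01 * i,
             discrimination=0.1 + 0.01 * i)
        for i in range(60)]
form = assemble_form(bank, n_items=10, seed=1)
print([item.item_id for item in form])
```

A real bank would also track content categories and item exposure so that successive forms stay balanced, but the core idea is the same: forms are subsets selected from a pool of already-analyzed items.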
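The "sample-free" and "item-free" properties attributed to IRT above come from modeling the probability of a correct answer as a function of a latent ability and item parameters expressed on the same scale. As a minimal sketch, here is the two-parameter logistic (2PL) model, one common IRT model; the parameter values are invented for illustration, and this shows only the response model itself, not the estimation procedures a program such as those described by Henning (1987) would carry out.

```python
import math

def p_correct(theta, a, b):
    """2PL IRT model: probability that an examinee of ability theta answers
    correctly an item with discrimination a and difficulty b. Ability and
    difficulty sit on the same logit scale, which is what lets item and
    person estimates be compared across different pilot samples."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# When ability equals item difficulty, the probability of success is 0.5:
print(p_correct(theta=0.5, a=1.2, b=0.5))   # 0.5
# A harder item (larger b) lowers the probability for the same examinee:
print(p_correct(theta=0.5, a=1.2, b=1.5))
```

The discrimination parameter a controls how sharply the probability rises around the item's difficulty, which is the property exploited when selecting items, as in the item-selection work of de Jong (1986).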