Article title

Feasibility of computerized adaptive testing evaluated by Monte-Carlo and post-hoc simulations

Conference
Federated Conference on Computer Science and Information Systems (15; 06–09.09.2020; Sofia, Bulgaria)
Publication languages
EN
Abstract
EN
Computerized adaptive testing (CAT) is a modern alternative to classical paper-and-pencil testing. CAT is based on the automated selection of the optimal item given the current estimate of the test-taker's ability, in contrast to the fixed, predefined items of a linear test. Advantages of CAT include lowered test anxiety, shortened test length, increased precision of the estimates of test-takers' abilities, and a lower item exposure rate and thus better test security. Challenges include high technical demands on the whole testing workflow and the need for large item banks. In this study, we analyze the feasibility and advantages of computerized adaptive testing using a Monte-Carlo simulation and a post-hoc analysis based on a real linear admission test administered at a medical college. We compare various settings of the adaptive test in terms of the precision of ability estimates and test length. We find that with adaptive item selection, the test length can be reduced to 40 out of 100 items while keeping the precision of ability estimates within the prescribed range and obtaining ability estimates highly correlated with those based on the complete linear test (Pearson's ρ = 0.96). We also demonstrate the positive effect of content balancing and item exposure rate control on the composition of the administered tests.
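To make the mechanism concrete, below is a minimal base-R sketch of the kind of Monte-Carlo CAT simulation described above: a 2PL item bank, maximum-information selection of the next item at the current ability estimate, a simulated response, and a stopping rule on the standard error of the estimate. The item parameters, the starting estimate of 0, and the SE target of 0.3 are illustrative assumptions, not the paper's settings; only the bank size of 100 and the 40-item cap are taken from the abstract. A post-hoc simulation would differ only in replaying each examinee's recorded responses instead of sampling them from the model.

## Minimal Monte-Carlo CAT sketch (plain base R, hypothetical item parameters).
set.seed(42)
n_items <- 100

## Hypothetical 2PL item bank: discrimination a, difficulty b.
bank <- data.frame(a = rlnorm(n_items, 0, 0.3),
                   b = rnorm(n_items, 0, 1))

## 2PL probability of a correct response.
p2pl <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))

## Fisher information of a 2PL item at ability theta.
info <- function(theta, a, b) {
  p <- p2pl(theta, a, b)
  a^2 * p * (1 - p)
}

## Maximum-likelihood ability estimate from the responses given so far.
theta_mle <- function(resp, a, b) {
  nll <- function(th) {
    p <- p2pl(th, a, b)
    -sum(resp * log(p) + (1 - resp) * log(1 - p))
  }
  optimize(nll, interval = c(-4, 4))$minimum
}

simulate_cat <- function(true_theta, bank, se_target = 0.3, max_len = 40) {
  admin <- integer(0); resp <- integer(0)
  theta <- 0                                   # starting estimate (assumption)
  repeat {
    remaining <- setdiff(seq_len(nrow(bank)), admin)
    ## Select the not-yet-administered item with maximal information at theta.
    it <- remaining[which.max(info(theta, bank$a[remaining], bank$b[remaining]))]
    admin <- c(admin, it)
    ## Monte-Carlo step: sample the response from the 2PL model. A post-hoc
    ## simulation would instead look up the examinee's recorded response.
    resp <- c(resp, rbinom(1, 1, p2pl(true_theta, bank$a[it], bank$b[it])))
    if (length(unique(resp)) > 1)              # MLE needs a mixed pattern
      theta <- theta_mle(resp, bank$a[admin], bank$b[admin])
    se <- 1 / sqrt(sum(info(theta, bank$a[admin], bank$b[admin])))
    if (se < se_target || length(admin) >= max_len) break
  }
  c(theta = theta, se = se, length = length(admin))
}

simulate_cat(true_theta = 0.5, bank)

Running simulate_cat over many draws of true_theta and correlating the resulting CAT estimates with estimates from the full 100-item test would reproduce the type of comparison reported above; content balancing and exposure control would add constraints to the item-selection step.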
Pages
359–367
Physical description
Bibliography: 26 items, figures, formulas, charts.
Authors
  • Institute of Biophysics and Informatics, First Faculty of Medicine, Charles University, Salmovská 1, Praha 2
  • Institute of Computer Science of the Czech Academy of Sciences, Pod Vodárenskou věží 2, Praha 8
  • Faculty of Education, Charles University, Myslíkova 7, Praha 1
Bibliography
  • 1. Wim J. van der Linden and Cees A. W. Glas. Computerized Adaptive Testing: Theory and Practice. Springer, 2000.
  • 2. Howard Wainer, Neil J. Dorans, Ronald Flaugher, et al. Computerized Adaptive Testing: A Primer. Routledge, 2000.
  • 3. David Magis, Duanli Yan, and Alina A. von Davier. Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer, 2017.
  • 4. David J. Weiss and G. Gage Kingsbury. “Application of computerized adaptive testing to educational problems”. In: Journal of Educational Measurement 21.4 (1984), pp. 361–375.
  • 5. Jan Stochl, Jan R. Böhnke, Kate E. Pickett, et al. “Computerized adaptive testing of population psychological distress: simulation-based evaluation of GHQ-30”. In: Social Psychiatry and Psychiatric Epidemiology 51.6 (2016), pp. 895–906.
  • 6. Jan Stochl, Jan R. Böhnke, Kate E. Pickett, et al. “An evaluation of computerized adaptive testing for general psychological distress: combining GHQ-12 and Affectometer-2 in an item bank for public mental health research”. In: BMC Medical Research Methodology 16.1 (2016), p. 58.
  • 7. Dagmar Amtmann, Alyssa M. Bamer, Jiseon Kim, et al. “A comparison of computerized adaptive testing and fixed-length short forms for the Prosthetic Limb Users Survey of Mobility (PLUS-M™)”. In: Prosthetics and Orthotics International 42.5 (2018), pp. 476–482.
  • 8. Karon F. Cook, Seung W. Choi, Paul K. Crane, et al. “Letting the CAT out of the bag: comparing computer adaptive tests and an eleven-item short form of the Roland-Morris Disability Questionnaire”. In: Spine 33.12 (2008), p. 1378.
  • 9. Patrícia Martinková, Lubomír Štěpánek, Adéla Drabinová, et al. “Semi-real-time analyses of item characteristics for medical school admission tests”. In: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems. Ed. by M. Ganzha, L. Maciaszek, and M. Paprzycki. Vol. 11. Annals of Computer Science and Information Systems. IEEE, 2017, pp. 189–194. DOI: 10.15439/2017F380.
  • 10. Čestmír Štuka, Patrícia Martinková, Karel Zvára, et al. “The prediction and probability for successful completion in medical study based on tests and preadmission grades”. In: New Educational Review 28 (2012), pp. 138–152.
  • 11. Patrícia Martinková and Adéla Drabinová. “ShinyItemAnalysis for Teaching Psychometrics and to Enforce Routine Analysis of Educational Tests”. In: The R Journal 10.2 (2018).
  • 12. Wim J. van der Linden and Cees A. W. Glas. “Statistical Aspects of Adaptive Testing”. In: Handbook of Statistics. Elsevier, 2006, pp. 801–838. DOI: 10.1016/s0169-7161(06)26025-5.
  • 13. R. Darrell Bock and Murray Aitkin. “Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm”. In: Psychometrika 46.4 (Dec. 1981), pp. 443–459. DOI: 10.1007/bf02293801.
  • 14. Yoshio Takane and Jan de Leeuw. “On the relationship between item response theory and factor analysis of discretized variables”. In: Psychometrika 52.3 (Sept. 1987), pp. 393–408. DOI: 10.1007/bf02294363.
  • 15. Cees A. W. Glas. “Modification indices for the 2-PL and the nominal response model”. In: Psychometrika 64.3 (Sept. 1999), pp. 273–294. DOI: 10.1007/bf02294296.
  • 16. A. P. Dempster, N. M. Laird, and D. B. Rubin. “Maximum likelihood from incomplete data via the EM algorithm”. In: Journal of the Royal Statistical Society, Series B 39.1 (1977), pp. 1–38.
  • 17. Hua-Hua Chang and Zhiliang Ying. “Nonlinear sequential designs for logistic item response theory models with applications to computerized adaptive tests”. In: The Annals of Statistics 37.3 (June 2009), pp. 1466–1488. DOI: 10.1214/08-aos614.
  • 18. Daniel O. Segall. “Multidimensional adaptive testing”. In: Psychometrika 61.2 (June 1996), pp. 331–354. DOI: 10.1007/bf02294343.
  • 19. Thomas A. Warm. “Weighted likelihood estimation of ability in item response theory”. In: Psychometrika 54.3 (Sept. 1989), pp. 427–450. DOI: 10.1007/bf02294627.
  • 20. Frederic M. Lord. Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: L. Erlbaum Associates, 1980. ISBN: 978-0898590067.
  • 21. Frank L. Schmidt, John E. Hunter, and Vern W. Urry. “Statistical power in criterion-related validation studies”. In: Journal of Applied Psychology 61.4 (1976), pp. 473–485. DOI: 10.1037/0021-9010.61.4.473.
  • 22. Wim J. van der Linden and Richard M. Luecht. “Observed-score equating as a test assembly problem”. In: Psychometrika 63.4 (Dec. 1998), pp. 401–418. DOI: 10.1007/bf02294862.
  • 23. Rebecca D. Hetter and J. Bradford Sympson. “Item exposure control in CAT-ASVAB”. In: Computerized Adaptive Testing: From Inquiry to Operation. American Psychological Association, 1997, pp. 141–144. DOI: 10.1037/10244-014.
  • 24. Martha L. Stocking and Charles Lewis. “Controlling Item Exposure Conditional on Ability in Computerized Adaptive Testing”. In: Journal of Educational and Behavioral Statistics 23.1 (1998), p. 57. DOI: 10.2307/1165348.
  • 25. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2017. https://www.R-project.org/.
  • 26. R. Philip Chalmers. “Generating Adaptive and Non-Adaptive Test Interfaces for Multidimensional Item Response Theory Applications”. In: Journal of Statistical Software 71.5 (2016), pp. 1–39. DOI: 10.18637/jss.v071.i05.
Notes
1. Research was supported by Charles University grant PRIMUS/17/HUM/11.
2. Track 1: Artificial Intelligence
3. Technical Session: 13th International Workshop on Computational Optimization
4. Record developed with funds from the Polish Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the "Social Responsibility of Science" programme, module: Popularisation of Science and Promotion of Sport (2021).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-2275129a-9701-43d6-bc5b-43d3fc4d3d8d