Zrównywanie wyników testowania. Definicje i przykłady zastosowania

Pokropek, Artur; Kondratek, Bartosz

Article details

Journal

Edukacja

2012 | 4(120) | 52-71

Article title

Zrównywanie wyników testowania. Definicje i przykłady zastosowania

Authors

Artur Pokropek, Bartosz Kondratek

Title variants

Test equating. Definitions and examples of applications

Publication languages

Abstract

Long-established testing systems, as well as most newly created ones, include mechanisms for equating scores from different testing sessions in order to control for differences in difficulty between test forms. This article defines test equating and reviews the main data collection designs used in equating. To illustrate the main trends in equating methodology worldwide, it presents 11 testing systems in which equating is built into test construction and score reporting. Each test is briefly described, with attention to the mechanisms that make equating possible. The overview is divided into three parts according to the purpose of the testing system: national high-stakes examination systems (SAT, ACT, PET, SweSAT), international evaluation studies (TIMSS, PIRLS, PISA) and national evaluation studies (NAEP, EQAO, NAPLAN, NABC).

Keywords

test equating; equating designs; ability assessment

Publisher

Instytut Badań Edukacyjnych

Physical description

Dates

published

2012-12-31

Contributors

author

Artur Pokropek

Instytut Badań Edukacyjnych

author

Bartosz Kondratek

Instytut Badań Edukacyjnych

Notes

http://www.edukacja.ibe.edu.pl/images/numery/2012/4-4-pokropek-kondratek-zrownywanie-wynikow-testowania.pdf

Document type

Bibliography

Identifiers

ISSN

0239-6858

YADDA identifier

bwmeta1.element.desklight-d31b723c-ca5c-4491-b1a5-4fbc499d6411