Ensuring Parameter Estimation Accuracy in 3PL IRT Modeling: The Role of Test Length and Sample Size

Hasan Djidu, Heri Retnawati, Haryanto Haryanto

Abstract


This simulation study evaluated the accuracy of item parameter estimation under the 3PL IRT model, focusing on sample size and test length (number of test items). The investigation used six datasets generated with WinGen, each comprising 5,000 responses, with test lengths ranging from 10 to 40 items. For each dataset, the study drew subsamples and re-analyzed the data 15 times, producing a total of 2,025 data subsets and 225 parameter estimates per item. The results revealed that smaller sample sizes produced more pronounced bias, supporting a recommended minimum sample size of 3,000 for precise parameter estimation. The study also found that short tests yielded biased estimates and proposed a minimum of 25 or 40 test items for accurate estimation with the 3PL IRT model. These findings offer test developers guidance for choosing sample sizes and test lengths that ensure reliable and accurate parameter estimates.
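Under the 3PL model, the probability that an examinee with ability θ answers an item correctly is P(θ) = c + (1 − c) / (1 + exp(−a(θ − b))), where a is discrimination, b is difficulty, and c is the pseudo-guessing parameter. The study itself generated data with WinGen; the following is only a minimal Python sketch of that generating step (function names and parameter values are illustrative), showing how response matrices like those subsampled in the study can be produced:

```python
import math
import random

def p_correct(theta, a, b, c):
    """3PL probability of a correct response: c + (1-c)/(1+exp(-a(theta-b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def simulate_responses(n_examinees, items, seed=0):
    """Generate a 0/1 response matrix for examinees with N(0,1) abilities."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)  # ability drawn from a standard normal
        data.append([1 if rng.random() < p_correct(theta, a, b, c) else 0
                     for (a, b, c) in items])
    return data

# Illustrative item: discrimination a=1.2, difficulty b=0.0, guessing c=0.2.
items = [(1.2, 0.0, 0.2)]
resp = simulate_responses(5000, items, seed=42)
mean_correct = sum(r[0] for r in resp) / len(resp)
```

At θ = b the model gives P = c + (1 − c)/2, so with c = 0.2 the probability is 0.6; repeatedly drawing smaller subsamples from such a matrix and re-estimating a, b, and c is the mechanism by which the study quantified estimation bias.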

Keywords


parameter estimation; item response theory; 3-parameter logistic; test length; sample size

References


Ackerman, P. L., & Kanfer, R. (2009). Test Length and Cognitive Fatigue: An Empirical Examination of Effects on Performance and Test-Taker Reactions. Journal of Experimental Psychology: Applied, 15(2), 163–181. https://doi.org/10.1037/a0015719

Baker, F. B. (1985). Book review: Item Response Theory: Principles and Applications. In Applied Psychological Measurement (Vol. 9, Issue 3). Nijhoff Publishing. https://doi.org/10.1177/014662168500900315

Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06

Divgi, D. R. (1984). Does small N justify use of the Rasch model? Paper presented at the Annual Meeting of the American Educational Research Association, New Orleans.

Djidu, H., Ismail, R., Sumin, Rachmaningtyas, N. A., Imawan, O. R., Suharyono, Aviory, K., Prihono, E. W., Kurniawan, D. D., Syahbrudin, J., Nurdin, Marinding, Y., Firmansyah, Hadi, S., & Retnawati, H. (2022). Analisis Instrumen Penelitian dengan Pendekatan Teori Tes Klasik dan Modern Menggunakan Program R [Analysis of research instruments using classical and modern test theory approaches with the R program].

Fernández-Ballesteros, R. (2012). Multidimensional Item Response Theory. In Encyclopedia of Psychological Assessment. https://doi.org/10.4135/9780857025753.n128

Feuerstahler, L. M. (2022). Metric Stability in Item Response Models. Multivariate Behavioral Research, 57(1), 94–111. https://doi.org/10.1080/00273171.2020.1809980

Han, K. T. (2007a). WinGen [Computer software]. Amherst, MA: University of Massachusetts at Amherst.

Han, K. T. (2007b). WinGen: Windows software that generates item response theory parameters and item responses. Applied Psychological Measurement, 31(5), 457–459. https://doi.org/10.1177/0146621607299271

Paek, I., Liang, X., & Lin, Z. (2021). Regarding Item Parameter Invariance for the Rasch and the 2-Parameter Logistic Models: An Investigation under Finite Non-Representative Sample Calibrations. Measurement, 19(1), 39–54. https://doi.org/10.1080/15366367.2020.1754703

Retnawati, H. (2014). Teori Respon Butir dan Penerapannya untuk Peneliti, Praktisi Pengukuran dan Penguji [Item response theory and its application for researchers, measurement practitioners, and testers]. Parama Publishing. http://staff.uny.ac.id/sites/default/files/pendidikan/heri-retnawati-dr/teori-respons-butir-dan-penerapanya-135hal.pdf

Şahin, A., & Anıl, D. (2017). The effects of test length and sample size on item parameters in item response theory. Kuram ve Uygulamada Egitim Bilimleri, 17(1), 321–335. https://doi.org/10.12738/estp.2017.1.0270

Stenbeck, M., Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1992). Fundamentals of Item Response Theory. In Contemporary Sociology (Vol. 21, Issue 2). SAGE Publications. https://doi.org/10.2307/2075521

Wainer, H., & Wright, B. D. (1980). Robust estimation of ability in the Rasch model. Psychometrika, 45(3), 373–391. https://doi.org/10.1007/BF02293910

Wells, C. S., Subkoviak, M. J., & Serlin, R. C. (2002). The effect of item parameter drift on examinee ability estimates. Applied Psychological Measurement, 26(1), 77–87. https://doi.org/10.1177/0146621602261005

Wickham, H., François, R., Henry, L., & Müller, K. (2021). dplyr: A Grammar of Data Manipulation. In CRAN Repository (Vol. 3, pp. 1–2). https://cran.r-project.org/package=dplyr

Wise, S. L. (1991). The Utility of a Modified One-Parameter IRT Model With Small Samples. Applied Measurement in Education, 4(2), 143–157. https://doi.org/10.1207/s15324818ame0402_4

Yen, W. M. (1981). Using Simulation Results to Choose a Latent Trait Model. Applied Psychological Measurement, 5(2), 245–262. https://doi.org/10.1177/014662168100500212



DOI: 10.15408/jp3i.v12i2.34130


Copyright (c) 2023 Hasan Djidu, Heri Retnawati, Haryanto Haryanto

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.