Mplus and the R mirt Package: A Comparison of Model Parameter Estimation for Generalized Partial Credit Model (GPCM)
DOI: 10.15408/jp3i.v13i2.40344
Copyright (c) 2024 Arif Budiman Al Fariz
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.