Mplus and the R mirt Package: A Comparison of Model Parameter Estimation for the Generalized Partial Credit Model (GPCM)

Arif Budiman Al Fariz

Abstract


This article presents an empirical demonstration of calibrating data with the generalized partial credit model (GPCM). It compares the GPCM results from a paid program, Mplus, with those from open-source software, the R package mirt. The study used secondary data: item scores on the Future Orientation Scale (Skala Orientasi Masa Depan, S-OMD), collected from 326 participants on a Likert scale with four response options. The results show that the GPCM fits the S-OMD data. Mplus and mirt produced the same estimates, including the item discrimination and item difficulty parameters. The factor scores from the two programs were perfectly correlated (r = 1). In conclusion, open-source software can deliver the same estimation results as paid software. It also offers several features that the paid software lacks.
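Both programs estimate GPCM discrimination and step (difficulty) parameters. To make concrete what those parameters do, the following Python sketch computes the category probabilities for a single GPCM item. It is illustrative only and is not the code used in the study, which was run in Mplus and mirt.

The sketch uses Muraki's formulation in the logistic metric without the 1.7 scaling constant, which is an assumption. Under this formulation, the probability of category k is proportional to exp of the cumulative sum of a(θ − b_v) for v = 1..k, and category 0 has an empty sum of 0. The function name `gpcm_probs` and the item parameters are hypothetical.

Mplus and mirt may report GPCM parameters in different parameterizations, for example slope-intercept rather than slope-threshold. Estimates may therefore need conversion before they are compared directly.

```python
import math


def gpcm_probs(theta, a, b):
    """Category probabilities for one GPCM item (logistic metric, no 1.7 constant).

    theta : latent trait value
    a     : item discrimination (slope)
    b     : step parameters b_1..b_m; the item has m + 1 categories scored 0..m
    """
    # Cumulative sums of a * (theta - b_v); category 0 uses the empty sum, 0.
    z = [0.0]
    for b_v in b:
        z.append(z[-1] + a * (theta - b_v))
    # Normalize with a softmax, subtracting the max exponent for numerical stability.
    z_max = max(z)
    exp_z = [math.exp(v - z_max) for v in z]
    total = sum(exp_z)
    return [e / total for e in exp_z]


# Hypothetical item with four response options (scored 0-3), as on the S-OMD;
# the parameter values are invented for illustration, not estimates from the study.
probs = gpcm_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

An item with four response options has three step parameters. For this hypothetical item at θ = 0.5, category 2 receives the largest probability.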

Keywords


Generalized partial credit model, Mplus, mirt, IRT

DOI: 10.15408/jp3i.v13i2.40344

Copyright (c) 2024 Arif Budiman Al Fariz

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.