The Impact of Sample Size, Test Length, and Person-Item Targeting on Separation Reliability in the Rasch Model: A Simulation Study

Rahmat S. Bintang, Suprananto Suprananto

Abstract


This research is a simulation study using resampling methods to examine the effects of sample size, test length, and person-item targeting on separation reliability in the Rasch model. Simulation conditions were created by crossing several predetermined factors: sample size with five levels (200, 500, 1000, 2000, and 4000 persons), test length with three levels (20, 40, and 60 items), and person-item targeting with five levels (-2, -1, 0, 1, and 2 logits). This yields 75 conditions in total, each replicated 50 times, for 3,750 generated datasets. Data were generated with the WinGen software, and separation reliability was estimated with the Winsteps software. The criteria were set at > 0.80 for Person Separation Reliability (PSR) and > 0.90 for Item Separation Reliability (ISR). The results showed that all 75 conditions (100%) produced ISR estimates that met the criterion (> 0.90). For PSR, 37 conditions (49%) produced estimates that met the criterion (> 0.80), while 38 conditions (51%) did not (< 0.80). In addition, PSR estimation was influenced by test length and person-item targeting.
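The factorial design summarized above can be illustrated with a minimal sketch. The Python snippet below (the study itself used WinGen for data generation and Winsteps for estimation) enumerates the 5 × 3 × 5 = 75 conditions and generates one dichotomous response matrix under the Rasch model for a single replication; the distributional choices (person abilities drawn from N(offset, 1), item difficulties from N(0, 1)) and all function names are illustrative assumptions, not the study's actual generation settings.

import itertools
import numpy as np

# Factorial design reported in the abstract:
# 5 sample sizes x 3 test lengths x 5 targeting offsets = 75 conditions,
# each replicated 50 times -> 3,750 simulated datasets.
SAMPLE_SIZES = [200, 500, 1000, 2000, 4000]
TEST_LENGTHS = [20, 40, 60]
TARGETING_OFFSETS = [-2, -1, 0, 1, 2]   # mean person ability minus mean item difficulty (logits)
REPLICATIONS = 50

def simulate_dichotomous_rasch(n_persons, n_items, offset, rng):
    """Generate one dichotomous response matrix under the Rasch model.

    Abilities ~ N(offset, 1) and difficulties ~ N(0, 1) are illustrative
    assumptions; the study generated its data with WinGen.
    """
    theta = rng.normal(loc=offset, scale=1.0, size=n_persons)   # person abilities
    beta = rng.normal(loc=0.0, scale=1.0, size=n_items)         # item difficulties
    p_correct = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
    return (rng.random((n_persons, n_items)) < p_correct).astype(int)

conditions = list(itertools.product(SAMPLE_SIZES, TEST_LENGTHS, TARGETING_OFFSETS))
print(f"{len(conditions)} conditions, {len(conditions) * REPLICATIONS} datasets in total")

rng = np.random.default_rng(seed=2024)
n, k, offset = conditions[0]                      # e.g., 200 persons, 20 items, -2 logits
responses = simulate_dichotomous_rasch(n, k, offset, rng)
print(responses.shape)                            # (200, 20): one replication of one condition

In Winsteps, separation reliability is reported as the ratio of "true" variance to observed variance of the person (or item) measures, i.e., (observed variance − mean square standard error) / observed variance, so the PSR > 0.80 and ISR > 0.90 criteria are thresholds on that ratio.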


Keywords


sample size, test length, person-item targeting, person separation reliability, item separation reliability



DOI: 10.15408/jp3i.v13i1.27975



Copyright (c) 2024 Rahmat S. Bintang

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.