Comparing IRT Models: Summated Scaling Effects on Critical Thinking in Vocational Students
DOI: https://doi.org/10.15408/jp3i.v14i2.42886

Keywords: Critical Thinking, Summated Rating, Item Response Theory

Abstract
This study investigates the comparative efficacy of Summated Rating Scales (SRS) and traditional
ordinal scales (raw Likert-type responses) in measuring critical thinking skills among vocational
students, employing Item Response Theory (IRT) to evaluate their psychometric properties.
Addressing a key limitation of ordinal scales, namely the inconsistent intervals between response
categories, the research adopts a descriptive quantitative methodology involving 269 students from
state vocational high schools in Yogyakarta, Indonesia. Data were collected using a five-point
Likert scale instrument, validated for content (Aiken’s V = 0.94), and analyzed through two IRT
frameworks: Polytomous IRT for unscaled ordinal data and Continuous Response Model (CRM)
IRT for SRS-transformed interval data. Key findings reveal that SRS enhances measurement
precision by normalizing response distributions into proportional intervals (e.g., recalibrated scores:
0.00, 0.73, 1.46, 2.07, 2.84), thereby resolving issues of unequal category spacing inherent to
ordinal scales. Polytomous IRT demonstrated robust item fit (e.g., Partial Credit Model fit for 5/6
items) and strong difficulty parameter invariance (r = 0.84), yet exhibited instability in ability
estimates (r = 0.37) due to extreme response patterns. Conversely, CRM IRT applied to the scaled
data produced more stable ability estimates (r = 0.46) and eliminated infinite values in Maximum
Likelihood Estimation, underscoring its advantage in handling continuous metrics. However, ordinal
scales retained higher consistency in difficulty calibration across subgroups. The study concludes
that integrating SRS with CRM IRT offers a refined approach for critical thinking assessments,
balancing precision and fairness, while ordinal scales remain pragmatic for contexts prioritizing
simplicity. These insights advocate for the adoption of advanced scaling techniques in vocational
education to improve the validity of competency evaluations, with recommendations for future
research to explore hybrid models and longitudinal applications.
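The recalibration the abstract reports (ordinal categories 1–5 mapped onto proportional interval values such as 0.00, 0.73, 1.46, 2.07, 2.84) can be illustrated with a minimal sketch of one common summated-rating variant: each category receives the normal deviate at the midpoint of its cumulative-proportion band, shifted so the lowest category scores 0. The response counts below are hypothetical, and the study's exact scaling procedure may differ.

```python
from statistics import NormalDist

def summated_rating_values(counts):
    """Recalibrate ordinal category values onto an interval metric.

    One common summated-rating variant (illustration only): take the
    normal deviate at the midpoint of each category's cumulative
    proportion band, then shift so the lowest category scores 0.
    """
    total = sum(counts)
    props = [c / total for c in counts]
    nd = NormalDist()
    z, cum = [], 0.0
    for p in props:
        z.append(nd.inv_cdf(cum + p / 2))  # deviate at the band midpoint
        cum += p
    base = z[0]
    return [round(v - base, 2) for v in z]

# Hypothetical response counts for one 5-point Likert item
print(summated_rating_values([20, 45, 90, 75, 39]))
```

Unlike the raw codes 1–5, the resulting category values are spaced in proportion to how respondents actually used the scale, which is the property that lets the transformed scores be treated as (approximately) interval data in the CRM IRT analysis.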
License
Copyright (c) 2025 Andi Abdurrahman Manggaberani, Abrar Syahrul Fajri, Heri Retnawati

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.