Evaluating ChatGPT’s Accuracy Across Cognitive Levels in Academic Assessments

Astutiati Nurhasanah, Fadhilah Suralaga, Ida Rosyidah, Zahrotun Nihayah, Riri Fitri Sari, Ade Solihat, Nabila Ernada

Abstract

This study evaluates the accuracy of the free version of ChatGPT in answering academic questions across Bloom’s Taxonomy cognitive levels (C1–C6) and three disciplines (physics, social sciences, and religious studies) at two universities in Jakarta. A mixed-methods approach was used, combining statistical and content analyses. Thirty-five lecturers from UIN Jakarta and the University of Indonesia submitted exam questions in Bahasa Indonesia to ChatGPT, and the responses were scored on a 0–100 accuracy scale. Results show that ChatGPT performs well on multiple-choice questions (C1–C3) in physics but struggles with higher-order tasks (C5–C6) that require synthesis, evaluation, and creativity. In social sciences, accuracy was consistent across levels, particularly on theoretical questions, although ChatGPT had difficulty with data-driven analysis and practical application. In religious studies, ChatGPT achieved high accuracy across all cognitive levels, owing to the structured nature of the material and its clear doctrinal interpretation. Statistical analysis revealed significant differences in accuracy between lower and higher cognitive levels in physics (p = 0.005) and religious studies (p = 0.011), but not in social sciences (p = 0.137). An ANOVA (p = 0.464) showed no significant difference in overall accuracy across disciplines. The study highlights ChatGPT’s effectiveness on lower- to intermediate-level questions (C1–C4) and identifies its limitations on higher-level tasks (C5–C6). These findings encourage educators to design questions that assess deeper cognitive skills while leveraging AI’s strengths in supporting learning and knowledge acquisition.
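The abstract reports two kinds of comparison on 0–100 accuracy scores: lower versus higher Bloom levels within each discipline, and an ANOVA across disciplines. The sketch below illustrates how such test statistics can be computed, under several assumptions. The scores in `RECORDS` are invented placeholders, not the study's data. The abstract does not name the test used for the lower-versus-higher comparison, so Welch's t statistic is swapped in here as an assumption, and the cut-off between "lower" and "higher" levels is the hypothetical parameter `higher_from`. Only test statistics are computed; p-values would come from the t and F distributions (for example via SciPy's `scipy.stats.ttest_ind(..., equal_var=False)` and `scipy.stats.f_oneway`).

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical records: (discipline, Bloom level 1-6, accuracy score 0-100).
# These numbers are invented for illustration; they are NOT the study's data.
RECORDS = [
    ("physics", 1, 95), ("physics", 2, 90), ("physics", 3, 85),
    ("physics", 5, 60), ("physics", 6, 55),
    ("social", 1, 80), ("social", 2, 78), ("social", 4, 75), ("social", 6, 70),
    ("religious", 2, 95), ("religious", 3, 90),
    ("religious", 5, 80), ("religious", 6, 85),
]


def split_by_level(records, discipline, higher_from=4):
    """Split one discipline's scores into a lower band (level < higher_from)
    and a higher band (level >= higher_from). The cut-off is an assumption."""
    lower = [s for d, lvl, s in records if d == discipline and lvl < higher_from]
    higher = [s for d, lvl, s in records if d == discipline and lvl >= higher_from]
    return lower, higher


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances.
    Each sample needs at least two scores and they must not all be identical."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def one_way_anova_f(groups):
    """One-way ANOVA: return the F statistic and its (between, within) degrees of freedom."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    disciplines = ("physics", "social", "religious")
    for name in disciplines:
        lower, higher = split_by_level(RECORDS, name)
        print(f"{name}: Welch t (lower vs higher) = {welch_t(lower, higher):.2f}")
    groups = [[s for d, _, s in RECORDS if d == name] for name in disciplines]
    f, df1, df2 = one_way_anova_f(groups)
    print(f"ANOVA across disciplines: F({df1}, {df2}) = {f:.3f}")
```

With two groups of equal size and variance, the squared Welch t equals the ANOVA F, which is a quick sanity check on both functions.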

How to Cite: Nurhasanah, A., Suralaga, F., Rosyidah, I., Nihayah, Z., Sari, R. F., Solihat, A., & Ernada, N. (2024). Evaluating ChatGPT’s Accuracy Across Cognitive Levels in Academic Assessments. TARBIYA: Journal of Education in Muslim Society, 11(2), 211-224. https://doi.org/10.15408/tjems.v11i2.44701


Keywords


ChatGPT; Bloom's Taxonomy; AI in education; cognitive skills; academic assessment

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

TARBIYA: Journal of Education in Muslim Society, p-ISSN: 2356-1416, e-ISSN: 2442-9848