Outlier Detection in Inpatient Claims Using DBSCAN and K-Means

Panca Oktavia Candra Sari, Suharjito Suharjito


Health insurance helps people obtain quality, affordable health services. In the claim billing process, codes are entered into the system manually, which makes the process prone to input errors and can lead to claims being suspected of fraud. Claims suspected of fraud are then traced manually to find the incorrect inputs. As claim volume grows, the accuracy of this manual tracing decreases, and the work consumes more time and effort. To help prevent and reduce fraud, this study aims to identify patterns in the data associated with fraud by grouping the data into clusters. The data were prepared by combining inpatient claim bills and patient bills from hospitals in 2020. Two methods, DBSCAN and K-Means, were used to form the clusters, and Local Outlier Factor (LOF) was added to identify the outliers within the clusters. The experimental results show that both methods can detect outlier data and show how the outliers are distributed across the resulting clusters. The variables that most strongly influence whether a record becomes an outlier are length of stay, claim code, and the patient's condition at discharge. The accuracy of K-Means is 0.391, a reported 0.003 higher than that of DBSCAN at 0.389.
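To make the outlier-scoring step more concrete, the following is a minimal sketch of the Local Outlier Factor computation in plain Python. It is not the authors' implementation: the toy records, the two features (length of stay in days and a scaled bill amount), and the neighbourhood size k = 3 are hypothetical values chosen only for illustration, and the neighbourhood is simplified to exactly k neighbours with no tie handling.

```python
import math


def local_outlier_factor(points, k):
    """Return a simplified LOF score per point (scores well above 1 suggest an outlier).

    Assumes fewer than k exact duplicates of any point, so densities stay finite.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors, k_distance = [], []
    for i in range(n):
        # Indices of the other points, nearest first.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance
    # from point i to its k nearest neighbours.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (k * lrd[i])
        for i in range(n)
    ]


# Hypothetical toy records: (length of stay in days, scaled bill amount).
claims = [(2, 1.0), (3, 1.2), (2, 1.1), (3, 1.0), (4, 1.3), (3, 1.1), (30, 9.5)]
scores = local_outlier_factor(claims, k=3)
for record, score in zip(claims, scores):
    print(record, round(score, 2))
```

In this toy data, the last record lies far from the others, so its score is much larger than 1, while the remaining records score close to 1. In a workflow like the one described above, such scores would be computed after the records are assigned to clusters by DBSCAN or K-Means; how the study combined the two steps is not specified here.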


Fraud; Data Mining; DBSCAN; K-Means; Outlier





DOI: https://doi.org/10.15408/jti.v15i1.25682



Copyright (c) 2022 Panca Oktavia Candra Sari, Suharjito Suharjito

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Jurnal Teknik Informatika by Prodi Teknik Informatika Universitas Islam Negeri Syarif Hidayatullah Jakarta is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at http://journal.uinjkt.ac.id/index.php/ti.

