Using K-NN Algorithm for Evaluating Feature Selection on High Dimensional Datasets

Fina Indri Silfana, Mula Agung Barata

Abstract


Data mining is the process of using statistics, mathematics, artificial intelligence, and machine learning to identify problems that exist in data and so produce useful information. Based on function, data mining tasks are grouped into description, estimation, classification, clustering, and association. K-NN (k-nearest neighbors) is considered one of the best data mining methods and is widely used in research. The K-NN algorithm was introduced by Fix and Hodges in 1951. It is a simple algorithm and is often used to classify supervised (labeled) data. Feature selection, also called attribute selection, is a data mining technique used in the pre-processing stage. It works by reducing the number of attributes that must be handled in the processing and analysis stages. In this study, the most effective feature selection improved the accuracy of the K-NN algorithm, with reported figures of 95.12% on the breast cancer dataset and 88.75% on the prostate cancer dataset.
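The evaluation idea described in the abstract, comparing K-NN accuracy before and after feature selection, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the data are synthetic (one informative attribute plus 20 noise attributes) rather than the breast or prostate cancer datasets, the filter score in `rank_features` is a hypothetical stand-in for whichever feature selection methods the study compared, and `k=3` is an arbitrary choice.

```python
import random
import statistics
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: euclidean(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def knn_accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test samples whose predicted label matches the true one."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def rank_features(X, y):
    """Hypothetical filter score (an assumption, not the paper's method):
    absolute difference of the two class means divided by the overall
    standard deviation. Returns attribute indices, best first."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        mean0 = statistics.mean([v for v, label in zip(column, y) if label == 0])
        mean1 = statistics.mean([v for v, label in zip(column, y) if label == 1])
        spread = statistics.pstdev(column) or 1.0  # guard against zero spread
        scores.append(abs(mean1 - mean0) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def project(X, keep):
    """Keep only the attributes whose indices are listed in keep."""
    return [[row[j] for j in keep] for row in X]


# Synthetic data (assumption): attribute 0 separates the two classes,
# attributes 1..20 are wide-range noise that swamps the distance measure.
rng = random.Random(0)
y = [i % 2 for i in range(100)]
X = [[rng.uniform(0, 1) + 4 * label] + [rng.uniform(-50, 50) for _ in range(20)]
     for label in y]

train_X, test_X = X[:70], X[70:]
train_y, test_y = y[:70], y[70:]

# Rank attributes on the training split only, then keep the single best one.
ranked = rank_features(train_X, train_y)
keep = ranked[:1]

acc_all = knn_accuracy(train_X, train_y, test_X, test_y)
acc_selected = knn_accuracy(project(train_X, keep), train_y,
                            project(test_X, keep), test_y)

print("accuracy with all attributes:     ", acc_all)
print("accuracy with selected attributes:", acc_selected)
```

Ranking on the training split alone avoids leaking test labels into the selection step. On real high-dimensional data the number of attributes to keep would normally be tuned rather than fixed at one.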


Keywords


Data mining; Classification; Feature Selection


DOI: https://doi.org/10.15408/jti.v17i2.40866


Copyright (c) 2024 Fina Indri Silfana, Mula Agung Barata

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Jurnal Teknik Informatika by Prodi Teknik Informatika Universitas Islam Negeri Syarif Hidayatullah Jakarta is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at http://journal.uinjkt.ac.id/index.php/ti.
