Using K-NN Algorithm for Evaluating Feature Selection on High Dimensional Datasets
DOI:
https://doi.org/10.15408/jti.v17i2.40866
Keywords:
Data mining, Classification, Feature Selection
Abstract
Data mining is the process of using statistics, mathematics, artificial intelligence, and machine learning to identify problems in data and produce useful information. By function, data mining is grouped into description, estimation, classification, clustering, and association. K-NN (K-Nearest Neighbor) is one of the most widely used data mining methods in research. The K-NN algorithm was introduced by Fix and Hodges in 1951. It is a simple algorithm that is often used to classify labeled data. Feature selection, also called attribute selection, is a data mining technique applied at the pre-processing stage. It works by reducing the number of attributes that must be handled at the processing and analysis stages. This study identifies the most effective feature selection method for improving the accuracy of the K-NN algorithm, with accuracy figures of 95.12% on the breast cancer dataset and 88.75% on the prostate cancer dataset.
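The idea of using K-NN accuracy to judge a feature subset can be sketched in a few lines of Python. The abstract does not say which feature selection method performed best, so the sketch below assumes greedy forward selection scored by leave-one-out accuracy. The function names, the value k=3, and the eight-row toy dataset are all hypothetical and are not taken from the study.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of K-NN using only the given feature indices."""
    projected = [[row[j] for j in features] for row in X]
    hits = 0
    for i, query in enumerate(projected):
        rest_X = projected[:i] + projected[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, query, k) == y[i]
    return hits / len(X)


def forward_selection(X, y, k=3):
    """Greedily add the feature that most improves leave-one-out accuracy."""
    remaining = list(range(len(X[0])))
    selected, best = [], 0.0
    while remaining:
        scored = [(loo_accuracy(X, y, selected + [j], k), j) for j in remaining]
        score, feature = max(scored, key=lambda s: s[0])
        if score <= best:  # stop when no candidate improves accuracy
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise
# on a much larger scale, so it dominates the Euclidean distance.
X = [
    [1.0, 50.0], [1.2, 10.0], [0.9, 90.0], [1.1, 30.0],
    [3.0, 20.0], [3.2, 80.0], [2.9, 40.0], [3.1, 60.0],
]
y = ["benign"] * 4 + ["malignant"] * 4

selected, best = forward_selection(X, y)
print("all features:", loo_accuracy(X, y, [0, 1]))
print("selected features:", selected, "accuracy:", best)
```

On this toy data the noisy second feature hurts K-NN when both features are used, and the selection step keeps only the informative one. Real high-dimensional datasets would normally also need feature scaling and a held-out test set, which the sketch leaves out.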