JAVANESE LANGUAGE STEMMING USING DAMERAU LEVENSHTEIN DISTANCE (DLD)

Aji Prasetya Wibawa, Muhammad Nu’man Hakim

Abstract


Stemming is one of the essential stages of text mining. It removes affixes such as prefixes and suffixes to reduce each word in a text to its root form. This study uses a string matching algorithm, Damerau Levenshtein Distance (DLD), to find the root forms of Javanese words. The test data consist of 300 words containing a prefix, an infix, a suffix, a combination of prefix and suffix, or word repetition (reduplication). The results indicate that the DLD algorithm can be used to stem Javanese text, with an accuracy of 49.6%.
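
The abstract does not say how the distance is turned into a stemmer, so the following is only a minimal sketch under stated assumptions: a root-word dictionary is available, and the stem of a word is taken to be the dictionary entry with the smallest edit distance. The distance shown is the restricted variant of Damerau-Levenshtein (optimal string alignment), which may differ from the variant used in the study. The function names and the three-word toy dictionary are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters, each at cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def stem(word: str, roots: list[str]) -> str:
    """Return the dictionary root closest to `word` (assumed lookup strategy).

    Ties are broken alphabetically so the result is deterministic.
    """
    return min(roots, key=lambda r: (osa_distance(word, r), r))


# Hypothetical toy dictionary of root words, for illustration only.
toy_roots = ["tuku", "turu", "buku"]
print(stem("dituku", toy_roots))  # "dituku" is "tuku" with the prefix "di-"
```

Because the nearest dictionary entry is always returned, a sketch like this will mis-stem any word whose affixes push it closer to an unrelated root; a distance threshold or affix rules would be needed to guard against that.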


Keywords


Basic words; Javanese; Damerau Levenshtein Distance

DOI: https://doi.org/10.15408/jti.v14i1.15010

Copyright (c) 2021 Aji Prasetya Wibawa, Muhammad Nu’man Hakim

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Creative Commons Licence
Jurnal Teknik Informatika by Prodi Teknik Informatika Universitas Islam Negeri Syarif Hidayatullah Jakarta is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at http://journal.uinjkt.ac.id/index.php/ti.

 
