Sequential Topic Modelling: A Case Study on Indonesian LGBT Conversation on Twitter

Arsy Arslina, Muhaza Liebenlito

Abstract

Indonesia has the largest Muslim population in the world, and the Lesbian, Gay, Bisexual, and Transgender (LGBT) issue there has long been a sensitive and frequently studied topic. Social media platforms such as Twitter are among the main venues where people discuss it. In this paper, we collect 18,552 tweets dated from 2015 to 2018 to analyze the dynamics of the LGBT conversation among Indonesians. We explore the main topics of this conversation using Latent Dirichlet Allocation (LDA), one of the most popular methods for soft clustering. The technique is effective at identifying latent (hidden) topic information in large document collections. It uses a bag-of-words approach in which every document is treated as a vector of word counts and represented as a probability distribution over several topics. The results show seven main categories that people discuss in relation to LGBT: politics, religion, government, ethics, nationality, culture, and technology. The topic probability distribution in each semester is generally homogeneous. The exception is the government election period, when politics tends to have a significantly higher probability. In other words, we find a tendency for LGBT issues to be used in Indonesian politics.

Keywords: LGBT; politics; topic modeling; Twitter.

 

Abstrak (English translation)

Indonesia is the country with the largest Muslim population in the world, and the Lesbian, Gay, Bisexual, and Transgender (LGBT) issue there is a sensitive one of continuing research interest. Social media such as Twitter is one of the channels people commonly use to discuss LGBT topics. This study uses 18,552 tweets from 2015 to 2018, collected to observe how conversation patterns differ over time. The main topics of the LGBT conversation are explored using Latent Dirichlet Allocation (LDA). LDA is the most popular method for soft clustering. The technique is effective at identifying latent (hidden) topic information in large document collections using a bag-of-words approach that treats each document as a vector of word counts and represents it as a probability distribution over several topics, while each topic is represented as a probability distribution over a number of words. The results show seven dominant topics that frequently appear in conversations about LGBT: politics, religion, government, morality, citizenship, culture, and technology. For these categories, the topic probability distribution is then computed and analyzed for each semester. The results show that the topic distribution tends to be uniform, except during periods of government transition, when the politics category tends to rise significantly. In other words, LGBT issues tend to be linked to political life in Indonesia.

Keywords: LGBT, politics, topic modelling, Twitter.
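
The pipeline summarized in the abstract (fit LDA on bag-of-words documents, then compare topic probability distributions per semester) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The toy tweets and their dates are invented, the collapsed Gibbs sampler is one common way to fit LDA and is not confirmed as the method used here, the hyperparameters `alpha` and `beta` are arbitrary, and the definition of a semester as January to June or July to December is an assumption. The function names `fit_lda`, `semester_of`, and `topic_share_by_semester` are hypothetical.

```python
import random
from collections import defaultdict


def fit_lda(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not optimized).

    docs is a list of token lists. Returns one topic distribution per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_words = len(vocab)
    word_id = {w: i for i, w in enumerate(vocab)}

    # Count tables maintained by the sampler.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed per-document topic distributions (each row sums to 1).
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def semester_of(year, month):
    """Assumed convention: S1 is January to June, S2 is July to December."""
    return f"{year}-S{1 if month <= 6 else 2}"


def topic_share_by_semester(theta, dates):
    """Average the document-topic distributions within each semester."""
    sums = {}
    counts = defaultdict(int)
    for dist, (year, month) in zip(theta, dates):
        key = semester_of(year, month)
        acc = sums.setdefault(key, [0.0] * len(dist))
        for k, p in enumerate(dist):
            acc[k] += p
        counts[key] += 1
    return {key: [s / counts[key] for s in acc] for key, acc in sums.items()}


# Invented toy data: (year, month, tokens). Not taken from the study.
tweets = [
    (2015, 2, ["election", "party", "vote", "election"]),
    (2015, 3, ["faith", "mosque", "faith", "sermon"]),
    (2015, 9, ["election", "vote", "party", "campaign"]),
    (2015, 11, ["campaign", "party", "election", "vote"]),
]
docs = [tokens for _, _, tokens in tweets]
dates = [(year, month) for year, month, _ in tweets]

theta = fit_lda(docs, n_topics=2)
shares = topic_share_by_semester(theta, dates)
for semester in sorted(shares):
    print(semester, [round(p, 3) for p in shares[semester]])
```

Averaging the per-document distributions gives one topic distribution per semester, so a semester in which one topic's share rises sharply stands out against otherwise similar semesters. Topic indices carry no inherent meaning; labels such as "politics" would have to be assigned by inspecting each topic's most probable words.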




DOI: 10.15408/inprime.v1i1.12726
