Web Traffic Anomaly Detection using Stacked Long Short-Term Memory

Fathu Rahman, Taufik Edy Sutanto, Nina Fitriyati

Abstract



One application of anomaly detection is identifying behavioral deviations in internet use. Such behavior can be observed in web traffic, the amount of data sent and received by visitors to a website. In this study, anomaly detection is carried out using a stacked Long Short-Term Memory (LSTM) network. First, the stacked LSTM is trained on the training data to build a forecasting model. The prediction errors on the test data are then used to detect anomalies. We optimize one hyperparameter, the sliding window size. A sliding window is a subsequence of the time series that serves as input to the prediction model. The case study uses the real Yahoo Webscope S5 web traffic dataset, which consists of 67 datasets, each with three features: timestamp, value, and anomaly label. The results show an average sensitivity of 0.834 and an average Area Under the ROC Curve (AUC) of 0.931. In addition, for some of the datasets, the choice of window size affects the sum of the sensitivity and AUC values. The stacked LSTM anomaly detection procedure is described in detail and can be applied to other similar problems.
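The pipeline summarized in the abstract (sliding windows, a forecast, and anomaly flags derived from the forecast error) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the stacked LSTM is replaced by a trivial stand-in forecaster (the window mean) so that the example runs with the Python standard library only, the threshold rule (mean plus three standard deviations of the absolute errors) is hypothetical, and all function names and the toy series are illustrative.

```python
from statistics import mean, stdev


def make_windows(series, window_size):
    """Split a series into (input window, next value) pairs."""
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start:start + window_size])
        targets.append(series[start + window_size])
    return inputs, targets


def window_mean_forecast(window):
    """Stand-in forecaster (NOT the paper's stacked LSTM): predict the window mean."""
    return mean(window)


def detect_anomalies(series, window_size, n_sigmas=3.0):
    """Flag positions whose absolute forecast error exceeds an assumed threshold."""
    inputs, targets = make_windows(series, window_size)
    errors = [abs(t - window_mean_forecast(w)) for w, t in zip(inputs, targets)]
    # Assumed rule: mean + n_sigmas * standard deviation of the errors.
    threshold = mean(errors) + n_sigmas * stdev(errors)
    # Error i belongs to the value at position i + window_size in the series.
    return [i + window_size for i, e in enumerate(errors) if e > threshold]


def sensitivity(true_labels, flagged_indices):
    """True positive rate: detected anomalies divided by all labelled anomalies."""
    flagged = set(flagged_indices)
    positives = [i for i, label in enumerate(true_labels) if label == 1]
    if not positives:
        raise ValueError("no labelled anomalies in true_labels")
    return sum(1 for i in positives if i in flagged) / len(positives)


# Toy series: a regular 10/11 pattern with one injected spike at position 25.
series = [10.0, 11.0] * 20
series[25] = 50.0
labels = [0] * len(series)
labels[25] = 1

flagged = detect_anomalies(series, window_size=4)
print("flagged positions:", flagged)
print("sensitivity:", sensitivity(labels, flagged))
```

In this sketch the window size controls how much history each forecast sees, which is the hyperparameter the study tunes. Swapping `window_mean_forecast` for a trained forecasting model leaves the windowing and error-thresholding steps unchanged.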

Keywords: time-series data; sliding window; web traffic; window size.

 







DOI: 10.15408/inprime.v3i2.21879
