Feature Selection Technique Impact for Internet Traffic Classification Using Naïve Bayesian

Authors

  • Tony Antonio, Information Technology Department, University of Ciputra, UC Town, Citraland, Surabaya, Indonesia
  • Adi Suryaputra Paramita, Information Technology Department, University of Ciputra, UC Town, Citraland, Surabaya, Indonesia

DOI:

https://doi.org/10.11113/jt.v72.4112

Keywords:

Feature, selection, classification, internet, traffic

Abstract

Feature selection plays an important role in internet traffic classification. Choosing the right features yields more accurate data and more accurate traffic classification, which in turn provides precise information for bandwidth optimization. A key consideration is therefore how to select the features that deliver better and more precise results in the classification process. This research compares feature selection algorithms on the premise that internet traffic sharing the same correlation can be assigned to the same class. An internet traffic dataset is collected, formatted, classified, and analyzed using a Naïve Bayesian classifier. Previously, Correlation Feature Selection (CFS) has been used to find the best feature subsets from existing data, but without identifying the discriminant and principal features of the dataset as a whole. We use Principal Component Analysis (PCA) to find discriminant and principal features for internet traffic classification. This paper also studies the process of fitting the features. The results show that internet traffic classification using Naïve Bayesian with CFS achieves more than 90% accuracy, whereas classification accuracy reaches 75% when PCA is used for feature selection.
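The abstract does not state which correlation measure, subset search strategy, or Naïve Bayes variant was used. As a minimal sketch under those gaps, the Python below scores candidate feature subsets with the usual CFS merit heuristic, k·mean(r_cf) / sqrt(k + k(k-1)·mean(r_ff)), where r_cf is feature-class correlation and r_ff is feature-feature correlation. It then trains a Gaussian Naïve Bayes classifier on the selected features. Several choices here are assumptions, not the authors' method: absolute Pearson correlation as the correlation measure (the original CFS formulation typically uses symmetrical uncertainty on discretized attributes), a greedy forward search, and the hypothetical flow features and values in the toy data.

```python
import math
from statistics import mean, pvariance


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as the CFS correlation measure (assumption)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return abs(sxy) / math.sqrt(sxx * syy)


def cfs_merit(columns, labels, subset):
    """CFS merit = k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff))."""
    k = len(subset)
    r_cf = mean(abs_corr(columns[f], labels) for f in subset)
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = mean(abs_corr(columns[a], columns[b]) for a, b in pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(columns, labels):
    """Greedy forward search: add the feature that maximizes merit; stop when merit no longer improves."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        merit, f = max((cfs_merit(columns, labels, selected + [f]), f) for f in remaining)
        if merit <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = merit
    return selected


def fit_gaussian_nb(rows, labels):
    """Per-class prior plus per-feature mean and variance (Gaussian Naive Bayes)."""
    model = {}
    for c in set(labels):
        cls_rows = [r for r, y in zip(rows, labels) if y == c]
        # Small constant keeps the variance strictly positive.
        stats = [(mean(col), pvariance(col) + 1e-9) for col in zip(*cls_rows)]
        model[c] = (len(cls_rows) / len(rows), stats)
    return model


def predict(model, row):
    """Return the class with the highest log posterior, treating features as independent."""
    def log_posterior(c):
        prior, stats = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            for x, (mu, var) in zip(row, stats)
        )
    return max(model, key=log_posterior)


# Hypothetical toy flows (not the paper's dataset):
# feature 0 = mean packet size in bytes, feature 1 = an irrelevant counter.
# Labels: 0 = interactive web flow, 1 = bulk transfer.
rows = [(100, 5), (120, 3), (110, 8), (130, 1),
        (1400, 4), (1350, 7), (1450, 2), (1380, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [list(col) for col in zip(*rows)]

selected = forward_select(columns, labels)


def project(row):
    """Keep only the features chosen by CFS."""
    return tuple(row[f] for f in selected)


model = fit_gaussian_nb([project(r) for r in rows], labels)
accuracy = sum(predict(model, project(r)) == y for r, y in zip(rows, labels)) / len(rows)
print(selected, accuracy)
```

On this toy data the irrelevant counter is rejected: adding it lowers the average feature-class correlation in the numerator, so the merit falls. The k(k-1)·mean(r_ff) term in the denominator similarly penalizes subsets whose features are redundant with one another, which is how CFS favors small, non-redundant subsets.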

Published

2015-01-01

How to Cite

Feature Selection Technique Impact for Internet Traffic Classification Using Naïve Bayesian. (2015). Jurnal Teknologi, 72(5). https://doi.org/10.11113/jt.v72.4112