AN OPTIMIZED SUPPORT VECTOR MACHINE WITH GENETIC ALGORITHM FOR IMBALANCED DATA CLASSIFICATION

DOI:

https://doi.org/10.11113/jurnalteknologi.v85.19695

Keywords:

Machine learning, data classification, sampling method, support vector machine, genetic algorithm

Abstract

In supervised machine learning, class imbalance occurs when the number of examples representing one class is much lower than that of the other classes. Because imbalanced data may yield suboptimal classification models, minority examples are frequently misclassified and the best performance is hard to achieve. This study proposes an improved support vector machine (SVM) method for imbalanced data, named SVM-GA, which optimizes the SVM algorithm with a Genetic Algorithm (GA) over a synthetic minority oversampling technique. In addition to identifying the best sampling method for the optimized SVM, the experimental results show that the proposed method improves by 97% compared to the baseline model and the selected optimized models. The proposed model outperformed the baseline model and other SVM-based models tuned with Grid search and Randomized search in most cases, especially on datasets with extremely rare cases.
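The abstract describes the pipeline only at a high level. The following is a minimal sketch of the general idea, not the authors' implementation: oversample the minority class with a SMOTE-style interpolation, then let a small genetic algorithm search for the SVM regularization parameter C. Everything specific here is an assumption made for illustration: the toy data, the linear SVM trained by Pegasos-style sub-gradient descent (standing in for whatever kernel SVM the study used), the choice of balanced accuracy as fitness, and all function names and GA settings.

```python
import math
import random

def smote_like(minority, n_new, k, rng):
    """SMOTE-style oversampling: interpolate between a minority point
    and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        t = rng.random()
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic

def train_linear_svm(X, y, C, rng, epochs=15):
    """Pegasos-style sub-gradient descent on the hinge loss (labels +1/-1).
    A simplified stand-in for a full SVM solver."""
    lam = 1.0 / (C * len(X))
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def balanced_accuracy(w, b, X, y):
    """Mean of the per-class recalls, a common score for imbalanced data."""
    hits, totals = {1: 0, -1: 0}, {1: 0, -1: 0}
    for xi, yi in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1
        totals[yi] += 1
        hits[yi] += pred == yi
    return 0.5 * (hits[1] / totals[1] + hits[-1] / totals[-1])

def ga_search(fitness, low, high, rng, pop_size=8, generations=6, mut_rate=0.3):
    """Evolve a single gene, log10(C), within [low, high]."""
    pop = [rng.uniform(low, high) for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = sorted(((fitness(g), g) for g in pop), reverse=True)
        if best is None or scored[0][0] > best[0]:
            best = scored[0]
        parents = [g for _, g in scored[: pop_size // 2]]  # truncation selection
        children = [parents[0]]                             # elitism
        while len(children) < pop_size:
            a, c = rng.sample(parents, 2)
            t = rng.random()
            child = t * a + (1 - t) * c                     # arithmetic crossover
            if rng.random() < mut_rate:
                child += rng.gauss(0, 0.5)                  # Gaussian mutation
            children.append(min(high, max(low, child)))
        pop = children
    return best[1], best[0]

rng = random.Random(42)
# Hypothetical toy data: 80 majority points near (0, 0), 12 minority near (2, 2).
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(80)]
minority = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(12)]
# Split before oversampling so synthetic points never reach the validation set.
train_maj, val_maj = majority[:60], majority[60:]
train_min, val_min = minority[:8], minority[8:]
synthetic = smote_like(train_min, len(train_maj) - len(train_min), k=3, rng=rng)
X_train = train_maj + train_min + synthetic
y_train = [-1] * len(train_maj) + [1] * (len(train_min) + len(synthetic))
X_val, y_val = val_maj + val_min, [-1] * len(val_maj) + [1] * len(val_min)

def fitness(log_c):
    # Fixed seed so the same gene always receives the same score.
    w, b = train_linear_svm(X_train, y_train, 10 ** log_c, random.Random(0))
    return balanced_accuracy(w, b, X_val, y_val)

best_log_c, best_score = ga_search(fitness, -2.0, 2.0, rng)
print(f"best C ~ {10 ** best_log_c:.3g}, balanced accuracy = {best_score:.3f}")
```

Searching over log10(C) rather than C lets the GA cover several orders of magnitude with the same mutation step. Grid search or randomized search, the comparison methods named in the abstract, would replace `ga_search` while keeping the same `fitness` function.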

Published

2023-06-25

Section

Science and Engineering

How to Cite

AN OPTIMIZED SUPPORT VECTOR MACHINE WITH GENETIC ALGORITHM FOR IMBALANCED DATA CLASSIFICATION. (2023). Jurnal Teknologi, 85(4), 67-74. https://doi.org/10.11113/jurnalteknologi.v85.19695