AN EMPIRICAL ANALYSIS OF FEATURE ENGINEERING TECHNIQUES TO REDUCE DIMENSIONALITY FOR NON-BINARY CLASSIFICATION PROBLEMS - A CASE STUDY WITH FOETAL HEALTH DATASET

Authors

  • Sandhya Soman, GITAM (Deemed-to-be) University, School of Science, Department of Computer Science, Bengaluru, India
  • Adeitia Kalyann Boniface, Indian Institute of Management, Vishakhapatnam, Andhra Pradesh, India
  • Agnes Lydia, AI/ML Associate Consultant, Sustainable Living Lab, Chennai, India

DOI:

https://doi.org/10.11113/aej.v15.22894

Keywords:

feature selection, feature subspace, dimensionality reduction, convergence time, feature reduction

Abstract

Now that AI and ML are deeply involved in day-to-day life, merely designing a machine learning model is insufficient. The complexity of training a model is vital in determining whether it will be deployed in real environments. ML engineers strive to make such deployment possible, because models that work well in academic research often fail to work well in production. The ML code is only a small segment of the ML infrastructure: while academia focuses on code and hyperparameters, industrial product teams focus on data. Data engineering and feature engineering (FE), often ignored during model creation and deployment, are two techniques that bridge this gap. To emphasize their importance, we consider a non-binary classification problem, the Foetal Health Classification problem. We apply different feature engineering techniques to reduce the number of significant features required for model training and determine the best-performing FE technique. From a set of 21 independent features, we lowered the feature count to nine while retaining accuracy comparable to training on the complete feature set. This paper showcases the performance of different prediction models on the dataset, selects the best prediction model, and applies feature engineering techniques for dimensionality reduction. With the threshold set at 0.025, we achieved 96% accuracy, 92.9% precision, 94.5% recall, a 93.7% F-score, and a dimensionality reduction of 29%. With a threshold of 0.013, we achieved 95.1% accuracy, 91.3% precision, 94.5% recall, a 92.8% F-score, and a dimensionality reduction of 57%. These figures indicate that equivalent results can be achieved with a subset of the feature set, which can in turn reduce model training and convergence time.
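
The dimensionality-reduction percentages quoted in the abstract can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes "dimensionality reduction" means the share of the 21 original features that were dropped. Under that assumption, the reported 57% corresponds to the nine retained features stated in the abstract. The 15-feature count for the 29% case is an inference from the arithmetic, not a figure the abstract states. The abstract does not define what quantity the thresholds 0.025 and 0.013 apply to, so the sketch does not model them.

```python
def reduction_percent(total: int, kept: int) -> float:
    """Percentage of the original features that were dropped (assumed definition)."""
    if total <= 0 or not 0 <= kept <= total:
        raise ValueError("need total > 0 and 0 <= kept <= total")
    return 100.0 * (total - kept) / total


def kept_counts_matching(total: int, reported_percent: int) -> list[int]:
    """All retained-feature counts whose reduction rounds to the reported percentage."""
    return [
        kept
        for kept in range(total + 1)
        if round(reduction_percent(total, kept)) == reported_percent
    ]


TOTAL_FEATURES = 21  # size of the full feature set, as stated in the abstract

# 57% reduction: the only consistent count is 9, matching the abstract.
print(kept_counts_matching(TOTAL_FEATURES, 57))  # [9]

# 29% reduction: the only consistent count is 15 (inferred, not stated in the abstract).
print(kept_counts_matching(TOTAL_FEATURES, 29))  # [15]
```

Searching over all possible counts, rather than hard-coding 9 and 15, shows that each rounded percentage is consistent with exactly one feature count for a 21-feature dataset.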

Published

2025-08-31

Section

Articles

How to Cite

AN EMPIRICAL ANALYSIS OF FEATURE ENGINEERING TECHNIQUES TO REDUCE DIMENSIONALITY FOR NON-BINARY CLASSIFICATION PROBLEMS - A CASE STUDY WITH FOETAL HEALTH DATASET. (2025). ASEAN Engineering Journal, 15(3), 17-24. https://doi.org/10.11113/aej.v15.22894