IMPROVING CLASSIFICATION ACCURACY FOR NON-COMMUNICABLE DISEASE PREDICTION MODEL BASED ON SUPPORT VECTOR MACHINE

Authors

  • Mohd. Khanapi Abd. Ghani, Biomedical Computing and Engineering Technologies (BIOCORE) Applied Research Group, Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
  • Daniel Hartono Sutanto, Biomedical Computing and Engineering Technologies (BIOCORE) Applied Research Group, Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia

DOI:

https://doi.org/10.11113/jt.v77.6487

Keywords:

Prediction, non-communicable disease, data mining, feature selection, classification, k-means, weight by SVM, support vector machine

Abstract

Over recent years, non-communicable diseases (NCDs) such as diabetes mellitus, cardiovascular disease, liver disease and cancer have been a leading cause of mortality worldwide. NCD prediction models suffer from problems such as redundant data, missing data, imbalanced datasets and irrelevant attributes. This paper proposes a novel NCD prediction model to improve accuracy. Our model comprises k-means as the clustering technique, Weight by SVM as the feature selection technique and Support Vector Machine (SVM) as the classifier. The results show that k-means + Weight by SVM + SVM improved classification accuracy on most of the NCD datasets (accuracy; AUC), namely the Pima Indian Dataset (99.52; 0.999), Breast Cancer Diagnosis Dataset (98.85; 1.000), Breast Cancer Biopsy Dataset (97.71; 0.998), Colon Cancer (99.41; 1.000), ECG (98.33; 1.000) and Liver Disorder (99.13; 0.998). The combination of k-means + Weight by SVM + SVM produced a significant difference in performance. In future work, we expect to achieve a better accuracy rate with another classifier such as a neural network.
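
The three stages named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation, and the abstract does not say how the stages are connected, so several things here are assumptions: k-means is assumed to act as a data-cleaning step that drops samples falling in a cluster dominated by the other class; "Weight by SVM" is assumed to rank features by the absolute coefficients of a linear SVM; the SVM is fitted by simple stochastic subgradient descent; and the data are synthetic (two informative features, two noise features), not any of the datasets listed above. All function names are hypothetical.

```python
import random

def make_data(n_per_class, seed):
    """Synthetic stand-in for a dataset: features 0-1 informative, 2-3 pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (-1, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(3 * label, 1), rng.gauss(3 * label, 1),
                      rng.gauss(0, 1), rng.gauss(0, 1)])
            y.append(label)
    return X, y

def kmeans_labels(X, k=2, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per sample."""
    centers = random.Random(seed).sample(X, k)
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
                  for x in X]
        for c in range(k):
            members = [x for x, l in zip(X, labels) if l == c]
            if members:  # leave the centre unchanged if its cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def filter_by_cluster(X, y, clusters):
    """ASSUMED role of k-means: keep samples whose label matches their cluster's majority."""
    keep = []
    for c in set(clusters):
        idx = [i for i, ci in enumerate(clusters) if ci == c]
        majority = 1 if sum(y[i] for i in idx) >= 0 else -1
        keep += [i for i in idx if y[i] == majority]
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]

def train_linear_svm(X, y, lam=1.0, eta=0.01, epochs=50, seed=0):
    """Linear soft-margin SVM via stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - eta * lam) for wj in w]  # L2 regularisation shrinkage
            if margin < 1:  # margin violated: step along the hinge subgradient
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def run_pipeline(seed=0, n_keep=2):
    X, y = make_data(100, seed)
    # Stage 1: cluster, then drop samples that disagree with their cluster.
    Xf, yf = filter_by_cluster(X, y, kmeans_labels(X))
    # Stage 2: rank features by |w_j| of a linear SVM and keep the top n_keep.
    w, _ = train_linear_svm(Xf, yf)
    selected = sorted(sorted(range(len(w)), key=lambda j: -abs(w[j]))[:n_keep])
    project = lambda rows: [[r[j] for j in selected] for r in rows]
    # Stage 3: train the final SVM on the reduced data, score on fresh samples.
    w2, b2 = train_linear_svm(project(Xf), yf)
    Xt, yt = make_data(100, seed + 1)
    acc = sum(predict(w2, b2, x) == t for x, t in zip(project(Xt), yt)) / len(yt)
    return len(Xf), selected, acc

kept, selected, acc = run_pipeline()
print(f"kept {kept} samples, selected features {selected}, test accuracy {acc:.3f}")
```

Because the synthetic classes are well separated, this toy run says nothing about the accuracies reported above; it only shows how a clustering step, an SVM-based feature weighting step and an SVM classifier might be chained.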

Published

2015-11-26

How to Cite

IMPROVING CLASSIFICATION ACCURACY FOR NON-COMMUNICABLE DISEASE PREDICTION MODEL BASED ON SUPPORT VECTOR MACHINE. (2015). Jurnal Teknologi, 77(18). https://doi.org/10.11113/jt.v77.6487