A COMPARATIVE STUDY ON GENE SELECTION METHODS FOR TISSUES CLASSIFICATION ON LARGE SCALE GENE EXPRESSION DATA

Authors

  • Farzana Kabir Ahmad, Computational Intelligence Research Cluster, School of Computing, College of Arts and Sciences, Universiti Utara Malaysia, 06010 UUM Sintok, Kedah, Malaysia

DOI:

https://doi.org/10.11113/jt.v78.8843

Keywords:

DNA microarray, Gene selection, Classification, Feature selection, Filter based gene selection method

Abstract

Deoxyribonucleic acid (DNA) microarray technology is a recent invention that provides vast opportunities to measure the expression of a large number of genes simultaneously. However, interpreting large-scale gene expression data remains challenging because of its inherent “high dimensional low sample size” nature. Microarray data typically involve thousands of genes, n, measured on a very small number of samples, p, which complicates the data analysis process. For this reason, feature selection methods, also known as gene selection methods, are needed to select the significant genes that offer the maximum discriminative power between cancerous and normal tissues. Feature selection methods fall into three basic groups: a) filter methods; b) wrapper methods; and c) embedded methods. Among these, filter gene selection methods provide an easy way to identify informative genes and can reduce large-scale microarray datasets. Although filter-based gene selection techniques are commonly used in analyzing microarray datasets, they have been tested separately in different studies. Therefore, this study investigates and compares the effectiveness of four popular filter gene selection methods, namely Signal-to-Noise Ratio (SNR), Fisher Criterion (FC), Information Gain (IG) and t-Test, in selecting informative genes that can distinguish cancerous from normal tissues. In this experiment, a common classifier, the Support Vector Machine (SVM), is trained on the selected genes. The gene selection methods are tested on three large-scale gene expression datasets: a breast cancer dataset, a colon dataset, and a lung dataset. The study found that IG and SNR are more suitable for use with SVM. Furthermore, SVM performance remained relatively unaffected unless a very small number of genes was selected.
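
To make the four filter scores named in the abstract concrete, the following is a minimal Python sketch of how each could be computed per gene for a two-class problem. It is not taken from the paper: the formulas are the commonly used textbook definitions (SNR as the mean difference over the sum of standard deviations, FC as the squared mean difference over the sum of variances, a Welch-style t statistic, and IG computed after a median split of each gene), and the function names `filter_scores`, `information_gain` and `top_k` are hypothetical. The study's exact definitions and discretization may differ.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of an array of non-negative integer labels."""
    p = np.bincount(labels) / len(labels)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def information_gain(x, y):
    """Information gain of one gene after a median split (an assumed discretization)."""
    split = x > np.median(x)
    gain = entropy(y)
    for side in (split, ~split):
        if side.any():
            # Subtract the weighted entropy of the labels on each side of the split.
            gain -= side.mean() * entropy(y[side])
    return gain


def filter_scores(X, y):
    """Score every gene (column of X) with SNR, Fisher Criterion and a t statistic.

    X: array of shape (samples, genes); y: integer labels, 1 = tumour, 0 = normal.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 1], X[y == 0]
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    sd_a, sd_b = a.std(axis=0, ddof=1), b.std(axis=0, ddof=1)
    eps = 1e-12  # avoids division by zero for constant genes
    diff = mu_a - mu_b
    return {
        "snr": diff / (sd_a + sd_b + eps),
        "fisher": diff ** 2 / (sd_a ** 2 + sd_b ** 2 + eps),
        "t": diff / np.sqrt(sd_a ** 2 / len(a) + sd_b ** 2 / len(b) + eps),
    }


def top_k(scores, k):
    """Indices of the k genes with the largest absolute score."""
    return np.argsort(-np.abs(scores))[:k]


if __name__ == "__main__":
    # Synthetic data: 20 samples, 50 genes, only gene 0 differs between classes.
    rng = np.random.default_rng(0)
    y = np.array([0] * 10 + [1] * 10)
    X = rng.normal(size=(20, 50))
    X[y == 1, 0] += 10.0

    scores = filter_scores(X, y)
    scores["ig"] = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    for name, s in scores.items():
        print(name, top_k(s, 5))
```

The columns returned by `top_k` could then be passed to any SVM implementation (for example scikit-learn's `sklearn.svm.SVC`) to reproduce the kind of comparison the abstract describes, varying the number of selected genes to see where classifier accuracy starts to degrade.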

Published

2016-05-30

How to Cite

A COMPARATIVE STUDY ON GENE SELECTION METHODS FOR TISSUES CLASSIFICATION ON LARGE SCALE GENE EXPRESSION DATA. (2016). Jurnal Teknologi (Sciences & Engineering), 78(5-10). https://doi.org/10.11113/jt.v78.8843