SPECTRAL CLUSTERING ON GENE EXPRESSION PROFILE TO IDENTIFY CANCER TYPES OR SUBTYPES

Authors

  • Ang Jun Chin, Department of Computer Science, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Johor, Malaysia
  • Andri Mirzal, Computer Science Department, College of Arts and Applied Sciences, Dhofar University, Salalah, Oman
  • Habibollah Haron, Department of Computer Science, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Johor, Malaysia

DOI:

https://doi.org/10.11113/jt.v76.4036

Keywords:

Cancer, Gaussian kernel, microarray gene expression, spectral clustering, tumor

Abstract

Gene expression profiling is widely used in disease discovery and analysis, especially in cancer research. Spectral clustering is robust to irrelevant features, which makes it well suited to gene expression analysis. However, in previous work, performance comparisons with other clustering methods were limited, and only a few microarray data sets were analyzed in each study. In this study, we demonstrate the use of spectral clustering to identify cancer types or subtypes from microarray gene expression profiles. Spectral clustering was applied to eleven microarray data sets, and its clustering performance was compared with results reported in the literature. Overall, spectral clustering slightly outperformed the corresponding results in the literature. It also gave more stable clustering performance, as indicated by a smaller standard deviation. Moreover, spectral clustering outperformed the corresponding methods in the literature on six of the eleven data sets. We conclude that spectral clustering is a promising method for identifying cancer types or subtypes from microarray gene expression data sets.
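For readers unfamiliar with the technique, the following is a minimal sketch of spectral clustering with a Gaussian kernel applied to a samples-by-genes matrix. It is not the authors' implementation: it follows one common normalized formulation (Gaussian affinity, symmetric normalization, top-k eigenvectors, k-means on the row-normalized embedding), and the function names, the kernel width `sigma`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def gaussian_affinity(X, sigma):
    """Pairwise Gaussian-kernel similarities between the rows (samples) of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-similarity
    return W


def kmeans(Y, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(X, k, sigma):
    """Cluster the rows of X into k groups via a normalized spectral embedding."""
    W = gaussian_affinity(X, sigma)
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    A_norm = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    _, eigvecs = np.linalg.eigh(A_norm)  # eigenvalues in ascending order
    U = eigvecs[:, -k:]  # eigenvectors of the k largest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)


# Synthetic stand-in for an expression matrix: 20 samples, 50 "genes",
# two groups with shifted means (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (10, 50)), rng.normal(4, 1, (10, 50))])
demo_labels = spectral_clustering(X_demo, k=2, sigma=10.0)
print(demo_labels)
```

In practice the kernel width strongly affects the result and would need to be tuned per data set; the value used here was chosen only to suit the synthetic example.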

Published

2015-08-27

Section

Science and Engineering

How to Cite

SPECTRAL CLUSTERING ON GENE EXPRESSION PROFILE TO IDENTIFY CANCER TYPES OR SUBTYPES. (2015). Jurnal Teknologi, 76(1). https://doi.org/10.11113/jt.v76.4036