SOME INTRIGUING HIGH-THROUGHPUT DNA SEQUENCE VARIANTS PREDICTION OVER PROTEIN FUNCTIONALITY

Authors

  • Atabak Kheirkhah, Advanced Informatics School (AIS), Universiti Teknologi Malaysia, 54100 UTM Kuala Lumpur, Malaysia
  • Salwani Mohd Daud, Advanced Informatics School (AIS), Universiti Teknologi Malaysia, 54100 UTM Kuala Lumpur, Malaysia
  • Noor Azurati Ahmad @ Salleh, Advanced Informatics School (AIS), Universiti Teknologi Malaysia, 54100 UTM Kuala Lumpur, Malaysia
  • Suriani Mohd Sam, Advanced Informatics School (AIS), Universiti Teknologi Malaysia, 54100 UTM Kuala Lumpur, Malaysia
  • Hafiza Abas, Advanced Informatics School (AIS), Universiti Teknologi Malaysia, 54100 UTM Kuala Lumpur, Malaysia
  • Sya Azmeela Shariff, Advanced Informatics School (AIS), Universiti Teknologi Malaysia, 54100 UTM Kuala Lumpur, Malaysia
  • Yusnaidi Md Yusof, Advanced Informatics School (AIS), Universiti Teknologi Malaysia, 54100 UTM Kuala Lumpur, Malaysia

DOI:

https://doi.org/10.11113/jt.v78.8967

Keywords:

DNA Sequence variants, protein interactions, protein functional integration

Abstract

This paper reviews computational methods and high-throughput automated tools for precisely predicting the functions of uncharacterized proteins from their DNA sequence information alone. It then proposes a hybrid of a weighted network and a genetic algorithm to improve prediction. The main advantage of the method is that protein function and DNA sequence predictions can be computed precisely by selecting the best-fitness parents in the genetic algorithm. With the completion of human genome sequencing, the number of proteins with known sequences has increased exponentially, while the pace of determining their biological attributes has been much slower. As a result, the gap between DNA sequence variants and their known functions has become increasingly large. Detection of sequences based on the Protein Data Bank has nevertheless become a benchmark for many researchers. As the amount of DNA sequence data continues to increase, this fundamental problem remains at the forefront of genome analysis. In developing these methods, the following matters often need to be considered: benchmark dataset construction, gene sequence prediction, the operating algorithm, anticipated accuracy, gene recommenders and functional integration. This review discusses each of them, with particular focus on operational algorithms and on how to increase the accuracy of DNA sequence variant prediction.
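The abstract does not give the details of the proposed hybrid, so the following is only a minimal sketch of the general idea, under stated assumptions: several interaction networks each give a score for a gene pair, a weight vector combines them, and a genetic algorithm keeps the best-fitness parents in each generation. The toy data, the 0.5 decision threshold, the crossover and mutation operators, and all names (`fitness`, `evolve`, `EVIDENCE`, `LABELS`) are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical toy data: each row holds the scores that three interaction
# networks assign to one gene pair; LABELS marks pairs known to share a function.
EVIDENCE = [
    (0.9, 0.2, 0.8), (0.8, 0.1, 0.7), (0.1, 0.9, 0.2),
    (0.2, 0.8, 0.1), (0.7, 0.3, 0.9), (0.3, 0.7, 0.2),
]
LABELS = [1, 1, 0, 0, 1, 0]


def fitness(weights, evidence, labels):
    """Fraction of gene pairs whose weighted score agrees with the label."""
    correct = 0
    for scores, label in zip(evidence, labels):
        combined = sum(w * s for w, s in zip(weights, scores))
        predicted = 1 if combined >= 0.5 else 0  # assumed threshold
        correct += int(predicted == label)
    return correct / len(labels)


def normalise(weights):
    """Rescale positive weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def evolve(evidence, labels, n_networks, pop_size=20, generations=30, seed=0):
    """Evolve network weights, keeping the fittest half as parents each generation."""
    rng = random.Random(seed)
    population = [
        normalise([rng.random() + 1e-9 for _ in range(n_networks)])
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        ranked = sorted(
            population, key=lambda w: fitness(w, evidence, labels), reverse=True
        )
        parents = ranked[: pop_size // 2]  # best-fitness parents survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # arithmetic crossover
            i = rng.randrange(n_networks)
            child[i] = max(1e-9, child[i] + rng.gauss(0, 0.1))  # small mutation
            children.append(normalise(child))
        population = parents + children
    return max(population, key=lambda w: fitness(w, evidence, labels))


best = evolve(EVIDENCE, LABELS, n_networks=3)
best_fitness = fitness(best, EVIDENCE, LABELS)
```

Because the parents are carried over unchanged, the best fitness in the population cannot decrease from one generation to the next. A real study would replace the toy data with a benchmark dataset and evaluate on held-out pairs.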

Published

2016-06-12

How to Cite

SOME INTRIGUING HIGH-THROUGHPUT DNA SEQUENCE VARIANTS PREDICTION OVER PROTEIN FUNCTIONALITY. (2016). Jurnal Teknologi, 78(6-4). https://doi.org/10.11113/jt.v78.8967