REDUCING THE SEARCH SPACE AND TIME COMPLEXITY OF NEEDLEMAN-WUNSCH ALGORITHM (GLOBAL ALIGNMENT) AND SMITH-WATERMAN ALGORITHM (LOCAL ALIGNMENT) FOR DNA SEQUENCE ALIGNMENT
DOI:
https://doi.org/10.11113/jt.v77.6564Keywords:
Sequence comparison, sequence alignment, global alignment, local alignment, similarity, optimal alignment, scoring, traceback, match, mismatch, gapsAbstract
The fundamental procedure of analyzing sequence content is sequence comparison. Sequence comparison can be defined as the problem of finding which parts of the sequences are similar and which parts are different, namely comparing two sequences to identify similarities and differences between them. A typical approach to solve this problem is to find a good and reasonable alignment between the two sequences. The main research in this project is to align the DNA sequences by using the Needleman-Wunsch algorithm for global alignment and Smith-Waterman algorithm for local alignment based on the Dynamic Programming algorithm. The Dynamic Programming Algorithm is guaranteed to find optimal alignment by exploring all possible alignments and choosing the best through the scoring and traceback techniques. The algorithms proposed and evaluated are to reduce the gaps in aligning sequences as well as the length of the sequences aligned without compromising the quality or correctness of results. In order to verify the accuracy and consistency of measurements obtained in Needleman-Wunsch and Smith-Waterman algorithms the data is compared with Emboss (global) and Emboss (local) with 600 strands test data.References
S. K. Moore. 2000. Understanding the Human Genome. IEEE Press. 11: 34-35.
Todd, J. Vision and Aoife McLysaght. 2003. Computational Tools and Resources in Plant Genome Informatics. In P. Christou & H. Klee (Ed.). Handbook of Plant Biotechnology John Wiley & Sons. 1552.
K. Charter, J. Schaeffer, and D. Szafron. 2000. Sequence Alignment using FastLSA. International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences. 239-245.
S. F. Altschul, T. L. Madden, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs. Nucleic Acids Research. 25: 3389-3402.
D.J. Lipman. and W. R. Pearson. 1985. Rapid and Sensitive Protein Similarity Searches. American Association for the Advancement of Science. 227(4693): 1435-1441.
Z. Michalewicz, D. B. Fogel. 2004. How to Solve It: Modern Heuristics. New York: Springer.
S. B. Needleman and Christian, D. Wunsch. 1970. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Sequences. Journal of Molecular Biology. 48: 443-453.
Temple, F. Smith and Michael, S. Waterman. 1981. Identification of Common Molecular Subsequences. Journal of Molecular Biology. 147: 195-197.
A. J. Ropelewski, H. B. Nicholas and D. W. Jr. Deerfield. 2002. Strategies for Searching Sequence Databases. Journal of Biotechniques. 28: 1174-1178.
W. R. Taylor, T. P. Flores and C. A. Orengo. 1994. Multiple Protein Structure Alignment. 10.
A. Davidson. 2004. A Fast Pruning Algorithm for Optimal Sequence Alignment. Proc. 2nd IEEE International Symp. Bioinformatics and Bioengineering. 49-56.
B. Gärtner and J. Matoušek. 2006. Understanding and Using Linear Programming: Elementary Introduction for Mathematicians and Computer Scientists, Berlin. Springer. 1858-1870. Information on http://www.proteinscience.org.
NCBI. Genomic Biology. 2015. Information on http://www.ncbi.nlm.nih.gov/GenBank.
Downloads
Published
Issue
Section
License
Copyright of articles that appear in Jurnal Teknologi belongs exclusively to Penerbit Universiti Teknologi Malaysia (Penerbit UTM Press). This copyright covers the rights to reproduce the article, including reprints, electronic reproductions, or any other reproductions of similar nature.