Optimal Pair DNA Sequence Alignment based on Matching Regions and Multi-Zone Genetic Algorithm

Sara Q. Abedulridha; Eman S. Al-Shamery

doi:10.14419/ijet.v7i4.19.27993

Authors

Sara Q. Abedulridha
Eman S. Al-Shamery

Received date: February 26, 2019

Accepted date: February 26, 2019

Published date: November 27, 2018

DOI:

https://doi.org/10.14419/ijet.v7i4.19.27993

Keywords:

Bioinformatics, DNA Sequence Alignment, Matching regions, Multi-Zone Genetic Algorithm, Parallel processing.

Abstract

DNA sequence alignment is an important and challenging task in Bioinformatics, which is used for finding the optimal arrangement between two sequences. In this paper, two methods are proposed in two stages to solve the pairwise sequence alignment problem. The first method is Matching Regions(MR) concerns on splitting the DNA into regions with adaptive interleaving windows to isolate the DNA tape into matched and non-matched regions. Additionally, a Multi-Zone Genetic Algorithm (MZGA) is proposed as an improved method in the second stage. It consists of segmenting a large non-matched region into smaller search space. Then, the MZGA is implemented in parallel to save time. Genetic Algorithm can be applied as an optimization toolto produce multiple solutions. Furthermore, the improvement focuses on the enhancement of Simple GA operators. In the selection, the population is divided into three Zones according to the fitness score. A new crossover approach is proposed depending on cut-points and location of gaps. The proposed method guarantees that the value of fitness tends to improvement or convergence in each successive generation. Thus, the offspring of populations will have better fitness value. The system has been applied to the real-world dataset of DNA with variable lengths which are ranged from 66 bases up to 26037 bases. As a result, the proposed technique satisfied the best alignment score of the DNA sequences. Finally, it is worth mentioning that the proposed system proved to be generalizable.
Â
Â

References

[1] J. Xiong, Essential bioinformatics. Cambridge University Press, 2006.
[2] M. Gollery, â€œBioinformatics: Sequence and Genome Analysis, David W. Mount. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, 2004, 692 pp., $75.00, paperback. ISBN 0-87969-712-1.,â€ Clin. Chem., vol. 51, no. 11, p. 2219, 2005.
[3] M. L. Metzker, â€œSequencing technologies - the next generation.,â€ Nat. Rev. Genet., vol. 11, no. 1, pp. 31â€“46, Jan. 2010.
[4] T. P. Quinn, I. Erb, M. F. Richardson, and T. M. Crowley, â€œUnderstanding sequencing data as compositions: an outlook and review,â€ Bioinformatics, no. April, 2018.
[5] J. Sun, K. Chen, and Z. Hao, â€œPairwise alignment for very long nucleic acid sequences,â€ Biochem. Biophys. Res. Commun., vol. 502, no. 3, pp. 313â€“317, 2018.
[6] N. H. Kaghed, E. S. Al, F. Emad, and K. Al-Khuzaie, â€œComparative study of Genetic Algorithm and Dynamic Programming of DNA Multiple Sequence Alignment,â€ J. Babylon Univ. Appl. Sci. No, vol. 25, no. 2, 2017.
[7] P. Bonizzoni and G. Della Vedova, â€œThe complexity of multiple sequence alignment with SP-score that is a metric,â€ Theor. Comput. Sci., vol. 259, no. 1â€“2, pp. 63â€“79, 2001.
[8] W. Just, â€œComputational Complexity of Multiple Sequence Alignment with SP-Score,â€ J. Comput. Biol., vol. 8, no. 6, pp. 615â€“623, Nov. 2001.
[9] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, â€œBasic local alignment search tool,â€ J. Mol. Biol., vol. 215, no. 3, pp. 403â€“410, 1990.
[10] W. R. Pearson, â€œRapid and sensitive sequence comparison with FASTP and FASTA,â€ Methods Enzymol., vol. 183, pp. 63â€“98, Jan. 1990.
[11] D. W. Mount, â€œUsing the Basic Local Alignment Search Tool (BLAST).,â€ CSH Protoc., vol. 2007, p. pdb.top17, Jul. 2007.
[12] C. Kemena and C. Notredame, â€œUpcoming challenges for multiple sequence alignment methods in the high-throughput era,â€ Bioinformatics, vol. 25, no. 19, pp. 2455â€“2465, Oct. 2009.
[13] B. Chowdhury and G. Garai, â€œA review on multiple sequence alignment from the perspective of genetic algorithm,â€ Genomics, vol. 109, no. 5â€“6, pp. 419â€“431, Oct. 2017.
[14] M. A. Larkin et al., â€œClustal W and Clustal X version 2.0,â€ Bioinformatics, vol. 23, no. 21, pp. 2947â€“2948, Nov. 2007.
[15] T. L. Bailey and C. Elkan, â€œFitting a Mixture Model by Expectation Maximization to Discover Motifs in Bipolymers,â€ Proc. Second Int. Conf. Intell. Syst. Mol. Biol., pp. 28â€“36, 1994.
[16] P. Di Tommaso et al., â€œT-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension,â€ Nucleic Acids Res., vol. 39, no. suppl_2, pp. W13â€“W17, Jul. 2011.
[17] I.-G. Mircea, I. Bocicor, and G. Czibula, â€œA Reinforcement Learning Based Approach to Multiple Sequence Alignment,â€ in Soft Computing Applications, 2018, pp. 54â€“70.
[18] H. Nabeel Kaghed, S. Eman Alâ€“Shamery, and F. E. K. Al-Khuzaie, â€œMultiple Sequence Alignment based on Developed Genetic Algorithm,â€ Indian J. Sci. Technol., vol. 9, no. 2, 2016.
[19] R. R. Rani and D. Ramyachitra, â€œMultiple sequence alignment using multi-objective based bacterial foraging optimization algorithm,â€ BioSystems, vol. 150, pp. 177â€“189, 2016.
[20] J. Lee, Y. Yeu, H. Roh, Y. Yoon, and S. Park, â€œBulkAligner: A novel sequence alignment algorithm based on graph theory and Trinity,â€ Inf. Sci. (Ny)., vol. 303, pp. 120â€“133, 2015.
[21] J. Li, S. Ranka, and S. Sahni, â€œPairwise sequence alignment for very long sequences on GPUs,â€ 2012 IEEE 2nd Int. Conf. Comput. Adv. Bio Med. Sci. ICCABS 2012, 2012.
[22] S. K. Pal and P. P. Wang, Genetic algorithms for pattern recognition. CRC press, 2017.

Optimal Pair DNA Sequence Alignment based on Matching Regions and Multi-Zone Genetic Algorithm

Authors

Sara Q. Abedulridha

Eman S. Al-Shamery

How to Cite

DOI:

Keywords:

Abstract

References

Downloads

How to Cite