Spectral domain characterization of genome sequences

  • Authors

    • P. Venkateswarlu
    • E. G. Rajan
    2018-04-03
    https://doi.org/10.14419/ijet.v7i2.12.11277
  • Genome Sequence, Spectral Characterization, Big Data, MapReduce
  • Genome sequencing has become an important research area for understanding the order of bases in DNA and discovering the genetic secrets of humans. Fortunately, voluminous data is available in this area for the study of genome sequences. Characterizing genome sequences is a non-trivial and tedious task. Nevertheless, algorithms for studying them exist in the literature. Because genome sequence data has the characteristics of big data, we propose a technique based on the MapReduce programming paradigm for spectral characterization of genome sequences. A machine learning approach is used to discover trends in the genome sequences. The rationale for using MapReduce, a distributed programming framework, is its support for parallel processing and the use of more powerful Graphics Processing Units (GPUs). Moreover, the datasets can be maintained in the cloud so that they can be handled with ease. We built a prototype application as a proof of concept. Our empirical results reveal encouraging observations for genomic study.
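
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a MapReduce-style spectral characterization could look like, not the authors' implementation. It assumes a binary indicator encoding (one 0/1 signal per base), a naive discrete Fourier transform per fixed-length window as the map step, and an element-wise sum of power spectra as the reduce step. All function names (`power_spectrum`, `map_window`, `reduce_spectra`, `spectral_profile`) are hypothetical, and the code runs in a single Python process rather than on Hadoop or GPUs.

```python
import cmath
from collections import defaultdict

BASES = "ACGT"


def power_spectrum(signal):
    """Naive O(n^2) DFT power spectrum |X[k]|^2 of a real-valued signal."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))) ** 2
        for k in range(n)
    ]


def map_window(window):
    """Map step (assumed): emit one (base, power spectrum) pair per base."""
    for base in BASES:
        indicator = [1.0 if ch == base else 0.0 for ch in window]
        yield base, power_spectrum(indicator)


def reduce_spectra(spectra):
    """Reduce step (assumed): element-wise sum of equal-length spectra."""
    return [sum(values) for values in zip(*spectra)]


def spectral_profile(sequence, window_size):
    """Total power spectrum of a sequence, summed over windows and bases."""
    # Split into non-overlapping windows; a trailing partial window is dropped.
    windows = [
        sequence[i:i + window_size]
        for i in range(0, len(sequence) - window_size + 1, window_size)
    ]
    # "Shuffle" phase: group the mapped spectra by their key (the base).
    grouped = defaultdict(list)
    for window in windows:
        for base, spectrum in map_window(window):
            grouped[base].append(spectrum)
    # Reduce per base, then combine the four per-base spectra.
    per_base = {base: reduce_spectra(specs) for base, specs in grouped.items()}
    return reduce_spectra(per_base.values())


if __name__ == "__main__":
    profile = spectral_profile("ATG" * 8, window_size=12)
    # Ignore the DC term (k = 0) and look at the first half of the spectrum.
    peak = max(range(1, 7), key=lambda k: profile[k])
    print("dominant frequency index:", peak)
```

In this toy input the sequence repeats every three bases, so the dominant non-DC component falls at index `window_size / 3`. Because each window is processed independently in the map step, the same structure could in principle be distributed across workers, which is the property the abstract attributes to MapReduce.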

     

     

  • How to Cite

    Venkateswarlu, P., & Rajan, E. G. (2018). Spectral domain characterization of genome sequences. International Journal of Engineering & Technology, 7(2.12), 189-192. https://doi.org/10.14419/ijet.v7i2.12.11277