Compression of text files using genomic code compression algorithm

  • Authors

    • G Murugesan
    • Rosario Gilmary
    2018-05-29
    https://doi.org/10.14419/ijet.v7i2.31.13399
  • Data compression, text compression, lossy and lossless compression, DNA, bases, bit reduction, hexadecimal format, variable-length code, Huffman codes.
  • Text files occupy a substantial amount of memory or disk space, and transmitting them across a network requires considerable bandwidth. Compression is particularly advantageous in telecommunications and information technology because it enables devices to transmit or store the same amount of data in fewer bits. Text compression techniques segment an English passage by observing its patterns and substitute alternative symbols for larger patterns of text. Compression algorithms are used to reduce the volume of stored information and the cost of data storage. Compressing large collections of information can also improve retrieval time. Novel lossless compression algorithms have been introduced to achieve better compression ratios. In this work, existing compression mechanisms designed for text files and deoxyribonucleic acid (DNA) sequence files are analyzed. Their performance is compared in terms of compression ratio, time taken to compress and decompress the sequence, and file size. In the proposed work, the input file is converted to DNA format and a DNA compression procedure is then applied.
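
The first stage named in the abstract, converting an input text file to DNA format, can be illustrated with a minimal sketch. The abstract does not give the authors' mapping table, so the 2-bit assignment below (A=00, C=01, G=10, T=11) and the function names `text_to_dna` and `dna_to_text` are assumptions made for illustration only. The DNA compression stage itself is not shown.

```python
# Hypothetical 2-bit mapping; the paper's actual table is not stated in the abstract.
BASES = "ACGT"  # index 0..3 corresponds to bit pairs 00, 01, 10, 11


def text_to_dna(text: str) -> str:
    """Encode text as a string of bases, four bases per UTF-8 byte."""
    out = []
    for byte in text.encode("utf-8"):
        # Walk the byte from its most significant bit pair to its least.
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_text(dna: str) -> str:
    """Invert text_to_dna: pack every four bases back into one byte."""
    if len(dna) % 4 != 0:
        raise ValueError("DNA string length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            # str.index raises ValueError for any symbol outside A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return out.decode("utf-8")


if __name__ == "__main__":
    encoded = text_to_dna("A")
    print(encoded)               # 0x41 = 01 00 00 01 -> CAAC
    print(dna_to_text(encoded))  # A
```

Stored as one character per base, the intermediate string is four times longer than the input. It serves only as input to a DNA-specific compressor, and any reduction in size would come from that second stage.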

  • How to Cite

    Murugesan, G., & Gilmary, R. (2018). Compression of text files using genomic code compression algorithm. International Journal of Engineering & Technology, 7(2.31), 69-73. https://doi.org/10.14419/ijet.v7i2.31.13399