Protein sequence comparison under a new complex representation of amino acids based on their physio-chemical properties


  • Jayanta Pal Narula Institute of Technology
  • Soumen Ghosh Narula Institute of Technology
  • Bansibadan Maji National Institute of Technology, Durgapur
  • Dilip Kumar Bhattacharya University of Calcutta





Complex Representation, DFT, Hydrophobicity Properties, Hydrophilicity (Polarity) Property, ICD, Phylogenetic Tree, Voss Representation.


The paper first introduces a new complex representation of amino acids in which the real and imaginary parts are taken from the hydrophilicity and the residue volume of each amino acid, respectively. It then applies the complex discrete Fourier transform (DFT) to the resulting sequence of complex numbers to obtain its spectrum in the frequency domain. Applying the method of inter-coefficient distances (ICD) to this spectrum, it constructs phylogenetic trees for different protein sequences, and the sequences are then compared pairwise on the basis of these trees. The paper also carries out the same pairwise comparison of the same protein sequences using the same method with a known complex representation of amino acids, in which the real and imaginary parts are the hydrophobicity and the residue volume, respectively. The results of the two methods are compared with results obtained earlier for the same sequences by other methods. Both methods are found to be workable, and the new complex representation performs better than the earlier one. This suggests that hydrophilicity (polarity) is a better choice than hydrophobicity for representing amino acids, especially in protein sequence comparison.
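The pipeline described above (complex encoding, DFT, inter-coefficient distances, pairwise comparison) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions: the property tables `TOY_POLARITY` and `TOY_VOLUME` hold invented placeholder numbers for a four-letter toy alphabet rather than real hydrophilicity or volume scales; shorter sequences are zero-padded to a common length; and the inter-coefficient step is read as differences between adjacent power-spectrum coefficients, compared by Euclidean distance. All function names are hypothetical.

```python
import cmath
import math

# Placeholder property values for a toy four-letter alphabet.
# These numbers are invented for illustration and are NOT measured data.
TOY_POLARITY = {"A": 0.1, "G": 0.2, "K": 0.9, "L": 0.0}
TOY_VOLUME = {"A": 0.3, "G": 0.1, "K": 0.7, "L": 0.8}


def encode(seq, real_scale, imag_scale):
    """Map each residue to real_scale[aa] + i * imag_scale[aa]."""
    return [complex(real_scale[aa], imag_scale[aa]) for aa in seq]


def dft(signal, n):
    """Naive O(n^2) DFT of the signal, zero-padded to length n."""
    padded = list(signal) + [0j] * (n - len(signal))
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(padded))
        for k in range(n)
    ]


def icd_vector(signal, n):
    """Differences between adjacent power-spectrum coefficients (assumed form)."""
    power = [abs(c) ** 2 for c in dft(signal, n)]
    return [power[k + 1] - power[k] for k in range(n - 1)]


def icd_distance(seq_a, seq_b, real_scale, imag_scale):
    """Euclidean distance between the two sequences' difference vectors."""
    n = max(len(seq_a), len(seq_b))  # common length via zero-padding
    vec_a = icd_vector(encode(seq_a, real_scale, imag_scale), n)
    vec_b = icd_vector(encode(seq_b, real_scale, imag_scale), n)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))


def distance_matrix(seqs, real_scale, imag_scale):
    """Pairwise distances; a tree-building method would take this as input."""
    return [[icd_distance(a, b, real_scale, imag_scale) for b in seqs]
            for a in seqs]


toy_sequences = ["AGKLA", "AGKLL", "KKGA"]
matrix = distance_matrix(toy_sequences, TOY_POLARITY, TOY_VOLUME)
```

Swapping the real-part table for a hydrophobicity scale would give the second representation discussed above without changing any other step. The resulting matrix could then be passed to a standard distance-based tree-building routine; which routine the paper uses is not stated here.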


