Genomics big data hybrid depositories architecture to unlock precision medicine: a conceptual framework







Architecture Design of Hybrid Depositories, Data Driven Genomics, Personalized Medicine Framework.


As genome sequencing becomes more affordable, genomics studies are being carried out extensively in pursuit of the ultimate healthcare goal, precision medicine. By tailoring medical treatment to each individual, precision medicine could potentially reduce drug side effects and treatment complications to nearly zero. Unfortunately, the complexity of genomics data has been one of the bottlenecks hindering the advance of healthcare practice towards precision medicine. Therefore, based on an extensive literature review of the data-driven genomics challenges facing precision medicine, this paper proposes two new contributions to the field: a conceptual framework for genomics-based precision medicine, and an architectural design for the development of hybrid depositories as an initial step to bridge the gap towards precision medicine. The genomics big data hybrid depositories architecture is composed of a few components: a storage layer and a service layer of interconnected systems, such as visualization, data protection modeling, an event processing engine and decision support, which together serve the purpose of merging genomics data with healthcare data.
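As a rough illustration of the layering described above, the following minimal Python sketch shows one way a storage layer might merge genomics and healthcare records, with a service layer dispatching to interconnected services. It is not the paper's implementation: the class names, the `patient_id` join key, the placeholder variant label `variant_A`, and the two toy services are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StorageLayer:
    """Hypothetical storage layer holding genomic and clinical data per patient."""
    genomic: Dict[str, List[str]] = field(default_factory=dict)        # patient_id -> variant labels
    clinical: Dict[str, Dict[str, str]] = field(default_factory=dict)  # patient_id -> clinical fields

    def merged_record(self, patient_id: str) -> Dict[str, object]:
        # Assumption: both sources share a patient identifier to join on.
        return {
            "patient_id": patient_id,
            "variants": list(self.genomic.get(patient_id, [])),
            "clinical": dict(self.clinical.get(patient_id, {})),
        }


class ServiceLayer:
    """Hypothetical service layer: named services that consume merged records."""

    def __init__(self, storage: StorageLayer) -> None:
        self.storage = storage
        self.services: Dict[str, Callable[[Dict[str, object]], object]] = {}

    def register(self, name: str, handler: Callable[[Dict[str, object]], object]) -> None:
        self.services[name] = handler

    def run(self, name: str, patient_id: str) -> object:
        # Every service reads the same merged view from the storage layer.
        return self.services[name](self.storage.merged_record(patient_id))


def mask_identifier(record: Dict[str, object]) -> Dict[str, object]:
    # Toy stand-in for data protection: drop the direct identifier.
    return {key: value for key, value in record.items() if key != "patient_id"}


def has_variant_of_interest(record: Dict[str, object]) -> bool:
    # Toy stand-in for decision support: flag a placeholder variant label.
    return "variant_A" in record["variants"]


storage = StorageLayer()
storage.genomic["p001"] = ["variant_A"]
storage.clinical["p001"] = {"diagnosis": "example"}

services = ServiceLayer(storage)
services.register("data_protection", mask_identifier)
services.register("decision_support", has_variant_of_interest)

protected = services.run("data_protection", "p001")
flagged = services.run("decision_support", "p001")
```

In this sketch the services never touch the raw sources directly; they only see the merged record, which is one possible reading of how a service layer could sit on top of hybrid depositories.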


