A Review: Replication Strategies for Big Data in Cloud Environment

  • Authors

    • M. A. Fazlina
    • Rohaya Latip
    • Hamidah Ibrahim
    • Azizol Abdullah
    2018-12-09
    https://doi.org/10.14419/ijet.v7i4.31.23409
  • Big Data, Cloud Computing, Performance Metrics, Replication Strategies, Algorithms.
  • Big Data technology is emerging around the globe to provide better insight and decision making for organizations. Because Big Data involves a wide variety and huge volume of data with complex computation, a cloud environment is the best choice for resolving storage issues. However, data availability remains a challenge because of the heterogeneity of Big Data systems. Data must always be accessible and available to users, regardless of time. The most important way to meet this requirement is to provide replication strategies that can sustain business continuity without interruption. Hence, this paper offers a clearer view of data replication strategies for Big Data systems in cloud environments. It presents a critical review of replication strategies, with key details drawn from numerous researchers, and discusses the advantages and gaps of each study. It also explores the algorithms and performance metrics that researchers have improved. The paper was conducted using a qualitative research approach. Ultimately, it should help future researchers understand and select the strategy that best fits their research scope and goals.

     

     

  • How to Cite

    Fazlina, M. A., Latip, R., Ibrahim, H., & Abdullah, A. (2018). A Review: Replication Strategies for Big Data in Cloud Environment. International Journal of Engineering & Technology, 7(4.31), 357-362. https://doi.org/10.14419/ijet.v7i4.31.23409