A Big Data Solution to Detect Conditional Functional Dependency Violations

  • Abstract

    Detecting violations of conditional functional dependencies (CFDs) in distributed environments has attracted considerable research attention in recent years. Only a few solutions for handling conditional functional dependencies have been proposed, and these are unsuitable for real-time big data applications. This article presents a big data solution to this class of problems. The proposed IMRCFDHBD algorithm reduces elapsed time and scales with minimal data shipment. The results show that the algorithm outperforms state-of-the-art techniques in big data scenarios.
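
    To make the problem concrete, the following is a minimal sketch of CFD violation detection written in a map/reduce style in plain Python. It is not the IMRCFDHBD algorithm, whose details are not given in this abstract; it only illustrates what a violation is. The function names, the `_` wildcard convention, and the sample attributes (`cc`, `zip`, `street`) are assumptions made for illustration.

```python
from collections import defaultdict

WILDCARD = "_"  # assumed convention: "_" in a pattern matches any value


def matches(values, pattern):
    """True if every value equals its pattern constant or the pattern is a wildcard."""
    return all(p == WILDCARD or v == p for v, p in zip(values, pattern))


def map_phase(rows, lhs, rhs, lhs_pattern):
    """Emit (LHS values, (row id, RHS values)) for rows matching the LHS pattern."""
    for row_id, row in enumerate(rows):
        x = tuple(row[a] for a in lhs)
        if matches(x, lhs_pattern):
            yield x, (row_id, tuple(row[a] for a in rhs))


def reduce_phase(values, rhs_pattern):
    """Return the ids of violating rows within one group that shares its LHS values."""
    violating = set()
    # Single-tuple violation: the RHS disagrees with a constant in the pattern.
    for row_id, y in values:
        if not matches(y, rhs_pattern):
            violating.add(row_id)
    # Multi-tuple violation: same LHS values but differing RHS values.
    if len({y for _, y in values}) > 1:
        violating.update(row_id for row_id, _ in values)
    return violating


def detect_violations(rows, lhs, rhs, lhs_pattern, rhs_pattern):
    """Group the map output by key (the shuffle step), then reduce each group."""
    groups = defaultdict(list)
    for key, value in map_phase(rows, lhs, rhs, lhs_pattern):
        groups[key].append(value)
    violations = set()
    for values in groups.values():
        violations |= reduce_phase(values, rhs_pattern)
    return sorted(violations)


# Hypothetical data: within country code 44, a zip code should determine the street.
rows = [
    {"cc": "44", "zip": "EH4 1DT", "street": "Mayfield"},
    {"cc": "44", "zip": "EH4 1DT", "street": "Crichton"},
    {"cc": "01", "zip": "07974", "street": "Mtn Ave"},
    {"cc": "01", "zip": "07974", "street": "Main St"},
]
print(detect_violations(rows, ["cc", "zip"], ["street"], ("44", "_"), ("_",)))
# prints [0, 1]
```

    Rows 2 and 3 also disagree on the street, but they are not reported because the pattern restricts the dependency to rows with `cc` equal to 44. That restriction is what makes the dependency conditional. In a real Hadoop job the grouping step would be performed by the framework's shuffle rather than by an in-memory dictionary.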



  • Keywords

    Big data; Conditional functional dependencies; Hadoop; MapReduce; Violation detection.




Article ID: 26641
DOI: 10.14419/ijet.v7i4.10.26641

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.