Scalable Execution of KNN Queries using Data Parallelism Approach

 
 
 
  • Abstract
  • Keywords
  • References
  • PDF
  • Abstract


    In recent years, real-time data-oriented applications such as sensor networks, telecommunications data management, network monitoring are required to process various continuous queries on unbounded data streams. A lot of work has been done to deal with the computational complications in constant processing of continuous queries on unbounded, continuous data stream. The K-nearest neighbor algorithm (KNN) is a well-known learning method used in a wide range of problem-solving domains e.g., network monitoring, data mining, and image processing etc. The efficient and scalable processing of multiple continuous queries on dynamic data items requires query indexing and data indexing. Query processing algorithms used on static databases are not well suited to handle dynamic continuous queries over high dimensional data sets.  It is better to build the index for queries which is finite rather than to build the index for data which is infinite. A divide-and-conquer approach is used for indexing and searching for K-nearest neighbors. The approach significantly will reduce the space complexity and will scale well with the increasing data size. The hybrid indexing approach using grid and a K-dimensional tree will reduce the space cost as well searching cost. The data parallelism will provide scalability of continuous queries over high-volume streams.

     

     



  • Keywords


    Data stream; Grid indexing; KNN; R-tree; Scalability

  • References


      [1] J. Park, B. Hong, and C. Ban, “A continuous query index for processing queries on RFID data stream,” Proceeding. of 13th IEEE International Conf erence on Embedded Real-Time Comput. Syst. Appl., (2007), pp. 138–145.

      [2] S. Prabhakar, Y. Xia, D. Kalashnikov, W. G. Aref, and S. Hambrusch, “Query indexing and velocity constrained indexing: Scalable techniques for continuous queries on moving objects,” IEEE Transactions on Comuters, vol. 51, no. 10, (Oct. 2002), pp. 1124–1140.

      [3] C. Bohm, B. C. Ooi, C. Plant, and Y. Yan, “Efficiently processing continuous k-NN queries on data streams,” in Proceeding of IEEE 23rd International Conference on Data Engineering., (2007), pp. 156– 165.

      [4] X. Yu, K. Q. Pu, and N. Koudas, “Monitoring k-nearest neighbour queries over moving objects,” in Proceeding of . 21st International Conference on Data Engineering, (2005), pp. 631–642.

      [5] D. V. Kalashnikov, S. Prabhakar, and S. E. Hambrusch, “Main memory evaluation of monitoring queries over moving objects,”Distributed Parallel Databases, vol. 15, (2004), pp. 117–132.

      [6] Ze Deng, Xiaoming Wu, Lizhen Wang, Xiaodao Chen, Rajiv Ranjan, Albert Zomaya, and Dan Chen, “Parallel Processing of Dynamic Continuous Queries over Streaming Data Flows”, IEEE Transac tions On Parallel And Distributed Systems, Vol. 26, No. 3, (March 2015) .pp. 834-846.

      [7] B. Gedik, K. -L. Wu, P. S. Yu, and L. Liu, “Processing moving queries over moving objects using motion-adaptive indexes,”IEEE Transaction on Knowledge and Data Engineering., vol. 18, no. 5, (June 2006), pp. 651–668.

      [8] Haojun Wang and Roger Zimmermann, “ Processing of Continuous Location-Based Range Queries on Moving Objects in Road Networks “, IEEE Transactions On Knowledge And Data Engineering, Vol. 23, No. 7, ( July 2011), pp. 1065-1078.

      [9] W. Choi, B. Moon, and S. Lee, “Adaptive cell-based index for mov ing objects,” Data and Knowledge Engineering, vol. 48, (2004, pp. 75–101.

      [10] F. Liu and K. A. Hua, “Moving query monitoring in spatial network environments,” Mobile Netw. Appl., vol. 17, pp. 234–254,2012.

      [11] S. Saltenis, C. S. Jensen, S. T. Leutenegger, and M. A. Lopez, “Indexing the positions of continuously moving objects,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, (2000), pp. 331–342.

      [12] J.T. Robinson, “The K-D-B-tree: A search structure for large multidimensional dynamic indexes,” in Proc. ACM Sigmod International. Conference on Management of Data, (1981), pp. 10–18.

      [13] D. V. Kalashnikov, S. Prabhakar, W. G. Aref, and S. E. Hambrusch, “Efficient evaluation of continuous range queries on movingobjects,” in Proc. 13th International Conference on Database Expert System Applications, (2002), pp. 1–10.

      [14] Ziqiang Yu, Yang Liu, Xiaohui Yu, and Ken Q. Pu, “Scalable Distributed Processing of K Nearest Neighbor Queries over Moving Objects”, IEEE Transactions On Knowledge and DataEngineering, Vol.27, No.5, ( May 2015), pp. 1383-1396.

      [15] L. Golab and M. T. € Ozsu, “ Issues in Data stream management ” ACM SIGMOD Rec., vol. 32, (2003), pp. 5–14.

      [16] Albert Bifet, Ricard Gavald, “Learning from Time-Changing Data with Adaptive Windowing” Proceedings of the Seventh SIAM International Conference on Data Mining, (2007)

      [17] Manisha B. Thombare, K. V. Metre, “Query Optimization and Execution of Dynamic Data Items in Network Aggregation Environment”, Elsevier , (2014) , pp. 1406 -1413.

      [18] M. Thombare, K. V. Metre, “ Aggregation Environment for Query Optimization in Network Monitoring”, International Journal on Recent and Innovation Trends in Computing and Communication Volume: 2,Issue: 6 (2014), pp. 1515 – 1518.

      [19] H. D. Gangurde, K. V. Metre, “Clustering based Feature Selection from High Dimensional Data”, International Journal on Recent and Innovation Trends in Computing and Communication , Volume: 3 , Issue: 6 , (2015), pp. 3556 – 3560.

      [20] Trupti Atre, K. V. Metre, “ MIRS: Text Based and Content Based Image Retrieval International Journal of Engineering Science and Innovative Technology (IJESIT) Volume 3, Issue 4, (2014), pp. 579-584.

      [21] M. U. Kharat, R. P. Dahake, K. V. Metre, Feature Dimension Reduction for Content-Based Image Identification, Book Chapter (Clustering Techniques for Content-Based Feature Extraction From Image ) , IGI Global, (2018), pp. 100-121.

      [22] K. B. Deshmukh, M. U. Kharat, “Accelerating Smith-Waterman Alignment Based on GPU”, Journal of Advanced Research in Computer Science and Software Engineering”, Volume 5, Issue 5, (2015), pp. 1162-1164.

      [23] Cagri Balkesen and Nesime Tatbul, “Scalable Data Partitioning Techniques for Parallel Sliding Window Processing over Data Streams”, 8th International Workshop on Data Management for Sensor Networks (DMSN) (2011)


 

View

Download

Article ID: 28286
 
DOI: 10.14419/ijet.v7i4.19.28286




Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.