DeDuSERP: De-duplication in search engine result page

  • Authors

    • Naresh Sharma
    • Priti Dimri
    2018-03-19
    https://doi.org/10.14419/ijet.v7i2.8.10475
  • Search Engine, SERP, inventiveness, DeDuSERP (De-duplication in search engine result page).
  • The web offers a new way of providing services by arranging different resources over the web, and the most critical and prominent of these services is web search. The purpose of this research is to identify a subtype of de-duplication, DeDuSERP, i.e. de-duplication in the search engine result page (SERP). It restricts the display of URLs with duplicate or similar data and hence enhances the search experience of any client. By duplicate results we mean different links containing the same content or information. To solve this problem, we have designed a filter between the search engine result page and the indexed, ranked pages that the search engine returns in response to the searcher's query. This filter eliminates the duplicate links idiosyncratically and displays only the unique results on the SERP. We perform a string-to-string comparison of web pages, and if their content is 90% similar we adjudge them to be duplicates. We then check the inventiveness (originality) of these duplicate links on the basis of their timestamps: the web page that was crawled earlier is taken to be the original. The comparison and timestamp matching are done using the open-source Apache Commons IO 2.4 API.
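
The filtering steps described in the abstract (compare pages, group those at least 90% similar, keep the earliest-crawled page of each group) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the paper uses Apache Commons IO 2.4 (Java) and the abstract does not specify the similarity metric, so the sketch substitutes `difflib.SequenceMatcher.ratio()` as the string-to-string comparison and treats 90% as an inclusive threshold. The names `RankedPage`, `crawled_at`, `similarity`, and `dedup_serp` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from difflib import SequenceMatcher

# The abstract's cutoff: pages this similar are treated as duplicates
# (assumption: the threshold is inclusive).
SIMILARITY_THRESHOLD = 0.90


@dataclass
class RankedPage:
    """Hypothetical record for one indexed, ranked result returned for a query."""
    url: str
    content: str
    crawled_at: datetime  # hypothetical field: when the crawler fetched this page


def similarity(a: str, b: str) -> float:
    """String-to-string similarity in [0, 1].

    Assumption: difflib's ratio stands in for the paper's unspecified comparison.
    """
    return SequenceMatcher(None, a, b).ratio()


def dedup_serp(ranked: list[RankedPage],
               threshold: float = SIMILARITY_THRESHOLD) -> list[RankedPage]:
    """Hypothetical filter placed between the ranked pages and the SERP."""
    # 1. Walk the results in rank order and group near-duplicates.
    #    Each page is compared with the first member of every existing cluster.
    clusters: list[list[RankedPage]] = []
    for page in ranked:
        for cluster in clusters:
            if similarity(page.content, cluster[0].content) >= threshold:
                cluster.append(page)
                break
        else:
            # No existing cluster matched: this page starts a new one.
            clusters.append([page])

    # 2. In each cluster, the earliest-crawled page is judged the original.
    #    min() returns the first of equally old pages, i.e. the better-ranked one.
    # 3. Clusters were created in rank order, so each original takes the slot
    #    of its cluster's highest-ranked member (an assumed display policy).
    return [min(cluster, key=lambda p: p.crawled_at) for cluster in clusters]
```

Comparing each page only with the first member of each cluster keeps the sketch short but is a simplification: a page may be 90% similar to some other member without matching the representative. The pairwise comparison is also quadratic in the number of results, which is tolerable for a single result page but would likely call for hashing or shingling at larger scale.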

  • How to Cite

    Sharma, N., & Dimri, P. (2018). DeDuSERP: De-duplication in search engine result page. International Journal of Engineering & Technology, 7(2.8), 427-431. https://doi.org/10.14419/ijet.v7i2.8.10475