Design and development of text extraction and retrieval using style of documents in web searching

S. Balan; P. Ponmuthuramalingam

doi:10.14419/ijet.v7i1.2.9038

Received: 04-01-2018

Revised: 04-01-2018

Accepted: 04-01-2018

Published: 28-12-2017

Keywords: Web Search, Text Extraction, Data Alignment, Data Retrieval.

Abstract

This research focuses on the study and extraction of web pages and documents returned by the Google search engine. A central task of web search is to match a query with accurate information. That information can be categorized in many ways, such as manual, structured, and semi-structured texts and images. Query Result Records (QRRs) are used to extract text information from different types of documents. Data regions are used to identify the actual segmentation step, and the domain of documents contains suffixes and prefixes. In terms of time, the approach is reported to be more efficient than existing pruning and other techniques. We analyze the different types of alignment in this paper and propose a new technique for alignment retrieval, using precision and recall to evaluate retrieval performance.
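The abstract names precision and recall as the evaluation measures but does not show how they are computed. The sketch below uses the standard set-based definitions (precision is the share of retrieved records that are relevant; recall is the share of relevant records that were retrieved). The function name `precision_recall` and the record identifiers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: identifiers of the records the system returned
    relevant:  identifiers of the records judged relevant (ground truth)
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant records that were retrieved
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical record identifiers, for illustration only.
retrieved = ["r1", "r2", "r3", "r4"]
relevant = ["r2", "r4", "r5"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.50 recall=0.67"
```

Here two of the four retrieved records are relevant (precision 0.50), and two of the three relevant records were retrieved (recall 0.67). Whether the paper averages these values per query or over the whole collection is not stated in the abstract.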

How to Cite

Balan, S., & Ponmuthuramalingam, P. (2017). Design and development of text extraction and retrieval using style of documents in web searching. International Journal of Engineering and Technology, 7(1.2), 130-134. https://doi.org/10.14419/ijet.v7i1.2.9038
