Every existing search engine suffers from redundancy in its search results. Detecting and eliminating such redundancy (near-duplicates) is an active area of research among search engine researchers. Provenance-based factors can improve web search by helping to deliver high-quality content to the user. Many of the factors that affect personalisation may also prove useful to users in judging the quality and trustworthiness of web documents. Provenance information also helps filter near-duplicates from search results on the basis of 6W factors. This paper therefore aims to develop a web search system that uses a provenance-based technique for near-duplicate detection and elimination. The system incorporates a personalised crawler (focused crawler) that computes author credentials, which contribute to the trustworthiness of a web document. Finally, the results of the proposed system are compared with those of existing algorithms using a test bed of web documents.
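To make the notion of filtering near-duplicates from a result list concrete, the following is a minimal sketch of a common text-based baseline: word shingling with Jaccard similarity. It is not the provenance-based technique the paper proposes, and it uses no 6W factors or author credentials. The function names, the shingle size `k` and the `threshold` value are assumptions chosen for illustration only.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single shingle holding all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_near_duplicates(results, threshold=0.8, k=3):
    """Keep each result only if it is not too similar to an already kept one.

    `threshold` is an illustrative cut-off, not a value taken from the paper.
    """
    kept, kept_shingles = [], []
    for doc in results:
        s = shingles(doc, k)
        if all(jaccard(s, t) < threshold for t in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


if __name__ == "__main__":
    results = [
        "the quick brown fox jumps over the lazy dog",
        "the quick brown fox jumps over the lazy dog",
        "web provenance records who created a page and when",
    ]
    for doc in filter_near_duplicates(results):
        print(doc)
```

The filter is greedy: it keeps the first result of each near-duplicate group in ranked order, so which copy survives depends on the ranking. A provenance-based system would presumably replace or augment the similarity test with provenance factors, which this sketch does not attempt.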