A framework for utilising usage trends in the crawling and indexing process of search engines

Authors:
Neelam Duhan; A. K. Sharma
Affiliations:
Department of Computer Engineering, YMCA University of Science and Technology, Zakir Nagar, Sector-6, Faridabad, India
Venue:
International Journal of Knowledge and Web Intelligence
Year:
2011

Abstract

Making search engines responsive to human needs requires an understanding of how users navigate through search results in response to the queries they submit. Characterising user behaviour offers a useful perspective on the workload imposed on a search engine and can help address crucial issues such as load balancing, content caching, data distribution and result optimisation. User browsing behaviour is recorded in search engine query logs and is usually referred to as web usage data. This paper proposes a technique that utilises users' browsing behaviour in the crawling and indexing process, directing the crawler to download important pages that were not previously crawled. Because the approach aims to index most of the important pages based on user feedback, it should improve the efficiency of the search engine. In addition, the existing data structures maintained by search engines have been refined to support the proposed user feedback mechanism and to open further research directions.
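The abstract does not spell out the algorithm. What follows is only a minimal sketch of the general idea of steering a crawler with usage data, and it rests on several assumptions.

- Query-log entries are modelled as (query, visited URL) pairs through a hypothetical `LogRecord` type.
- The crawler's current coverage is a plain set of URLs.
- The hypothetical `min_visits` threshold and the ranking by raw visit count are illustrative choices, not the authors' method.

The sketch counts how often users visited pages the crawler has not yet downloaded. It then returns the most-visited of those pages as candidate seeds for the next crawl.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class LogRecord(NamedTuple):
    """Hypothetical query-log entry: a submitted query and a URL the user then visited."""
    query: str
    visited_url: str


def usage_driven_seeds(
    log: Iterable[LogRecord],
    crawled: set[str],
    min_visits: int = 2,
) -> list[str]:
    """Return uncrawled URLs visited at least `min_visits` times, most-visited first.

    The result is meant as a list of candidate seeds for the crawler frontier.
    `min_visits` is an assumed, illustrative threshold.
    """
    # Count visits only to pages the crawler has not downloaded yet.
    visits = Counter(rec.visited_url for rec in log if rec.visited_url not in crawled)
    # Sort by descending visit count; break ties alphabetically for a deterministic order.
    ranked = sorted(visits.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, count in ranked if count >= min_visits]


if __name__ == "__main__":
    log = [
        LogRecord("web crawler", "http://a.example/"),
        LogRecord("web crawler", "http://b.example/faq"),
        LogRecord("crawler design", "http://b.example/faq"),
        LogRecord("indexing", "http://c.example/"),
        LogRecord("indexing", "http://c.example/"),
        LogRecord("indexing", "http://c.example/"),
    ]
    crawled = {"http://a.example/"}
    print(usage_driven_seeds(log, crawled))  # ['http://c.example/', 'http://b.example/faq']
```

Ranking by raw visit count is the simplest possible notion of page importance. A production system would likely also normalise URLs and weight visits by recency or session, which this sketch omits.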