In this paper, the authors present MapReduce implementations of the crawler, indexer, and ranking algorithms used by search engines to retrieve results from the World Wide Web. Running the crawler and the indexer in a MapReduce environment improves the speed of crawling and indexing. The proposed ranking algorithm is an iterative method that exploits the link structure of the Web; implementing it on the MapReduce framework speeds up the convergence of the page rankings. Categorization is used to retrieve and order results according to the user's preferences, personalizing the search. The paper also introduces a new score, associated with each Web page, that is computed from the user's query and the number of occurrences of the query terms in the document corpus. Experiments are conducted on Web graph datasets, and the results are compared with serial versions of the crawler, indexer, and ranking algorithms.
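The iterative, link-structure-based ranking described above can be sketched as a PageRank-style computation split into map and reduce phases. This is a minimal illustrative sketch, not the paper's exact implementation: the tiny graph, the damping factor, and the function names are all assumptions.

```python
# Sketch of one iteration of a link-based ranking algorithm
# (PageRank-style) expressed as a map phase and a reduce phase.
# The graph, damping factor, and names are illustrative assumptions.
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor

def map_phase(graph, ranks):
    """Emit (target, contribution) pairs for every outgoing link."""
    for page, links in graph.items():
        for target in links:
            yield target, ranks[page] / len(links)

def reduce_phase(contributions, num_pages):
    """Sum the contributions per page and apply the damping factor."""
    totals = defaultdict(float)
    for page, share in contributions:
        totals[page] += share
    return {page: (1 - DAMPING) / num_pages + DAMPING * total
            for page, total in totals.items()}

# Tiny Web graph: A -> B, A -> C, B -> C, C -> A
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = {p: 1 / len(graph) for p in graph}
for _ in range(20):  # iterate until the ranks stabilize
    ranks = reduce_phase(map_phase(graph, ranks), len(graph))
```

In a real MapReduce job the map and reduce phases run in parallel across many machines, with the framework shuffling the `(target, contribution)` pairs between them; the serial loop above only mimics that data flow.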
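The query-dependent score mentioned in the abstract could take many forms; the paper only states that it combines the user's query with the occurrence counts of the query terms in the document corpus. The following sketch assumes one plausible formula, per-document term frequency normalized by the term's corpus-wide count, purely for illustration.

```python
# Hedged sketch of a query-dependent document score. The exact
# formula is an assumption: per-term document counts weighted
# inversely by each term's total count across the corpus.
from collections import Counter

def occurrence_score(query, document, corpus):
    """Score a document against a query using term occurrence counts."""
    doc_counts = Counter(document.lower().split())
    corpus_counts = Counter(w for d in corpus for w in d.lower().split())
    score = 0.0
    for term in query.lower().split():
        if corpus_counts[term]:  # skip terms absent from the corpus
            score += doc_counts[term] / corpus_counts[term]
    return score

corpus = ["web search ranking", "mapreduce crawler indexer",
          "ranking web pages"]
top = occurrence_score("web ranking", corpus[0], corpus)
miss = occurrence_score("web ranking", corpus[1], corpus)
```

Documents containing the query terms score higher than those that do not, which matches the abstract's goal of ordering retrieved pages by relevance to the user's query.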