Adaptive on-line page importance computation

Authors:
Serge Abiteboul;Mihai Preda;Gregory Cobena
Affiliations:
INRIA Domaine de Voluceau, Rocquencourt, France;Xyleme SA, Saint-Cloud, France;INRIA Domaine de Voluceau, Rocquencourt, France
Venue:
WWW '03 Proceedings of the 12th international conference on World Wide Web
Year:
2003



Abstract

The computation of page importance in a huge dynamic graph has recently attracted considerable attention because of the web. Page importance, or page rank, is defined as the fixpoint of a matrix equation. Previous algorithms compute it off-line and require substantial extra CPU and disk resources (e.g., to store, maintain, and read the link matrix). We introduce a new algorithm, OPIC, that works on-line and uses far fewer resources. In particular, it does not require storing the link matrix. It is on-line in that it continuously refines its estimate of page importance while the web/graph is visited. Thus it can be used to focus crawling on the most interesting pages. We prove the correctness of OPIC. We present Adaptive OPIC, which also works on-line but adapts dynamically to changes of the web. A variant of this algorithm is now used by Xyleme. We report on experiments with synthetic data. In particular, we study the convergence and adaptiveness of the algorithms for various scheduling strategies for the pages to visit. We also report on experiments based on crawls of significant portions of the web.
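
The abstract does not spell out how OPIC works, so the following is only a minimal sketch of the on-line idea, assuming the cash-and-history scheme commonly associated with OPIC: each page holds some "cash", and visiting a page records that cash in its history and splits it equally among the pages it links to. The function name `opic`, the dictionary-based graph, and the uniformly random choice of the next page are illustrative assumptions; the paper studies several scheduling strategies, and Adaptive OPIC is not covered here.

```python
import random


def opic(graph, steps, seed=0):
    """Estimate page importance on-line by circulating cash along links.

    graph: dict mapping each page to a non-empty list of pages it links to
           (assumed strongly connected for this sketch).
    Returns a dict of importance estimates that sum to 1.
    """
    rng = random.Random(seed)
    pages = list(graph)
    cash = {p: 1.0 / len(pages) for p in pages}  # total cash starts at 1
    history = {p: 0.0 for p in pages}            # cash recorded on past visits

    for _ in range(steps):
        page = rng.choice(pages)                 # assumed policy: random visits
        amount = cash[page]
        history[page] += amount                  # record the cash before spending it
        cash[page] = 0.0
        share = amount / len(graph[page])
        for target in graph[page]:               # only this page's out-links are read
            cash[target] += share

    # Estimate: a page's share of all cash ever recorded plus cash in flight.
    total = sum(history.values()) + sum(cash.values())
    return {p: (history[p] + cash[p]) / total for p in pages}


if __name__ == "__main__":
    toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(opic(toy_web, steps=20000).items()):
        print(page, round(score, 3))
```

Each visit needs only the out-links of the page being read, which is why no link matrix has to be stored. On the toy graph above, the fixpoint of the link equation (without any damping factor) is 0.4, 0.2, 0.4 for pages a, b, c, and the estimates approach those values as the number of visits grows.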