WWW '03 Proceedings of the 12th international conference on World Wide Web
The computation of page importance in a huge dynamic graph has recently attracted a lot of attention because of the web. Page importance, or page rank, is defined as the fixpoint of a matrix equation. Previous algorithms compute it off-line and require substantial extra CPU and disk resources (e.g. to store, maintain and read the link matrix). We introduce a new algorithm, OPIC, that works on-line and uses far fewer resources. In particular, it does not require storing the link matrix. It is on-line in that it continuously refines its estimate of page importance while the web/graph is visited. Thus it can be used to focus crawling on the most interesting pages. We prove the correctness of OPIC. We present Adaptive OPIC, which also works on-line but adapts dynamically to changes of the web. A variant of this algorithm is now used by Xyleme. We report on experiments with synthetic data. In particular, we study the convergence and adaptiveness of the algorithms for various scheduling strategies for the pages to visit. We also report on experiments based on crawls of significant portions of the web.
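The abstract does not spell out the mechanics of OPIC, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a cash-and-history scheme: every page holds some "cash", a visit records that cash in the page's history and splits it equally among its successors, and the estimate for a page is its share of the total history. The function name `opic_sketch`, the greedy scheduling rule, and the requirement that the graph be small, strongly connected and free of dead ends are all assumptions made for illustration. Only per-page counters and the out-links of the page being visited are used, which is the sense in which no link matrix is stored.

```python
def opic_sketch(out_links, steps):
    """Illustrative on-line importance estimate (hypothetical helper).

    out_links: dict mapping each page to a non-empty list of successor pages.
               The graph is assumed to be strongly connected.
    steps:     number of page visits to simulate.
    Returns a dict mapping each page to its estimated importance (sums to 1).
    """
    pages = sorted(out_links)
    cash = {p: 1.0 / len(pages) for p in pages}  # total cash stays equal to 1
    history = {p: 0.0 for p in pages}            # cash recorded on past visits

    for _ in range(steps):
        # Scheduling strategy (an assumption): visit the page with the most cash.
        page = max(pages, key=lambda p: cash[p])
        history[page] += cash[page]
        share = cash[page] / len(out_links[page])
        cash[page] = 0.0
        # Only the visited page's own out-links are needed at this point.
        for target in out_links[page]:
            cash[target] += share

    total = sum(history.values())
    return {p: history[p] / total for p in pages}


# Toy graph: a -> b, b -> a and c, c -> a.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
estimate = opic_sketch(graph, 2000)
print(estimate)
```

For this toy graph, the fixpoint of the corresponding random-walk equation is 0.4 for `a`, 0.4 for `b` and 0.2 for `c`, so the estimates can be checked against those values as the number of visits grows. Other scheduling strategies (random or cyclic visits, for instance) can be substituted in the `max(...)` line to compare convergence behaviour.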