Graph conductance queries, also known as personalized PageRank and related to random walks with restarts, were originally proposed to assign a hyperlink-based prestige score to Web pages. More general forms of such queries are also very useful for ranking in entity-relation (ER) graphs used to represent relational, XML, and hypertext data. Evaluating PageRank usually involves a global eigenvector computation, so interactive response times may not be possible if the graph is even moderately large. Recently, the need for interactive PageRank evaluation has increased: the graph may be fully known only when the query is submitted, and browsing actions of the user may change some inputs to the PageRank computation dynamically. In this paper, we describe a system that analyzes query workloads and the ER graph, invests in limited offline indexing, and exploits those indices to achieve essentially constant-time query processing, even as the graph size scales. Our techniques (data and query statistics collection, index selection and materialization, and query-time index exploitation) have parallels in the extensive relational query optimization literature, but are applied to supporting novel graph data repositories. We report on experiments with five temporal snapshots of the CiteSeer ER graph having 74 to 702 thousand entity nodes, 0.17 to 1.16 million word nodes, 0.29 to 3.26 million edges between entities, and 3.29 to 32.8 million edges between words and entities. We also used two million actual queries from CiteSeer's logs. Queries run 3 to 4 orders of magnitude faster than whole-graph PageRank, with the gap growing with graph size. The index is smaller than a text index. Ranking accuracy is 94 to 98% with reference to whole-graph PageRank.
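For readers unfamiliar with the whole-graph baseline mentioned above, the following is a minimal sketch of personalized PageRank computed by plain power iteration. It is not the indexing system described in the abstract; it only illustrates the global computation whose cost grows with graph size. The function name `personalized_pagerank`, the parameters `alpha`, `tol`, and `max_iter`, the dead-end handling, and the toy graph are all assumptions made for illustration.

```python
def personalized_pagerank(out_edges, teleport, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for personalized PageRank (illustrative sketch).

    out_edges: dict mapping a node to a list of its successor nodes.
    teleport:  dict mapping a node to its restart probability (should sum to 1).
    alpha:     probability of following an edge instead of restarting (assumed 0.85).
    """
    # Collect every node mentioned anywhere, in a fixed order.
    nodes = set(out_edges) | set(teleport)
    for targets in out_edges.values():
        nodes.update(targets)
    nodes = sorted(nodes)

    # Start the walk from the teleport (restart) distribution.
    scores = {v: teleport.get(v, 0.0) for v in nodes}

    for _ in range(max_iter):
        # Restart mass: with probability 1 - alpha the walk jumps to a teleport node.
        nxt = {v: (1.0 - alpha) * teleport.get(v, 0.0) for v in nodes}
        for u in nodes:
            targets = out_edges.get(u, [])
            if targets:
                # Split the walking mass evenly across out-links.
                share = alpha * scores[u] / len(targets)
                for v in targets:
                    nxt[v] += share
            else:
                # Assumption: a dead-end node sends its walking mass back
                # to the teleport distribution so total mass is conserved.
                for v, p in teleport.items():
                    nxt[v] += alpha * scores[u] * p
        delta = sum(abs(nxt[v] - scores[v]) for v in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Toy three-node cycle with all restart mass on node "a" (hypothetical data).
toy_graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
toy_scores = personalized_pagerank(toy_graph, {"a": 1.0})
```

Each iteration touches every edge of the graph, which is why a query-time computation of this kind becomes too slow for interactive use as the graph grows, and why precomputed indices are attractive.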