pest: Fast approximate keyword search in semantic data using eigenvector-based term propagation

Authors:
Klara Weiand;Fabian Kneißl;Wojciech Łobacz;Tim Furche;François Bry
Affiliations:
Institute for Informatics, Ludwig-Maximilians-Universität, 80538 Munich, Germany;Institute for Informatics, Ludwig-Maximilians-Universität, 80538 Munich, Germany;Institute for Informatics, Ludwig-Maximilians-Universität, 80538 Munich, Germany;Oxford University Computing Laboratory, Wolfson Building, Parks Road, Oxford OX1 3QD, United Kingdom and Institute for Informatics, Ludwig-Maximilians-Universität, 80538 Munich, Germany;Institute for Informatics, Ludwig-Maximilians-Universität, 80538 Munich, Germany
Venue:
Information Systems
Year:
2012

Citing 21

Change detection in hierarchically structured information
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data

Application of Spreading Activation Techniques in Information Retrieval
Artificial Intelligence Review

The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7

The Tree-to-Tree Correction Problem
Journal of the ACM (JACM)

Querying and ranking XML documents
Journal of the American Society for Information Science and Technology - XML

Approximate XML joins
Proceedings of the 2002 ACM SIGMOD international conference on Management of data

Tree Pattern Relaxation
EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology

ATreeGrep: Approximate Searching in Unordered Trees
SSDBM '02 Proceedings of the 14th International Conference on Scientific and Statistical Database Management

Scaling personalized web search
WWW '03 Proceedings of the 12th international conference on World Wide Web

Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search
IEEE Transactions on Knowledge and Data Engineering

FleXPath: flexible structure and full-text querying for XML
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data

Similarity evaluation on tree-structured data
Proceedings of the 2005 ACM SIGMOD international conference on Management of data

A survey on tree edit distance and related problems
Theoretical Computer Science

ObjectRank: a system for authority-based search on databases
Proceedings of the 2006 ACM SIGMOD international conference on Management of data

Dynamic personalized pagerank in entity-relation graphs
Proceedings of the 16th international conference on World Wide Web

EntityRank: searching entities directly and holistically
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

Combining document- and paragraph-based entity ranking
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Language-model-based ranking for queries on RDF-graphs
Proceedings of the 18th ACM conference on Information and knowledge management

Flavors of KWQL, a Keyword Query Language for a Semantic Wiki
SOFSEM '10 Proceedings of the 36th Conference on Current Trends in Theory and Practice of Computer Science

KWilt: a semantic patchwork for flexible access to heterogeneous knowledge
RR'10 Proceedings of the Fourth international conference on Web reasoning and rule systems

Survey: An overview on XML similarity: Background, current trends and future directions
Computer Science Review

Cited 2

Exploring dictionary-based semantic relatedness in labeled tree data
Information Sciences: an International Journal

Knowledge-based personalized search engine for the Web-based Human Musculoskeletal System Resources (HMSR) in biomechanics
Journal of Biomedical Informatics

Abstract

We present pest, a novel approach to the approximate querying of graph-structured data such as RDF that exploits the data's structure to propagate term weights between related data items. We focus on data where meaningful answers are given through the application semantics, e.g., pages in wikis, persons in social networks, or papers in a research network such as Mendeley. The pest matrix generalizes the Google Matrix used in PageRank with a term-weight-dependent leap and accommodates different levels of (semantic) closeness for different relations in the data, e.g., friend vs. co-worker in a social network. Its eigenvectors represent the distribution of a term after propagation. The eigenvectors for all terms together form a (vector space) index that takes the structure of the data into account and can be used with standard document retrieval techniques. In extensive experiments including a user study on a real-life wiki, we show how pest improves the quality of the ranking over a range of existing ranking approaches, yet achieves a query performance comparable to a plain vector space index.
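
The propagation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's pest matrix: it assumes a simplified, personalized-PageRank-style iteration in which the leap distribution is proportional to a term's initial weight in each node and each relation type carries a fixed weight. The function name `propagate_term`, the `RELATION_WEIGHT` table, the damping value `alpha`, and the toy graph are all hypothetical choices made for illustration.

```python
# Hypothetical relation weights (assumption): how strongly an edge of each
# type propagates a term, e.g. "friend" counts as closer than "coworker".
RELATION_WEIGHT = {"friend": 1.0, "coworker": 0.5}


def propagate_term(nodes, edges, term_weight, alpha=0.7, iterations=200, tol=1e-12):
    """Approximate the stationary distribution of one term by power iteration.

    nodes:       list of node ids
    edges:       list of (source, target, relation) triples
    term_weight: dict mapping node -> initial weight of the term in that node
    alpha:       probability of following an edge instead of leaping
    """
    total = sum(term_weight.get(n, 0.0) for n in nodes)
    if total == 0:
        return {n: 0.0 for n in nodes}
    # Term-weight-dependent leap: land on nodes in proportion to the term weight.
    leap = {n: term_weight.get(n, 0.0) / total for n in nodes}

    # Relation-weighted outgoing edges, normalised so each node's weights sum to 1.
    out = {n: {} for n in nodes}
    for src, dst, rel in edges:
        out[src][dst] = out[src].get(dst, 0.0) + RELATION_WEIGHT[rel]
    for n in nodes:
        s = sum(out[n].values())
        out[n] = {dst: w / s for dst, w in out[n].items()} if s else {}

    score = dict(leap)
    for _ in range(iterations):
        nxt = {n: (1 - alpha) * leap[n] for n in nodes}
        for src in nodes:
            if out[src]:
                for dst, w in out[src].items():
                    nxt[dst] += alpha * score[src] * w
            else:
                # Node without outgoing edges: hand its mass back via the leap.
                for n in nodes:
                    nxt[n] += alpha * score[src] * leap[n]
        delta = sum(abs(nxt[n] - score[n]) for n in nodes)
        score = nxt
        if delta < tol:
            break
    return score


# Toy graph: the term occurs only in node "a"; "b" is a friend, "c" a co-worker.
nodes = ["a", "b", "c"]
edges = [
    ("a", "b", "friend"), ("b", "a", "friend"),
    ("a", "c", "coworker"), ("c", "a", "coworker"),
]
scores = propagate_term(nodes, edges, {"a": 1.0})
```

In this toy graph the term occurs only in `a`, yet `b` and `c` both end up with non-zero scores, and `b` scores higher than `c` because the friend relation is weighted more strongly. Under the same assumptions, repeating the computation for every term and collecting the resulting vectors would give a structure-aware index of the kind the abstract describes, usable like an ordinary vector space index.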