An efficient and versatile query engine for TopX search

Authors:
Martin Theobald; Ralf Schenkel; Gerhard Weikum
Affiliations:
Max-Planck Institute for Informatics, Saarbruecken, Germany; Max-Planck Institute for Informatics, Saarbruecken, Germany; Max-Planck Institute for Informatics, Saarbruecken, Germany
Venue:
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Year:
2005

Abstract

This paper presents a novel engine, coined TopX, for efficient ranked retrieval of XML documents over semistructured but nonschematic data collections. The algorithm follows the paradigm of threshold algorithms for top-k query processing with a focus on inexpensive sequential accesses to index lists and only a few judiciously scheduled random accesses. The difficulties in applying the existing top-k algorithms to XML data lie in 1) the need to consider scores for XML elements while aggregating them at the document level, 2) the combination of vague content conditions with XML path conditions, 3) the need to relax query conditions if too few results satisfy all conditions, and 4) the selectivity estimation for both content and structure conditions and their impact on evaluation strategies. TopX addresses these issues by precomputing score and path information in an appropriately designed index structure, by largely avoiding or postponing the evaluation of expensive path conditions so as to preserve the sequential access pattern on index lists, and by selectively scheduling random accesses when they are cost-beneficial. In addition, TopX can compute approximate top-k results using probabilistic score estimators, thus speeding up queries with a small and controllable loss in retrieval precision.
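
The threshold-algorithm paradigm mentioned in the abstract can be illustrated with a minimal sketch. The code below is not TopX itself: it is a generic Fagin-style threshold algorithm over per-term index lists, assuming each list is sorted by descending score and that a document's total score is the sum of its per-list scores. The function name `threshold_topk`, the `IndexList` type, and the in-memory lookup tables standing in for random accesses are all hypothetical choices made for illustration. Unlike the engine described above, this sketch performs random accesses for every newly seen document rather than scheduling only a few of them, and it ignores XML elements and path conditions entirely.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical index-list type: (doc_id, score) pairs sorted by descending score.
IndexList = List[Tuple[str, float]]


def threshold_topk(index_lists: Dict[str, IndexList], k: int) -> List[Tuple[str, float]]:
    """Return up to k documents with the highest summed score (generic TA sketch)."""
    if k <= 0:
        return []
    # Stand-in for random access: term -> {doc_id: score}.
    lookup = {term: dict(entries) for term, entries in index_lists.items()}
    top: List[Tuple[float, str]] = []  # min-heap of (total_score, doc_id)
    seen = set()
    max_depth = max((len(entries) for entries in index_lists.values()), default=0)

    for depth in range(max_depth):
        # One round of sorted accesses: read position `depth` of every list.
        threshold = 0.0
        for entries in index_lists.values():
            if depth >= len(entries):
                continue  # an exhausted list contributes 0 to the threshold
            doc_id, score = entries[depth]
            threshold += score
            if doc_id in seen:
                continue
            seen.add(doc_id)
            # Random accesses resolve the full score of the newly seen document.
            total = sum(table.get(doc_id, 0.0) for table in lookup.values())
            if len(top) < k:
                heapq.heappush(top, (total, doc_id))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, doc_id))
        # Stop once no unseen document can beat the current k-th best score.
        if len(top) == k and top[0][0] >= threshold:
            break

    return [(doc_id, total) for total, doc_id in sorted(top, reverse=True)]
```

Calling `threshold_topk` with a dictionary of score-sorted lists and a value of `k` returns the best documents in descending score order. The early-termination test is what lets the scan stop before the lists are exhausted: the sum of the scores at the current scan depth bounds the total of any document not yet seen.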