TopX: efficient and versatile top-k query processing for semistructured data

Authors:
Martin Theobald; Holger Bast; Debapriyo Majumdar; Ralf Schenkel; Gerhard Weikum
Affiliations:
Max-Planck Institute for Informatics, Saarbruecken, Germany (all authors)
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2008

Citing: 0
Cited by: 23

Fine-grained relevance feedback for XML retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

TopX @ INEX 2007
Focused Access to XML Documents

Person Retrieval on XML Documents by Coreference Analysis Utilizing Structural Features
DEXA '08 Proceedings of the 19th international conference on Database and Expert Systems Applications

Database and information-retrieval methods for knowledge discovery
Communications of the ACM - A Direct Path to Dependable Software

Flexible document-query matching based on a probabilistic content and structure score combination
Proceedings of the 2010 ACM Symposium on Applied Computing

Efficient processing of exact top-k queries over disk-resident sorted lists
The VLDB Journal — The International Journal on Very Large Data Bases

Exploit keyword query semantics and structure of data for effective XML keyword search
ADC '10 Proceedings of the Twenty-First Australasian Conference on Database Technologies - Volume 104

Using the past to score the present: extending term weighting models through revision history analysis
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

TopX 2.0 at the INEX 2009 ad-hoc and efficiency tracks: distributed indexing for top-k-style content-and-structure retrieval
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval

Knowledge-based sense disambiguation (almost) for all structures
Information Systems

Processing top-k join queries
Proceedings of the VLDB Endowment

Providing built-in keyword search capabilities in RDBMS
The VLDB Journal — The International Journal on Very Large Data Bases

Unified structure and content search for personal information management systems
Proceedings of the 14th International Conference on Extending Database Technology

Keyword search over relational databases: a metadata approach
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Score-consistent algebraic optimization of full-text search queries with GRAFT
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Efficient fuzzy full-text type-ahead search
The VLDB Journal — The International Journal on Very Large Data Bases

Lightweight integration of IR and DB for scalable hybrid search with integrated ranking support
Web Semantics: Science, Services and Agents on the World Wide Web

Enriching short text representation in microblog for clustering
Frontiers of Computer Science in China

Intelligent Social Media Indexing and Sharing Using an Adaptive Indexing Search Engine
ACM Transactions on Intelligent Systems and Technology (TIST)

MAXLCA: a new query semantic model for XML keyword search
Journal of Web Engineering

Leveraging the storage layer to support XML similarity joins in XDBMSs
ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems

Fast and incremental indexing in effective and efficient XML element retrieval systems
Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services

Semantic to intelligent web era: building blocks, applications, and current trends
Proceedings of the Fifth International Conference on Management of Emergent Digital EcoSystems

Abstract

Recent IR extensions to XML query languages, such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series, reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-k retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the k top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dynamic query relaxation with controllable influence on the result ranking. The paper makes four main contributions: (1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, (2) efficient and effective top-k query processing for semistructured data, (3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and (4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora such as TREC Terabyte and INEX Wikipedia.
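
The early-termination idea in the abstract can be illustrated with a minimal sketch of the classic threshold-style top-k scheme over score-sorted index lists. This is not the TopX algorithm itself, whose details the abstract does not give; the function name threshold_top_k, the use of summation as the monotonic aggregation function, and the in-memory lists standing in for index lists are all assumptions made for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def threshold_top_k(
    index_lists: List[List[Tuple[str, float]]], k: int
) -> Tuple[List[Tuple[str, float]], int]:
    """Return (top-k items with summed scores, number of scan rounds used).

    Each index list holds (item, score) pairs sorted by descending score,
    one list per query condition. Summation is the monotonic aggregation
    assumed here; an item missing from a list scores 0 in that list.
    """
    if k <= 0:
        return [], 0
    # Per-list lookup tables simulate random access to an item's score.
    lookups: List[Dict[str, float]] = [dict(lst) for lst in index_lists]
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap of (total score, item)
    max_depth = max((len(lst) for lst in index_lists), default=0)
    depth = 0
    while depth < max_depth:
        last_scores = []
        for lst in index_lists:
            if depth >= len(lst):
                last_scores.append(0.0)  # exhausted list adds nothing more
                continue
            item, score = lst[depth]
            last_scores.append(score)
            if item in seen:
                continue
            seen.add(item)
            total = sum(table.get(item, 0.0) for table in lookups)
            if len(top) < k:
                heapq.heappush(top, (total, item))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, item))
        depth += 1
        # No unseen item can score above the sum of the last scores read.
        threshold = sum(last_scores)
        if len(top) == k and top[0][0] >= threshold:
            break  # the current top-k is safe; stop scanning early
    ranked = sorted(((item, s) for s, item in top), key=lambda pair: -pair[1])
    return ranked, depth
```

Because the aggregation is monotonic, the sum of the scores at the current scan positions bounds the score of every item not yet seen, so the scan can stop once the k-th best total reaches that bound. With two hypothetical four-entry lists, one for a content condition and one for a structure condition, a query with k = 1 can finish after two scan rounds instead of four.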