Experiences with IR TOP N Optimization in a Main Memory DBMS: Applying 'the Database Approach' in New Domains

Authors:
Henk Ernst Blok; Arjen P. de Vries; Henk M. Blanken; Peter M. G. Apers
Venue:
BNCOD 18 Proceedings of the 18th British National Conference on Databases: Advances in Databases
Year:
2001

Citing 10

Fuzzy queries in multimedia database systems
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7

Combining fuzzy information from multiple systems
Journal of Computer and System Sciences

Information Retrieval

Information Retrieval: Algorithms and Heuristics

Flattening an Object Algebra to Provide Performance
ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering

Database Architecture Optimized for the New Bottleneck: Memory Access
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases

The Mirror MMDBMS Architecture
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases

MIL primitives for querying a fragmented world
The VLDB Journal — The International Journal on Very Large Data Bases

Execution Performance Issues in Full-Text Information Retrieval

Cited 2

Predicting the cost-quality trade-off for information retrieval queries: facilitating database design and query optimization
Proceedings of the tenth international conference on Information and knowledge management

Flexible and scalable digital library search
Proceedings of the 27th International Conference on Very Large Data Bases

Abstract

Data abstraction and query processing techniques are usually studied in the domain of administrative applications. We present a case study in the non-standard domain of (multimedia) information retrieval, mainly intended as a feasibility study in favor of the 'database approach' to data management.

Top-N queries form a natural query class when dealing with content retrieval. In the IR field, much research has been done on processing top-N queries efficiently. Unfortunately, these results cannot be ported directly to the database environment, because their tuple-oriented nature would seriously limit the freedom of the query optimizer to select appropriate query plans.

By horizontally fragmenting our database containing document statistics, we are able to combine some of the best of the IR and database optimization principles, providing good retrieval quality as well as database 'goodies' like flexibility, scalability, efficiency, and generality. The key issues we address in this paper are the effects of our fragmentation approach on the speed and quality of the answers and the opportunities it offers for scalability. Our discussion is supported by experimental results.
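
To make the idea of horizontal fragmentation for top-N queries concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy table of (term, document, term frequency) rows, a hypothetical fragmentation rule that separates rare terms from frequent ones by document frequency, and a simple tf-idf style weight; none of these details are taken from the paper. It ranks documents once using all fragments and once using only the small rare-term fragment, so the two top-N answers can be compared.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical toy statistics: one row per (term, doc_id, term frequency).
ROWS = [
    ("database", 1, 3), ("database", 2, 1), ("database", 3, 2), ("database", 4, 1),
    ("retrieval", 1, 1), ("retrieval", 3, 4),
    ("fragment", 3, 2),
]
NUM_DOCS = 4


def document_frequency(rows):
    """Count the number of documents in which each term occurs."""
    df = defaultdict(int)
    for term, _doc, _tf in rows:
        df[term] += 1
    return df


def fragment_rows(rows, df, max_df):
    """Split rows horizontally (assumed rule): rare terms vs. frequent terms."""
    rare = [r for r in rows if df[r[0]] <= max_df]
    frequent = [r for r in rows if df[r[0]] > max_df]
    return rare, frequent


def score(fragments, query_terms, df, num_docs):
    """Accumulate a tf-idf style score per document over the given fragments."""
    scores = defaultdict(float)
    for frag in fragments:
        for term, doc, tf in frag:
            if term in query_terms:
                scores[doc] += tf * math.log(1 + num_docs / df[term])
    return scores


def top_n(scores, n):
    """Return the n best (doc_id, score) pairs; ties go to the lower doc_id."""
    return heapq.nlargest(n, scores.items(), key=lambda kv: (kv[1], -kv[0]))


df = document_frequency(ROWS)
rare, frequent = fragment_rows(ROWS, df, max_df=2)
query = {"database", "retrieval", "fragment"}

full = top_n(score([rare, frequent], query, df, NUM_DOCS), 2)
cheap = top_n(score([rare], query, df, NUM_DOCS), 2)
print("all fragments:     ", [doc for doc, _ in full])
print("rare fragment only:", [doc for doc, _ in cheap])
```

On this toy data the rare-term fragment holds three of the seven rows, yet it yields the same top-2 ranking as the full evaluation. On real collections the answers can differ, which is the speed versus quality trade-off the abstract refers to.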