Performance Analysis of Three Text-Join Algorithms

Authors:
Weiyi Meng;Clement Yu;Wei Wang;Naphtali Rishe
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
1998

Abstract

When a multidatabase system contains textual database systems (i.e., information retrieval systems), queries against the global schema of the multidatabase system may contain a new type of join: a join between attributes of textual type. This paper presents three algorithms for processing such joins and analyzes their I/O costs. Because these joins often involve very large document collections, efficient processing algorithms are essential. The three algorithms differ in whether they use the documents themselves or the inverted files on the documents to process the join. Our analysis and simulation results indicate that the relative performance of these algorithms depends on the input document collections, the system characteristics, and the input query. For each algorithm, we identify the type of input document collection on which it is likely to perform well. We also propose an integrated algorithm that automatically selects the best algorithm to use.
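
To make the distinction between joining on the documents themselves and joining through an inverted file concrete, the following is a minimal in-memory sketch, not the paper's algorithms. It assumes that a textual join pairs each document in one collection with the k most similar documents in the other, and that similarity is the dot product of raw term frequencies. All function names, the sample collections, and the similarity measure are illustrative assumptions, and the sketch does not model I/O cost at all.

```python
from collections import defaultdict

def term_counts(text):
    """Term-frequency vector of a document (assumed whitespace tokenization)."""
    counts = defaultdict(int)
    for term in text.lower().split():
        counts[term] += 1
    return dict(counts)

def top_k(scores, k):
    """Keep the k highest-scoring documents; ties are broken by document id."""
    return sorted(scores.items(), key=lambda p: (-p[1], p[0]))[:k]

def join_with_documents(outer, inner, k):
    """Use the documents directly: compare every outer document with every inner one."""
    result = {}
    for oid, otext in outer.items():
        ovec = term_counts(otext)
        scores = {}
        for iid, itext in inner.items():
            ivec = term_counts(itext)
            s = sum(w * ivec[t] for t, w in ovec.items() if t in ivec)
            if s > 0:
                scores[iid] = s
        result[oid] = top_k(scores, k)
    return result

def build_inverted_file(collection):
    """Map each term to the list of (document id, term frequency) postings."""
    inverted = defaultdict(list)
    for did, text in collection.items():
        for term, w in term_counts(text).items():
            inverted[term].append((did, w))
    return inverted

def join_with_inverted_file(outer, inner, k):
    """Use an inverted file on the inner collection: only postings of shared terms are read."""
    inverted = build_inverted_file(inner)
    result = {}
    for oid, otext in outer.items():
        scores = defaultdict(int)
        for term, w in term_counts(otext).items():
            for iid, iw in inverted.get(term, []):
                scores[iid] += w * iw
        result[oid] = top_k(scores, k)
    return result

# Hypothetical sample collections, invented for illustration only.
resumes = {"r1": "database query optimization",
           "r2": "text retrieval inverted file"}
jobs = {"j1": "query optimization database systems",
        "j2": "inverted file text indexing",
        "j3": "network protocols"}

by_documents = join_with_documents(resumes, jobs, k=1)
by_inverted_file = join_with_inverted_file(resumes, jobs, k=1)
```

Both functions return the same pairs here. What differs is which data each one touches: the first scans every inner document for every outer document, while the second reads only the postings of terms that actually occur in the outer document. That difference in access pattern is the kind of trade-off the abstract says depends on the collections, the system, and the query.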