Automatic text processing
Self-indexing inverted files for fast text retrieval
ACM Transactions on Information Systems (TOIS)
Processing queries for first-few answers
CIKM '96 Proceedings of the fifth international conference on Information and knowledge management
Data integration using similarity joins and a word-based information representation language
ACM Transactions on Information Systems (TOIS)
Optimal aggregation algorithms for middleware
PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Vector-space ranking with effective early termination
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Minimal probing: supporting expensive predicates for top-k queries
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Performance Analysis of Three Text-Join Algorithms
IEEE Transactions on Knowledge and Data Engineering
Supporting Incremental Join Queries on Ranked Inputs
Proceedings of the 27th International Conference on Very Large Data Bases
Text joins in an RDBMS for web data integration
WWW '03 Proceedings of the 12th international conference on World Wide Web
Efficient Record Linkage in Large Data Sets
DASFAA '03 Proceedings of the Eighth International Conference on Database Systems for Advanced Applications
Metadata-based modeling of information resources on the Web
Journal of the American Society for Information Science and Technology
Efficient similarity-based operations for data integration
Data & Knowledge Engineering
Querying web metadata: Native score management and text support in databases
ACM Transactions on Database Systems (TODS)
Supporting top-K join queries in relational databases
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Hi-index | 0.00 |
Text similarity join operator joins two relations if their join attributes are textually similar to each other, and it has a variety of application domains including integration and querying of data from heterogeneous resources; cleansing of data; and mining of data. Although, the text similarity join operator is widely used, its processing is expensive due to the huge number of similarity computations performed. In this paper, we incorporate some short cut evaluation techniques from the Information Retrieval domain, namely Harman, quit, continue, and maximal similarity filter heuristics, into the previously proposed text similarity join algorithms to reduce the amount of similarity computations needed during the join operation. We experimentally evaluate the original and the heuristic based similarity join algorithms using real data obtained from the DBLP Bibliography database, and observe performance improvements with continue and maximal similarity filter heuristics.