Approximate string search and join are essential operations in data integration and cleaning, and they have attracted significant attention in academia. In this paper, we study string similarity search and join with edit distance constraints. Although multicore machines have become the mainstream computer architecture, most existing methods run only on a single processor. To address this problem, we propose a novel parallel framework based on the Burrows-Wheeler Transform (BWT). We also devise efficient techniques that exploit the cache to further improve performance. Our method solves both similarity search and similarity join efficiently. We conducted a comprehensive experimental study to demonstrate its efficiency.
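To make the problem statement concrete, the following is a minimal sketch of similarity search and join under an edit distance threshold. It is a naive sequential baseline for illustration only, not the parallel BWT-based framework proposed in the paper. The function names (`within_edit_distance`, `similarity_search`, `similarity_join`) and the threshold parameter `tau` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def within_edit_distance(a: str, b: str, tau: int) -> bool:
    """Return True if the edit distance between a and b is at most tau."""
    # Length filter: strings whose lengths differ by more than tau cannot match.
    if abs(len(a) - len(b)) > tau:
        return False
    prev = list(range(len(b) + 1))  # DP row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        # Early termination: the row minimum never decreases in later rows.
        if min(curr) > tau:
            return False
        prev = curr
    return prev[-1] <= tau


def similarity_search(query: str, strings: List[str], tau: int) -> List[str]:
    """Return every string within edit distance tau of the query."""
    return [s for s in strings if within_edit_distance(query, s, tau)]


def similarity_join(strings: List[str], tau: int) -> List[Tuple[str, str]]:
    """Return all pairs (by position i < j) within edit distance tau."""
    return [(strings[i], strings[j])
            for i in range(len(strings))
            for j in range(i + 1, len(strings))
            if within_edit_distance(strings[i], strings[j], tau)]


if __name__ == "__main__":
    data = ["vldb", "pvldb", "sigmod", "sigmmod", "icde"]
    print(similarity_search("vldb", data, tau=1))
    print(similarity_join(data, tau=1))
```

This baseline compares every candidate pair, so the join is quadratic in the number of strings. Index-based methods aim to avoid most of these verifications, and the outer loops here are the natural place where work could be split across cores.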