A practical minimal perfect hashing method
WEA'05 Proceedings of the 4th international conference on Experimental and Efficient Algorithms
DL2Go: Editable Digital Libraries in the Pocket
ICADL 08 Proceedings of the 11th International Conference on Asian Digital Libraries: Universal and Ubiquitous Access to Information
A new approach for cross-language plagiarism analysis
CLEF'10 Proceedings of the 2010 international conference on Multilingual and multimodal information access evaluation: cross-language evaluation forum
Cross-language plagiarism detection
Language Resources and Evaluation
Hi-index | 0.01 |
We develop and implement a new indexing technology which allows us to use complete (and possibly very large) documents as queries, while having a retrieval performance comparable to a standard term query. Our approach aims at retrieval tasks such as near duplicate detection and high similarity search. To demonstrate the performance of our technology we have compiled the search index "Wikipedia in the Pocket", which contains about 2 million English and German Wikipedia articles.1 This index--along with a search interface--fits on a conventional CD (0.7 gigabyte). The ingredients of our indexing technology are similarity hashing and minimal perfect hashing.