The quality of a search engine usually depends on the content of the documents it returns rather than on the text used to express that content. Ideally, then, search techniques should be directed toward the semantic dependencies underlying documents rather than toward the texts themselves. The most visible examples in this direction are Latent Semantic Analysis (LSA) and the Hyperspace Analog to Language (HAL). If these techniques are really based on semantic dependencies, as is claimed, then they should be applicable across languages. To investigate this contention, we used electronic versions of two kinds of material together with their translations: a novel and a popular treatise on cosmology. We used the analogy of fingerprinting as employed in forensics to establish whether individuals are related. Genetic fingerprinting uses enzymes to split DNA into fragments and then compares the resulting band patterns. Likewise, in our research we used queries to split a document into fragments. If a search technique really isolates the fragments that are semantically related to the query, then a document and its translation should show similar band patterns. In this paper we (1) present the fingerprinting technique, (2) introduce the material used, and (3) report the results of an evaluation of two semantic indexing techniques.
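The band-pattern comparison described above can be illustrated with a minimal sketch. It is not the procedure used in the paper: it assumes a document is cut into a fixed number of equal-length fragments, it scores each fragment by plain term overlap with the query (a stand-in for LSA or HAL), and it compares two patterns by Pearson correlation. The function names `split_into_fragments`, `band_pattern`, and `pattern_similarity` are hypothetical.

```python
import math


def split_into_fragments(text, n_fragments):
    """Cut a text into n_fragments consecutive word runs of near-equal length.

    Assumption: whitespace tokenization and fixed fragment counts, so that a
    document and its translation are aligned by relative position.
    """
    words = text.lower().split()
    bounds = [round(i * len(words) / n_fragments) for i in range(n_fragments + 1)]
    return [words[bounds[i]:bounds[i + 1]] for i in range(n_fragments)]


def band_pattern(fragments, query_terms):
    """Return one score per fragment: the share of its words found in the query.

    Assumption: simple term overlap replaces the semantic indexing technique.
    """
    query = {term.lower() for term in query_terms}
    pattern = []
    for fragment in fragments:
        if not fragment:
            pattern.append(0.0)  # an empty fragment carries no signal
            continue
        hits = sum(1 for word in fragment if word in query)
        pattern.append(hits / len(fragment))
    return pattern


def pattern_similarity(a, b):
    """Pearson correlation of two equal-length band patterns (0.0 if undefined)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("patterns must have the same length, at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat pattern has no correlation to report
    return cov / math.sqrt(var_a * var_b)
```

Under these assumptions, a document and its translation would each be fragmented with the same `n_fragments`, queried with the original and the translated query terms respectively, and the two resulting patterns passed to `pattern_similarity`. A value near 1 would indicate that the query isolates the same regions in both versions.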