This paper presents a simple and effective approach to acquiring bilingual lexica automatically from parallel corpora. The approach uses the Random Indexing vector space methodology to find correlations between terms on the basis of their distributional characteristics. It requires a minimum of preprocessing and linguistic knowledge, and is efficient, fast and scalable. We explain how the approach differs from traditional cooccurrence-based word alignment algorithms, and we demonstrate how to extract bilingual lexica by applying Random Indexing to aligned parallel data. The acquired lexica are evaluated against manually compiled gold standards, and we report an overlap of around 60%. We also discuss methodological problems with evaluating lexical resources of this kind.
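To make the idea concrete, the following is a minimal sketch of how Random Indexing might be applied to aligned parallel data. It is not the paper's implementation. It assumes that each aligned sentence pair serves as one context unit, that each unit receives a sparse random index vector of +1 and -1 entries, that a word's context vector is the sum of the index vectors of the units it occurs in, and that translation candidates are ranked by cosine similarity. The function names, the dimensionality (512), the number of nonzero entries (8), and the toy English-Swedish data are all illustrative assumptions.

```python
import math
import random
from collections import defaultdict


def index_vector(dim, nonzeros, rng):
    """Sparse ternary random vector: a few +1/-1 entries, all others 0."""
    vec = [0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def build_context_vectors(aligned_pairs, dim=512, nonzeros=8, seed=0):
    """Accumulate one context vector per word on each side of the corpus.

    Assumption: the context unit is the aligned sentence pair, so words
    from both languages that occur in the same pair receive the same
    index vector and end up close together in the shared space.
    """
    rng = random.Random(seed)
    src_vecs = defaultdict(lambda: [0] * dim)
    tgt_vecs = defaultdict(lambda: [0] * dim)
    for src_sentence, tgt_sentence in aligned_pairs:
        unit = index_vector(dim, nonzeros, rng)
        for vecs, sentence in ((src_vecs, src_sentence), (tgt_vecs, tgt_sentence)):
            # Count each word type once per aligned unit.
            for word in set(sentence.lower().split()):
                context = vecs[word]
                for i, value in enumerate(unit):
                    if value:
                        context[i] += value
    return src_vecs, tgt_vecs


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_lexicon(src_vecs, tgt_vecs):
    """Map each source word to its most similar target word."""
    return {
        src_word: max(tgt_vecs, key=lambda t: cosine(src_vec, tgt_vecs[t]))
        for src_word, src_vec in src_vecs.items()
    }


# Toy aligned data (hypothetical, for illustration only).
toy_pairs = [
    ("the dog sleeps", "hunden sover"),
    ("the cat sleeps", "katten sover"),
    ("the dog runs", "hunden springer"),
    ("the cat runs", "katten springer"),
]
src_vecs, tgt_vecs = build_context_vectors(toy_pairs)
lexicon = extract_lexicon(src_vecs, tgt_vecs)
for word in ("dog", "cat", "sleeps", "runs"):
    print(word, "->", lexicon[word])
```

On this toy corpus each content word occurs in exactly the same aligned units as its translation, so the two context vectors coincide and the pairing is unambiguous. A function word such as "the" has no target-side counterpart with the same distribution and is paired with some arbitrary target word, which hints at why evaluating lexica of this kind against a gold standard is not straightforward.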