Elements of information theory
Elements of information theory
Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Resolving ambiguity for cross-language retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Information retrieval as statistical translation
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating a probabilistic model for cross-lingual information retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
A study of smoothing methods for language models applied to Ad Hoc information retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Two-stage language models for information retrieval
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
Computational Linguistics - Special issue on using large corpora: I
A pattern matching method for finding noun and proper noun translations from noisy parallel corpora
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Identifying word translations in non-parallel texts
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
A bootstrapping method for extracting bilingual text pairs
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Extraction of lexical translations from non-aligned corpora
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
A cross-collection mixture model for comparative text mining
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
CWS: a comparative web search system
Proceedings of the 15th international conference on World Wide Web
Named entity transliteration with comparable corpora
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Mining correlated bursty topic patterns from coordinated text streams
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Feature-based method for document alignment in comparable news corpora
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Unsupervised named entity transliteration using temporal and phonetic correlation
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
ACLDemos '09 Proceedings of the ACL-IJCNLP 2009 Software Demonstrations
Creating a Persian-English comparable corpus
CLEF'10 Proceedings of the 2010 international conference on Multilingual and multimodal information access evaluation: cross-language evaluation forum
Mining named entities with temporally correlated bursts from multilingual web news streams
Proceedings of the fourth ACM international conference on Web search and data mining
Cross lingual text classification by mining multilingual topics from wikipedia
Proceedings of the fourth ACM international conference on Web search and data mining
Mining large-scale comparable corpora from Chinese-English news collections
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Rare word translation extraction from aligned comparable documents
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Cross-lingual slot filling from comparable corpora
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Topic based creation of a persian-english comparable corpus
AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
A language modeling approach for extracting translation knowledge from comparable corpora
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Mining a Persian-English comparable corpus for cross-language information retrieval
Information Processing and Management: an International Journal
Hi-index | 0.00 |
Integrating information in multiple natural languages is a challenging task that often requires manually created linguistic resources such as a bilingual dictionary or examples of direct translations of text. In this paper, we propose a general cross-lingual text mining method that does not rely on any of these resources, but can exploit comparable bilingual text corpora to discover mappings between words and documents in different languages. Comparable text corpora are collections of text documents in different languages that are about similar topics; such text corpora are often naturally available (e.g., news articles in different languages published in the same time period). The main idea of our method is to exploit frequency correlations of words in different languages in the comparable corpora and discover mappings between words in different languages. Such mappings can then be used to further discover mappings between documents in different languages, achieving cross-lingual information integration. Evaluation of the proposed method on a 120MB Chinese-English comparable news collection shows that the proposed method is effective for mapping words and documents in English and Chinese. Since our method only relies on naturally available comparable corpora, it is generally applicable to any language pairs as long as we have comparable corpora.