Current research in text mining favors the quantity of texts over their representativeness. For bilingual terminology mining, however, large comparable corpora are not available for many language pairs. More importantly, because terms are defined with respect to a specific domain and a restricted register, the representativeness of the corpus can be expected to matter more than its size. Our hypothesis, therefore, is that representativeness, rather than quantity, ensures the quality of the acquired terminological resources. This article tests this hypothesis on a French-Japanese bilingual term extraction task. To demonstrate how important discourse type is as a characteristic of comparable corpora, we used a state-of-the-art multilingual terminology mining chain composed of two extraction programs, one for each language, and an alignment program. We evaluated the candidate translations against a reference list and found that taking discourse type into account yielded better-quality candidate translations even when the corpus size was halved.
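The evaluation step mentioned above, scoring ranked candidate translations against a reference list, can be illustrated with a minimal sketch. The abstract does not say which measure the authors used; top-k accuracy is assumed here because it is a common choice for this kind of task. The function name `top_k_accuracy`, the data layout, and the toy French-Japanese entries are all hypothetical and are not taken from the article.

```python
from typing import Dict, List, Set


def top_k_accuracy(
    candidates: Dict[str, List[str]],
    reference: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of reference source terms for which at least one
    reference translation appears among the top-k ranked candidates."""
    if not reference:
        return 0.0
    hits = 0
    for source_term, gold_translations in reference.items():
        # Candidates are assumed to be sorted best-first by the aligner.
        ranked = candidates.get(source_term, [])[:k]
        if any(candidate in gold_translations for candidate in ranked):
            hits += 1
    return hits / len(reference)


# Toy data, invented for illustration only (not results from the article).
candidates = {
    "diabète": ["糖尿病", "血糖"],
    "insuline": ["血糖", "インスリン"],
    "glycémie": ["糖尿病"],
}
reference = {
    "diabète": {"糖尿病"},
    "insuline": {"インスリン"},
    "glycémie": {"血糖値"},
}

top1 = top_k_accuracy(candidates, reference, k=1)
top2 = top_k_accuracy(candidates, reference, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Under this assumed measure, comparing a large mixed-discourse corpus with a smaller discourse-specific one would amount to running the same function on the two sets of candidate lists with an identical reference list.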