Knowledge Extraction from Bilingual Corpora

Authors: Harold L. Somers
Venue: Information Extraction: Towards Scalable, Adaptable Systems
Year: 1999

Abstract

The use of corpora has become an important issue in information extraction (IE). In this chapter we consider a specific type of corpus, the bilingual parallel corpus, and ways of automatically extracting information from such corpora. This information, "linguistic metaknowledge", is essential for techniques used in IE such as tokenization, POS tagging, and morphological analysis. If we wish to extract information from multilingual texts, we must rely on these linguistic resources being available in several languages. This chapter discusses locating and storing parallel texts, alignment at various levels (sentence, word, phrase), and extraction of bilingual vocabulary and terminology.