An improved method for finding bilingual collocation correspondences from monolingual corpora

  • Authors:
  • Ruifeng Xu;Kam-Fai Wong;Qin Lu;Wenjie Li

  • Affiliations:
  • Dept. of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, N.T., Hong Kong;Dept. of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, N.T., Hong Kong;Dept. of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong;Dept. of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong

  • Venue:
  • ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead
  • Year:
  • 2006

Abstract

Bilingual collocation correspondence is helpful for machine translation and second language learning. Existing techniques for identifying Chinese-English collocation correspondence suffer from two major problems: they are sensitive to the coverage of the bilingual dictionary, and they are insensitive to semantic and contextual information. This paper presents the ICT (Improved Collocation Translation) method to overcome these problems. For a given Chinese collocation, the word translation candidates extracted from a bilingual dictionary are expanded to improve coverage. A new translation model, which incorporates statistics extracted from monolingual corpora, word semantic similarities from a monolingual thesaurus, and bilingual context similarities, is employed to estimate and rank the probabilities of the collocation correspondence candidates. Experiments show that ICT is robust to the coverage of the bilingual dictionary. It achieves 50.1% accuracy for the first candidate and 73.1% accuracy for the top-3 candidates.
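
The abstract does not give the model's formula, so the following is only a minimal sketch of the general idea: expand dictionary translations with thesaurus synonyms, then rank candidate English word pairs by combining three kinds of evidence. The log-linear combination, the add-one smoothing, the function names `expand` and `rank`, and all toy values are assumptions made for illustration; they are not taken from the paper.

```python
import math
from itertools import product

# Hypothetical toy resources (illustrative values, not from the paper).
# Keys are pinyin for the two words of a Chinese verb-object collocation.
DICTIONARY = {"he": ["drink"], "tang": ["soup"]}
# Thesaurus: word -> {synonym: assumed semantic similarity in (0, 1]}.
THESAURUS = {"drink": {"eat": 0.6, "have": 0.5}, "soup": {"broth": 0.8}}
# Assumed collocation counts from an English monolingual corpus.
PAIR_COUNTS = {("eat", "soup"): 30, ("have", "soup"): 20,
               ("drink", "soup"): 5, ("drink", "broth"): 3}
# Assumed bilingual context similarities for some candidate pairs.
CONTEXT_SIM = {("eat", "soup"): 0.7, ("have", "soup"): 0.6,
               ("drink", "soup"): 0.5}


def expand(word):
    """Return {candidate: similarity}; direct dictionary entries score 1.0."""
    candidates = {}
    for translation in DICTIONARY.get(word, []):
        candidates[translation] = 1.0
        for synonym, sim in THESAURUS.get(translation, {}).items():
            candidates[synonym] = max(candidates.get(synonym, 0.0), sim)
    return candidates


def rank(collocation, weights=(1.0, 1.0, 1.0), default_context=0.1):
    """Rank English candidate pairs for a two-word source collocation."""
    first, second = (expand(word) for word in collocation)
    total = sum(PAIR_COUNTS.values())
    n_pairs = len(first) * len(second)
    scored = []
    for (e1, s1), (e2, s2) in product(first.items(), second.items()):
        # Add-one smoothed probability of the pair in the English corpus.
        corpus_p = (PAIR_COUNTS.get((e1, e2), 0) + 1) / (total + n_pairs)
        context = CONTEXT_SIM.get((e1, e2), default_context)
        # Assumed log-linear combination of the three evidence sources.
        score = (weights[0] * math.log(corpus_p)
                 + weights[1] * math.log(s1 * s2)
                 + weights[2] * math.log(context))
        scored.append(((e1, e2), score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for pair, score in rank(("he", "tang"))[:3]:
        print(pair, round(score, 3))
```

With these toy values the literal dictionary pair ("drink", "soup") is outranked by the expanded candidate ("eat", "soup"), which shows how candidate expansion can compensate for limited dictionary coverage.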