Internet Archive as a Source of Bilingual Dictionary

  • Authors:
  • Mohamed Abdel Fattah; Fuji Ren; Kuroiwa Shingo

  • Venue:
  • ITCC '04: Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC'04), Volume 2
  • Year:
  • 2004

Abstract

A parallel corpus is a very important tool for building a good machine translation system and for natural language processing research such as cross-language information retrieval. The Internet Archive is a good source of parallel documents in different languages. To construct a good parallel corpus from the Internet Archive, a bilingual dictionary containing word pairs that may not exist in commercial dictionaries is essential. Extracting a bilingual dictionary from parallel documents on the Internet is important for adding words that are absent from traditional dictionaries. This paper describes two algorithms for automatically extracting an English/Arabic bilingual dictionary from parallel texts in the Internet Archive. The system should ideally be usable for many different language pairs. As with most such systems, the accuracy of our system is directly proportional to the number of sentence pairs used. By tuning the system parameters, we could achieve 100% precision for the output bilingual dictionary, at the cost of a smaller dictionary.
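The abstract does not spell out the two extraction algorithms, so the following is not the authors' method. It is a minimal sketch, under stated assumptions, of a common co-occurrence approach (the Dice coefficient computed over sentence-aligned pairs) that shows how thresholds can trade dictionary size for precision. The function `extract_dictionary` and its parameters `min_count` and `min_dice` are hypothetical. The sketch assumes the sentence pairs are already aligned and tokenizes on whitespace, which ignores Arabic clitics and morphology.

```python
from collections import Counter
from itertools import product


def extract_dictionary(sentence_pairs, min_count=2, min_dice=0.5):
    """Hypothetical Dice-coefficient lexicon extractor (not the paper's algorithm).

    sentence_pairs: list of (english_sentence, arabic_sentence) strings, assumed
    to be sentence-aligned already. Returns {source_word: (target_word, dice)}.
    Raising min_count or min_dice keeps fewer pairs, trading size for precision.
    """
    src_freq = Counter()   # sentences containing each source word
    tgt_freq = Counter()   # sentences containing each target word
    pair_freq = Counter()  # sentence pairs containing both words
    for src, tgt in sentence_pairs:
        # Naive whitespace tokenization (an assumption; real Arabic needs segmentation).
        src_words = set(src.lower().split())
        tgt_words = set(tgt.split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))

    best = {}
    # Sorted iteration makes tie-breaking deterministic.
    for (s, t), both in sorted(pair_freq.items()):
        if both < min_count:
            continue  # too few co-occurrences to trust
        dice = 2 * both / (src_freq[s] + tgt_freq[t])
        # Keep only the highest-scoring target word for each source word.
        if dice >= min_dice and (s not in best or dice > best[s][1]):
            best[s] = (t, dice)
    return best


pairs = [
    ("new house", "بيت جديد"),
    ("new book", "كتاب جديد"),
    ("house", "بيت"),
    ("book", "كتاب"),
]
print(extract_dictionary(pairs))
```

On this toy corpus each English word pairs with its Arabic translation. With a single sentence pair, every co-occurring word pair ties, so `min_count=1` admits wrong pairs while `min_count=2` returns an empty dictionary. This mirrors, in miniature, the abstract's point that more sentence pairs improve accuracy and that stricter parameters shrink the output.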