Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
On arabic search: improving the retrieval effectiveness via a light stemming approach
Proceedings of the eleventh international conference on Information and knowledge management
Building Bilingual Dictionaries from Parallel Web Documents
Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval
A systematic comparison of various statistical alignment models
Computational Linguistics
Computational Linguistics - Special issue on web as corpus
Computational Linguistics - Special issue on using large corpora: I
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
ACM Transactions on Asian Language Information Processing (TALIP)
Internet Archive as a Source of Bilingual Dictionary
ITCC '04 Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC'04) Volume 2 - Volume 2
A DP based search using monotone alignments in statistical translation
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
A DP based search algorithm for statistical machine translation
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Construction of a bilingual dictionary intermediated by a third language
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
HMM-based word alignment in statistical translation
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Reliable measures for aligning Japanese-English news articles and sentences
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
A probability model to improve word alignment
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Effective phrase translation extraction from alignment models
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Unsupervised learning of Arabic stemming using a parallel corpus
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Language model based arabic word segmentation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
AsianIR '03 Proceedings of the sixth international workshop on Information retrieval with Asian languages - Volume 11
Sentence alignment using P-NNT and GMM
Computer Speech and Language
English-Arabic proper-noun transliteration-pairs creation
Journal of the American Society for Information Science and Technology
Text-based English-Arabic sentence alignment
ICIC'06 Proceedings of the 2006 international conference on Intelligent computing: Part II
Hi-index | 0.00 |
Arabic is a morphologically rich language that presents significant challenges to many natural language processing applications because a word often conveys complex meanings decomposable into several morphemes (i.e. prefix, stem, suffix). By segmenting words into morphemes, we could improve the performance of English/Arabic translation pair's extraction from parallel texts. This paper describes two algorithms and their combination to automatically extract an English/Arabic bilingual dictionary from parallel texts that exist in the Internet archive after using an Arabic light stemmer as a preprocessing step. Before using the Arabic light stemmer, the total system precision and recall were 88.6% and 81.5% respectively, then the system precision an recall increased to 91.6% and 82.6% respectively after applying the Arabic light stemmer on the Arabic documents.The algorithms have certain variables which values can be changed to control the system precision and recall. Like most of the systems do, the accuracy of our system is directly proportional to the number of sentence pairs used. However our system is able to extract translation pairs from a very small parallel corpus. This new system can extract translations from only two sentences in one language and two sentences in the other language if the requirements of the system accomplished. Moreover, this system is able to extract word pairs that are translation of each others, synonyms and the explanation of the word in the other language as well. By controlling the system variables, we could achieve 100% precision for the antnnt bilingual dictionary with a small recall.