A Multilingual Procedure for Dictionary-Based Sentence Alignment
AMTA '98 Proceedings of the Third Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup
Fast and Accurate Sentence Alignment of Bilingual Corpora
AMTA '02 Proceedings of the 5th Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users
Computational Linguistics - Special issue on web as corpus
A program for aligning sentences in bilingual corpora
Computational Linguistics - Special issue on using large corpora: I
Computational Linguistics - Special issue on using large corpora: I
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
High-performance bilingual text alignment using statistical and dictionary information
Natural Language Engineering
Aligning sentences in parallel corpora
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
A program for aligning sentences in bilingual corpora
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Aligning sentences in bilingual corpora using lexical information
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Aligning a parallel English-Chinese corpus statistically with lexical criteria
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
DMMT '01 Proceedings of the workshop on Data-driven methods in machine translation - Volume 14
A Hybrid Approach to Sentence Alignment Using Genetic Algorithm
ICCTA '07 Proceedings of the International Conference on Computing: Theory and Applications
Sentence alignment using P-NNT and GMM
Computer Speech and Language
Improved sentence alignment on parallel web pages using a stochastic tree alignment model
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Context-based sentence alignment in parallel corpora
CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
Extracting parallel sentences from comparable corpora using document level alignment
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Improved unsupervised sentence alignment for symmetrical and asymmetrical parallel corpora
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Fast-Champollion: a fast and robust sentence alignment algorithm
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
TEP: Tehran English-Persian parallel corpus
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
Bitext Alignment
Bilingual sentence alignment based on punctuation statistics and lexicon
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
Hi-index | 0.00 |
The task of sentence and paragraph alignment is essential for preparing parallel texts that are needed in applications such as machine translation. The lack of sufficient linguistic data for under-resourced languages like Persian is a challenging issue. In this paper, we proposed a hybrid sentence and paragraph alignment model on Persian-English parallel documents based on simple linguistic features as well as length similarity between sentences and paragraphs of source and target languages. We apply a small bilingual dictionary of Persian-English nouns, punctuation marks, and length similarity as alignment metrics. We combine these features in a linear model and use genetic algorithm to learn the linear equation weights. Evaluation results show that the extracted features improve the baseline model which is only a length-based one.