Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms
Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms
Fast and Accurate Sentence Alignment of Bilingual Corpora
AMTA '02 Proceedings of the 5th Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users
A program for aligning sentences in bilingual corpora
Computational Linguistics - Special issue on using large corpora: I
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
Evaluating sense disambiguation across diverse parameter spaces
Natural Language Engineering
Aligning sentences in parallel corpora
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Aligning sentences in bilingual corpora using lexical information
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
New techniques for context modeling
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Measuring semantic similarity by latent relational analysis
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Robust utilization of context in word sense disambiguation
CONTEXT'05 Proceedings of the 5th international conference on Modeling and Using Context
Context-based sentence alignment in parallel corpora
CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
Hi-index | 0.00 |
This paper presents a novel language-independent context-based sentence alignment technique given parallel corpora. We can view the problem of aligning sentences as finding translations of sentences chosen from different sources. Unlike current approaches which rely on pre-defined features and models, our algorithm employs features derived from the distributional properties of sentences and does not use any language dependent knowledge. We make use of the context of sentences and introduce the notion of Zipfian word vectors which effectively models the distributional properties of a given sentence.We accept the context to be the frame in which the reasoning about sentence alignment is done. We examine alternatives for local context models and demonstrate that our context based sentence alignment algorithm performs better than prominent sentence alignment techniques. Our system dynamically selects the local context for a pair of set of sentences which maximizes the correlation.We evaluate the performance of our system based on two different measures: sentence alignment accuracy and sentence alignment coverage. We compare the performance of our system with commonly used sentence alignment systems and show that our system performs 1.1951 to 1.5404 times better in reducing the error rate in alignment accuracy and coverage.