We present an unsupervised method for extracting sequence-to-sequence correspondences from parallel corpora by sequential pattern mining. The method has two main characteristics. First, we propose a systematic way to enumerate all possible translation pair candidates, covering both rigid and gapped sequences, without falling into combinatorial explosion. Second, the method uses an efficient data structure and algorithm to calculate the frequencies in the contingency table of each translation pair candidate. We evaluate the method empirically on English-Japanese parallel corpora of 6 million words. The results indicate that it works well for multi-word translations, giving 56–84% accuracy at 19% token coverage and 11% type coverage.
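To make the contingency table concrete, the following minimal sketch counts the four cells for one translation pair candidate over a sentence-aligned corpus. It is an illustration under stated assumptions, not the method described above: it uses a naive linear scan rather than an efficient data structure, and the function names (`contains_rigid`, `contains_gapped`, `contingency_table`) and the toy corpus are hypothetical. A rigid sequence is assumed here to mean a contiguous run of tokens, and a gapped sequence an in-order subsequence with arbitrary gaps.

```python
from typing import Callable, List, Sequence, Tuple

Sentence = Sequence[str]
Matcher = Callable[[Sentence, Sentence], bool]


def contains_rigid(sentence: Sentence, pattern: Sentence) -> bool:
    """True if `pattern` occurs as a contiguous run of tokens in `sentence`."""
    n, m = len(sentence), len(pattern)
    return any(list(sentence[i:i + m]) == list(pattern) for i in range(n - m + 1))


def contains_gapped(sentence: Sentence, pattern: Sentence) -> bool:
    """True if `pattern` occurs in order in `sentence`, allowing gaps."""
    tokens = iter(sentence)
    # Each membership test consumes the iterator, so order is preserved.
    return all(tok in tokens for tok in pattern)


def contingency_table(
    corpus: List[Tuple[Sentence, Sentence]],
    src_pattern: Sentence,
    tgt_pattern: Sentence,
    matcher: Matcher = contains_rigid,
) -> Tuple[int, int, int, int]:
    """Count aligned sentence pairs in the four cells (a, b, c, d):

    a: both patterns present      b: source pattern only
    c: target pattern only        d: neither pattern present
    """
    a = b = c = d = 0
    for src, tgt in corpus:
        in_src = matcher(src, src_pattern)
        in_tgt = matcher(tgt, tgt_pattern)
        if in_src and in_tgt:
            a += 1
        elif in_src:
            b += 1
        elif in_tgt:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Invented toy data (romanized Japanese), for illustration only.
toy_corpus = [
    ("please take a look".split(), "mite kudasai".split()),
    ("take a quick look".split(), "chotto mite".split()),
    ("look at this".split(), "kore o mite".split()),
    ("thank you".split(), "arigatou".split()),
]

rigid_cells = contingency_table(toy_corpus, ["take", "a", "look"], ["mite"])
gapped_cells = contingency_table(
    toy_corpus, ["take", "look"], ["mite"], matcher=contains_gapped
)
print("rigid :", rigid_cells)
print("gapped:", gapped_cells)
```

The four counts are the usual input to an association score between the source and target patterns. Rescanning the whole corpus for every candidate, as this sketch does, quickly becomes expensive as the number of candidates grows, which is why an efficient counting structure matters.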