Extracting word sequence correspondences with support vector machines

Authors:
Kengo Sato;Hiroaki Saito
Affiliations:
Keio University, Yokohama, Japan;Keio University, Yokohama, Japan
Venue:
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Year:
2002

Abstract

This paper proposes a method for learning and extracting word sequence correspondences from non-aligned parallel corpora with Support Vector Machines, which generalize well, rarely overfit the training samples, and can learn dependencies among features by using a kernel function. The translation model uses features drawn from a translation dictionary, the number of words, parts of speech, constituent words, and neighboring words. In experiments on Japanese and English parallel corpora, the method achieved 81.1% precision and 69.0% recall on the extracted word sequence correspondences. This demonstrates that the method could reduce the cost of building translation dictionaries.
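
The following is a minimal sketch of how the five feature groups named in the abstract might be encoded for one candidate pair of word sequences, and of how a kernel can capture dependencies between features. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which kernel or feature encoding was used, so the degree-2 polynomial kernel, the feature names, the toy dictionary, and the romanized example words are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy dictionary: romanized Japanese word -> English translations.
TOY_DICT: Dict[str, Set[str]] = {
    "kikai": {"machine"},
    "honyaku": {"translation"},
}


def pair_features(
    ja: List[str],
    en: List[str],
    ja_pos: List[str],
    en_pos: List[str],
    ja_ctx: Tuple[str, str],
    en_ctx: Tuple[str, str],
) -> Set[str]:
    """Return the active binary features of one candidate pair as a set of names."""
    feats: Set[str] = set()
    # Translation dictionary: is any (ja word, en word) pair a dictionary entry?
    if any(e in TOY_DICT.get(j, set()) for j in ja for e in en):
        feats.add("dict_hit")
    # Number of words on each side.
    feats.add(f"ja_len={len(ja)}")
    feats.add(f"en_len={len(en)}")
    # Part-of-speech tags of the words in each sequence.
    feats.update(f"ja_pos={p}" for p in ja_pos)
    feats.update(f"en_pos={p}" for p in en_pos)
    # Constituent words of each sequence.
    feats.update(f"ja_word={w}" for w in ja)
    feats.update(f"en_word={w}" for w in en)
    # Neighbor words: the word before and the word after each sequence.
    feats.add(f"ja_prev={ja_ctx[0]}")
    feats.add(f"ja_next={ja_ctx[1]}")
    feats.add(f"en_prev={en_ctx[0]}")
    feats.add(f"en_next={en_ctx[1]}")
    return feats


def poly_kernel(a: Set[str], b: Set[str], degree: int = 2) -> int:
    """Polynomial kernel (x . y + 1)^degree on binary feature vectors.

    For binary vectors stored as sets, the dot product is the overlap size.
    With degree 2, the expansion implicitly contains products of feature
    pairs, which is one way a kernel models feature dependencies.
    """
    return (len(a & b) + 1) ** degree


# Hypothetical candidate pair: "kikai honyaku" <-> "machine translation".
example = pair_features(
    ["kikai", "honyaku"],
    ["machine", "translation"],
    ["N", "N"],
    ["N", "N"],
    ("<s>", "wo"),
    ("<s>", "is"),
)
```

An SVM trained on such vectors would label each candidate pair as a correspondence or not; the training step is omitted here because the abstract gives no details about it.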