We describe a novel method that extracts paraphrases from a bitext for both the source and target languages. To reduce the search space, we decompose the phrase table into sub-phrase-tables and construct separate clusters for source and target phrases. We convert the clusters into graphs, add vertices that carry smoothing and syntactic information, and compute the similarity between phrases with a random-walk-based measure, the commute time. We then convert the commute times into artificial co-occurrence counts with a novel technique and derive the phrase-paraphrase probabilities from these counts. The co-occurrence count distribution belongs to the power-law family.
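The abstract does not say how the commute time is computed. As a minimal sketch, assuming an undirected, connected, weighted graph, the commute time between two vertices can be obtained from the Moore-Penrose pseudoinverse of the graph Laplacian: C(i, j) = vol(G) * (L+[i, i] + L+[j, j] - 2 * L+[i, j]), where vol(G) is the sum of all vertex degrees. The function name `commute_times` and the toy path graph are illustrative assumptions, not part of the described method.

```python
import numpy as np


def commute_times(adjacency):
    """Return the matrix of pairwise commute times of a random walk.

    Assumes `adjacency` is a symmetric, non-negative weight matrix of a
    connected undirected graph (hypothetical helper for illustration).
    """
    weights = np.asarray(adjacency, dtype=float)
    degrees = weights.sum(axis=1)
    laplacian = np.diag(degrees) - weights
    # The Laplacian is singular, so use the pseudoinverse.
    laplacian_pinv = np.linalg.pinv(laplacian)
    volume = degrees.sum()
    diagonal = np.diag(laplacian_pinv)
    # C(i, j) = vol(G) * (L+_ii + L+_jj - 2 * L+_ij)
    return volume * (diagonal[:, None] + diagonal[None, :] - 2.0 * laplacian_pinv)


# Toy example: a path graph 0 - 1 - 2 with unit edge weights.
path_graph = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])
commute = commute_times(path_graph)
print(np.round(commute, 6))
```

A smaller commute time means a random walker moves between the two vertices and back more quickly, so it can serve as a dissimilarity score between phrases. How such values are turned into co-occurrence counts is specific to the paper and is not sketched here.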