Paraphrase acquisition is an important natural language processing (NLP) task that has recently received a great deal of interest. Proposed solutions range from simple approaches that make minimal use of NLP tools to more complex approaches that rely heavily on numerous language-dependent resources. Despite this work, no publicly available toolkit supports large-scale paraphrase mining research. Nor has there been a direct empirical evaluation comparing the merits of simple, scalable approaches with those that make extensive use of expensive NLP resources. This paper introduces Mavuno, a Hadoop-based paraphrase acquisition toolkit that is both scalable and robust. Within the context of Mavuno, we empirically examine the tradeoffs between simple and sophisticated paraphrase acquisition approaches to shed light on their strengths and weaknesses. Our evaluation reveals that simple approaches have many advantages, including strong effectiveness, good coverage, low redundancy, and the ability to handle noisy data.
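To make the "simple, scalable" end of this spectrum concrete, here is a minimal single-machine sketch of one such approach: distributional similarity over surface contexts. It is an illustrative assumption, not Mavuno's code or API. The context definition (the single words immediately left and right of a phrase), the `max_len` limit, and every function name are hypothetical. `extract_contexts` plays the role of a map step emitting (context, phrase) pairs, and `build_postings` plays the role of the shuffle/reduce that groups phrases by shared context. `paraphrase_scores` then accumulates cosine-similarity dot products from each context's posting list. On Hadoop, each stage would presumably be its own MapReduce job.

```python
from collections import defaultdict
from itertools import combinations
import math


def extract_contexts(sentence, max_len=2):
    """Map-style step: yield (context, phrase) for every phrase of 1..max_len tokens.

    Assumption: a phrase's context is the (left word, right word) pair around it,
    with "<s>" and "</s>" padding the sentence boundaries.
    """
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for n in range(1, max_len + 1):
        for i in range(1, len(tokens) - n):
            yield (tokens[i - 1], tokens[i + n]), " ".join(tokens[i:i + n])


def build_postings(corpus, max_len=2):
    """Shuffle/reduce-style step: count each phrase under every context it occurs in."""
    postings = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        for context, phrase in extract_contexts(sentence, max_len):
            postings[context][phrase] += 1
    return postings


def paraphrase_scores(postings):
    """Cosine similarity between phrase context vectors, accumulated per posting list.

    Only phrases that share at least one context ever produce a pair.
    """
    dots = defaultdict(float)
    sq_norms = defaultdict(float)
    for phrase_counts in postings.values():
        for phrase, weight in phrase_counts.items():
            sq_norms[phrase] += weight * weight
        # Sorting makes each pair key canonical: (p, q) with p < q.
        for (p, wp), (q, wq) in combinations(sorted(phrase_counts.items()), 2):
            dots[(p, q)] += wp * wq
    return {
        (p, q): dot / math.sqrt(sq_norms[p] * sq_norms[q])
        for (p, q), dot in dots.items()
    }


def top_paraphrases(scores, phrase, k=3):
    """Return up to k (candidate, score) pairs for `phrase`, best first."""
    candidates = []
    for (p, q), score in scores.items():
        if p == phrase:
            candidates.append((q, score))
        elif q == phrase:
            candidates.append((p, score))
    return sorted(candidates, key=lambda c: (-c[1], c[0]))[:k]


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "the dog sat on the mat",
        "a cat slept on the mat",
        "a dog slept on the mat",
    ]
    scores = paraphrase_scores(build_postings(corpus))
    print(top_paraphrases(scores, "cat"))  # [('dog', 1.0)]
    print(top_paraphrases(scores, "sat"))  # [('slept', 1.0)]
```

Each context's posting list is processed independently, so the pairwise stage parallelizes naturally. The number of pairs grows quadratically with posting-list length, however, so a realistic version would likely need to prune very frequent contexts, such as those formed by stopwords.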