Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization
ACM Transactions on Mathematical Software (TOMS)
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Information fusion in the context of multi-document summarization
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Accurate unlexicalized parsing
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Discriminative Reranking for Natural Language Parsing
Computational Linguistics
Improved statistical machine translation using paraphrases
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Corpus-based and knowledge-based measures of text semantic similarity
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Using syntactic information to identify plagiarism
EdAppsNLP 05 Proceedings of the second workshop on Building Educational Applications Using NLP
Answering the question you wish they had asked: the impact of paraphrasing for question answering
NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
Paraphrase identification as probabilistic quasi-synchronous recognition
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Extracting lay paraphrases of specialized expressions from monolingual comparable medical corpora
BUCC '09 Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
Tree kernel-based SVM with structured syntactic knowledge for BTG-based phrase reordering
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Extending the meteor machine translation evaluation metric to the phrase level
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Ensemble models for dependency parsing: cheap and good?
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
LIBSVM: A library for support vector machines
ACM Transactions on Intelligent Systems and Technology (TIST)
Paraphrase identification on the basis of supervised machine learning techniques
FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing
Learning sentential paraphrases from bilingual parallel corpora for text-to-text generation
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Improving statistical word alignment with ensemble methods
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
A sequential model for discourse segmentation
CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
Experiments with SVM to classify opinions in different domains
Expert Systems with Applications: An International Journal
Foundations of Machine Learning
Foundations of Machine Learning
Re-examining machine translation metrics for paraphrase identification
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
SemEval-2012 task 6: a pilot on semantic textual similarity
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Baselines and bigrams: simple, good sentiment and topic classification
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
A novel discriminative framework for sentence-level discourse analysis
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Using discourse information for paraphrase extraction
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
A reranking model for discourse segmentation using subtree features
SIGDIAL '12 Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Hi-index | 12.05 |
Previous work on paraphrase identification using sentence similarities has not exploited discourse structures, which have been shown as important information for paraphrase computation. In this paper, we propose a new method named EDU-based similarity, to compute the similarity between two sentences based on elementary discourse units. Unlike conventional methods, which directly compute similarities based on sentences, our method divides sentences into discourse units and employs them to compute similarities. We also show the relation between paraphrases and discourse units, which plays an important role in paraphrasing. We apply our method to the paraphrase identification task. Experimental results on the PAN corpus, a large corpus for detecting paraphrases, show the effectiveness of using discourse information for identifying paraphrases. We achieve 93.1% and 93.4% accuracy, respectively by using a single SVM classifier and by using a maximal voting model.