Improving automatic query expansion
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Building a question answering test collection
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Expansion of multi-word terms for indexing and retrieval using morphology and syntax
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Lexical query paraphrasing for document retrieval
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Extracting paraphrases from a parallel corpus
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Learning to paraphrase: an unsupervised approach using multiple-sequence alignment
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Paraphrase acquisition for information extraction
PARAPHRASE '03 Proceedings of the second international workshop on Paraphrasing - Volume 16
Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Corpus and evaluation measures for multiple document summarization with multiple sources
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Automatic paraphrase acquisition from news articles
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Extracting lay paraphrases of specialized expressions from monolingual comparable medical corpora
BUCC '09 Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
Gram-free synonym extraction via suffix arrays
AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
Paraphrasing with search engine query logs
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
A survey of paraphrasing and textual entailment methods
Journal of Artificial Intelligence Research
Generating phrasal and sentential paraphrases: A survey of data-driven methods
Computational Linguistics
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
An empirical evaluation of data-driven paraphrase generation techniques
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Mavuno: a scalable and effective Hadoop-based paraphrase acquisition system
Proceedings of the Third Workshop on Large Scale Data Mining: Theory and Applications
Paraphrase identification on the basis of supervised machine learning techniques
FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing
Unsupervised mining of lexical variants from noisy text
EMNLP '11 Proceedings of the First Workshop on Unsupervised Learning in NLP
Diversity-aware evaluation for paraphrase patterns
TIWTE '11 Proceedings of the TextInfer 2011 Workshop on Textual Entailment
Reranking bilingually extracted paraphrases using monolingual distributional similarity
GEMS '11 Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics
ATLAS - human language technologies integrated within a multilingual web content management system
EACL 2012 Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
Automatically mining question reformulation patterns from search log data
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Enlarging paraphrase collections through generalization and instantiation
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Generalizing sub-sentential paraphrase acquisition across original signal type of text pairs
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Ensemble semantics for large-scale unsupervised relation extraction
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Distributional phrasal paraphrase generation for statistical machine translation
ACM Transactions on Intelligent Systems and Technology (TIST) - Special Sections on Paraphrasing; Intelligent Systems for Socially Aware Computing; Social Computing, Behavioral-Cultural Modeling, and Prediction
Multitechnique paraphrase alignment: A contribution to pinpointing sub-sentential paraphrases
ACM Transactions on Intelligent Systems and Technology (TIST) - Special Sections on Paraphrasing; Intelligent Systems for Socially Aware Computing; Social Computing, Behavioral-Cultural Modeling, and Prediction
Hi-index | 0.01 |
This paper presents a lightweight method for unsupervised extraction of paraphrases from arbitrary textual Web documents. The method differs from previous approaches to paraphrase acquisition in that 1) it removes the assumptions on the quality of the input data, by using inherently noisy, unreliable Web documents rather than clean, trustworthy, properly formatted documents; and 2) it does not require any explicit clue indicating which documents are likely to encode parallel paraphrases, as they report on the same events or describe the same stories. Large sets of paraphrases are collected through exhaustive pairwise alignment of small needles, i.e., sentence fragments, across a haystack of Web document sentences. The paper describes experiments on a set of about one billion Web documents, and evaluates the extracted paraphrases in a natural-language Web search application.