Text similarity spans a spectrum, with broad topical similarity at one extreme and document identity at the other. Intermediate levels of similarity -- resulting from summarization, paraphrasing, copying, and stronger forms of topical relevance -- are useful for applications such as information flow analysis and question answering. In this paper, we explore mechanisms for measuring such intermediate kinds of similarity, focusing on the task of identifying where a particular piece of information originated. We consider both sentence-to-sentence and document-to-document comparison, and we have incorporated the resulting algorithms into RECAP, a prototype information flow analysis tool. Our experimental results with RECAP indicate that new mechanisms such as those we propose are likely to be more appropriate than existing methods for identifying these intermediate forms of similarity.
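To make the idea of a similarity spectrum concrete, the following is a minimal sketch of one very simple sentence-to-sentence measure, Jaccard overlap of word sets. It is an illustrative assumption, not one of the measures proposed in the paper or implemented in RECAP; the function names `tokens` and `overlap_similarity` and the example sentences are hypothetical.

```python
import re


def tokens(text):
    """Lowercase a text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_similarity(a, b):
    """Jaccard overlap of the two word sets (illustrative measure only).

    Returns 1.0 when both texts use exactly the same words and 0.0 when
    they share no words at all.
    """
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


# Hypothetical example sentences covering the spectrum.
source = "The senator announced the new budget plan on Monday."
copy = "The senator announced the new budget plan on Monday."
paraphrase = "On Monday the senator unveiled a new budget proposal."
unrelated = "Heavy rain flooded several coastal villages."

for label, candidate in [("copy", copy), ("paraphrase", paraphrase), ("unrelated", unrelated)]:
    print(label, round(overlap_similarity(source, candidate), 3))
```

In this sketch the verbatim copy scores 1.0, the unrelated sentence scores 0.0, and the paraphrase falls in between. Pure word overlap cannot tell a paraphrase from a sentence that merely shares vocabulary, which is one reason measures aimed at the intermediate range may need more than surface overlap.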