Given the overwhelming number of reports on similar events that originate from different sources on the web, it is often hard, with existing web search paradigms, to find the original source of "facts", statements, rumors, and opinions, and to track how they develop. Several techniques have previously been proposed for detecting such text reuse between different sources; however, these techniques have been tested against relatively small and homogeneous TREC collections. In this work, we test the feasibility of text reuse detection techniques in the setting of web search. In addition to text reuse detection, we develop a novel technique that addresses the unique challenges of finding original sources on the web, such as defining a timeline. We also explore the use of link analysis for identifying reliable and relevant reports. Our experimental results show that the proposed techniques can operate at web scale, are significantly more accurate than standard web search for finding text reuse, and provide a richer representation for tracking information flow.
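The abstract does not spell out the detection method itself. To illustrate what detecting text reuse between sources and defining a timeline can involve, here is a minimal sketch built on a common baseline: word n-gram (shingle) containment. Matching sources are then ordered by publication date. This is an assumption for illustration only and is not the technique proposed in the paper. The functions `shingles`, `containment` and `reuse_timeline`, the `Source` record, the trigram size and the 0.5 threshold are all hypothetical choices.

```python
import re
from dataclasses import dataclass
from datetime import date


def shingles(text: str, n: int = 3) -> set:
    """Return the set of lowercased word n-grams (shingles) in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(statement: str, doc: str, n: int = 3) -> float:
    """Fraction of the statement's shingles that also occur in doc (0.0 to 1.0)."""
    s = shingles(statement, n)
    if not s:
        return 0.0  # statement shorter than n words: nothing to match
    return len(s & shingles(doc, n)) / len(s)


@dataclass
class Source:
    """Hypothetical record for one retrieved web page."""
    url: str
    published: date
    text: str


def reuse_timeline(statement: str, candidates: list,
                   threshold: float = 0.5, n: int = 3) -> list:
    """Keep candidates that reuse the statement, ordered oldest first.

    The earliest entry is only a *candidate* original source, because
    publication dates on the web can be missing or wrong.
    """
    hits = [c for c in candidates
            if containment(statement, c.text, n) >= threshold]
    return sorted(hits, key=lambda c: c.published)


if __name__ == "__main__":
    statement = "the quick brown fox jumps over the lazy dog"
    candidates = [
        Source("http://example.com/b", date(2009, 3, 2),
               "Reports say the quick brown fox jumps over the lazy dog again."),
        Source("http://example.com/a", date(2009, 3, 1),
               "The quick brown fox jumps over the lazy dog."),
        Source("http://example.com/c", date(2009, 2, 1),
               "An unrelated article about cats."),
    ]
    for src in reuse_timeline(statement, candidates):
        print(src.published, src.url)
```

Run as a script, this prints `2009-03-01 http://example.com/a` followed by `2009-03-02 http://example.com/b`. The unrelated page shares no trigrams with the statement and is filtered out. Containment, rather than symmetric Jaccard similarity, is used here so that a long page quoting a short statement still scores highly.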