Scanning books, magazines, and newspapers has become a widespread activity because people believe that much of the world's information still resides offline. In general, after works are scanned, they are indexed for search and processed to add links. This paper describes a new approach to adding links automatically by mining popularly quoted passages. Because our technique connects semantically rich elements, the links it creates express strong relations. Moreover, link targets point to specific locations within a work, which makes navigation easier. This paper makes three contributions. First, we describe a scalable algorithm for mining repeated word sequences from extremely large text corpora. Second, we present techniques that filter and rank the repeated sequences to identify quotations. Third, we present a new user interface for navigating across and within works in the collection using quotation links. Our system has been run on a digital library of over 1 million books and has been used by thousands of people.
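The abstract does not spell out the mining algorithm, so the following is only a minimal sketch of the general idea of finding word sequences repeated across works. It is not the authors' method. It assumes fixed-length word shingles (`n` words) and a simple tokenizer. It also splits the counting into a "map" step and a "reduce" step so the work could in principle be distributed; that decomposition is an assumption. The function names (`map_shingles`, `reduce_shingles`, `mine_repeated_sequences`, `maximal_passages`), the `min_docs` threshold, and the ranking score (number of distinct works sharing a sequence) are all hypothetical stand-ins for the paper's unspecified filtering and ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def map_shingles(doc_id, text, n):
    """Hypothetical map step: emit (shingle, doc_id) for every n-word window."""
    words = tokenize(text)
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n]), doc_id


def reduce_shingles(pairs, min_docs):
    """Hypothetical reduce step: keep shingles found in >= min_docs distinct works."""
    docs_by_shingle = defaultdict(set)
    for shingle, doc_id in pairs:
        docs_by_shingle[shingle].add(doc_id)
    return {s: d for s, d in docs_by_shingle.items() if len(d) >= min_docs}


def mine_repeated_sequences(corpus, n=4, min_docs=2):
    """Return repeated shingles ranked by how many distinct works contain them.

    The ranking score here is an illustrative stand-in for quotation popularity.
    """
    pairs = (p for doc_id, text in corpus.items()
             for p in map_shingles(doc_id, text, n))
    repeated = reduce_shingles(pairs, min_docs)
    return sorted(repeated.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def maximal_passages(text, repeated, n):
    """Merge consecutive repeated shingles in one work into maximal word spans."""
    words = tokenize(text)
    passages, start = [], None
    for i in range(len(words) - n + 1):
        hit = tuple(words[i:i + n]) in repeated
        if hit and start is None:
            start = i
        elif not hit and start is not None:
            # Shingles start..i-1 matched, so the span ends at word (i - 1) + n.
            passages.append(" ".join(words[start:i - 1 + n]))
            start = None
    if start is not None:
        passages.append(" ".join(words[start:]))
    return passages


corpus = {
    "hamlet": "To be or not to be that is the question",
    "essay": "As Hamlet asks to be or not to be that is the question indeed",
    "novel": "She muttered to be or not to be and walked away",
}
ranked = mine_repeated_sequences(corpus, n=4, min_docs=2)
repeated = dict(ranked)
for doc_id, text in corpus.items():
    print(doc_id, maximal_passages(text, repeated, n=4))
```

Fixed-length shingles keep the counting step simple and easy to partition by key. The merge step then recovers variable-length passages, such as the full quotation inside the essay, from runs of overlapping repeated shingles.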