Storing and retrieving multimedia web notes
DNIS'05 Proceedings of the 4th international conference on Databases in Networked Information Systems
Digital annotation of web pages raises new problems that stem from the dynamic and open nature of the web. First, the variety of available browsers may force the use of proprietary solutions once one goes beyond traditional interaction with hyperlinks. Second, documents are replicated over multiple sites and can be retrieved at different URLs or with different queries. Hence, annotations on web content must be retrievable even if they were created while accessing the same content under a different URL. Moreover, when pages are modified, annotations attached to fragments that have moved within the page should still be retrieved and shown to the user. We have improved the MADCOW annotation system with a uniform interaction paradigm and incorporated two algorithms: one to assess whether two pages under different URLs are identical, and the other to identify the differences between two versions of a page under the same URL.
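The abstract does not say which algorithms are used, so the following is only a minimal sketch of the two tasks under stated assumptions. Page identity is approximated with word-shingle resemblance (Jaccard similarity over k-word windows), and fragment relocation with Python's standard `difflib.SequenceMatcher`. The function names, the shingle size `k`, and the 0.9 threshold are all hypothetical choices made for illustration, not details of MADCOW.

```python
import difflib
import re


def shingles(text, k=4):
    """Return the set of k-word shingles (contiguous word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts yield a single shingle holding all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(text_a, text_b, k=4):
    """Jaccard similarity of the two shingle sets, a value in [0, 1]."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def same_content(text_a, text_b, threshold=0.9, k=4):
    """Assumed decision rule: pages match if resemblance reaches threshold."""
    return resemblance(text_a, text_b, k) >= threshold


def relocate_fragment(old_text, new_text, start, end):
    """Map the annotated span old_text[start:end] into new_text.

    Returns (new_start, new_end) if the whole span lies inside one
    unchanged matching block, otherwise None.
    """
    matcher = difflib.SequenceMatcher(None, old_text, new_text, autojunk=False)
    for block in matcher.get_matching_blocks():
        if block.a <= start and end <= block.a + block.size:
            offset = block.b - block.a
            return start + offset, end + offset
    return None


# Usage: an annotation made on the old version is found again after
# new material is inserted above it.
old_page = "Intro paragraph. The annotated sentence is here. Closing remarks."
new_page = "A new banner was added. " + old_page
span_start = old_page.index("The annotated sentence")
span_end = span_start + len("The annotated sentence is here.")
moved = relocate_fragment(old_page, new_page, span_start, span_end)
```

In this sketch an annotation whose fragment was itself edited is reported as lost (`None`) rather than reattached by guesswork. A real system would likely need a fuzzier fallback, which is outside what the abstract describes.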