Differences and identities in document retrieval in an annotation environment

Authors:
Paolo Bottoni;Michele Cuomo;Stefano Levialdi;Emanuele Panizzi;Marco Passavanti;Rossella Trinchese
Affiliations:
Department of Computer Science, University of Rome "La Sapienza", Rome, Italy (all authors)
Venue:
DNIS'07 Proceedings of the 5th international conference on Databases in networked information systems
Year:
2007

Abstract

Digital annotation of web pages raises two problems that are unknown to traditional annotation and that stem from the dynamic and open nature of the Web. First, a document can be replicated across multiple sites, so the same content may be retrieved at different URLs or through different queries. A web page must therefore be associated with all the annotations pertaining to its content, even those created while the same content was accessed under a different URL. Second, individual HTML pages change over time, typically through insertions, deletions, or movements of page segments. Annotations attached to portions that have moved within the page should still be retrieved and shown to the user. To reduce the impact of these phenomena on the usefulness of annotation, our annotation system MADCOW incorporates two algorithms: one assesses whether two pages under different URLs are identical, and the other determines the differences between two versions of a page under the same URL. Each then takes the appropriate action to retrieve all pertinent annotations.
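
The abstract does not say how the two algorithms work, so the following Python fragment is only an illustrative sketch of the two tasks and not MADCOW's actual method. It assumes that page identity can be approximated by hashing whitespace- and case-normalized text, and that a page can be treated as a list of text segments in which a moved segment keeps its exact text. The names `content_fingerprint`, `same_content`, and `relocate_annotations` are hypothetical.

```python
import hashlib
import re


def content_fingerprint(page_text: str) -> str:
    """Return a SHA-256 digest of the text after collapsing whitespace and case.

    Assumption: two URLs serve "the same content" when this digest matches.
    """
    normalized = re.sub(r"\s+", " ", page_text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def same_content(text_a: str, text_b: str) -> bool:
    """Identity check for two pages retrieved under different URLs."""
    return content_fingerprint(text_a) == content_fingerprint(text_b)


def relocate_annotations(old_segments, new_segments, annotations):
    """Re-attach annotations after segments were inserted, deleted, or moved.

    annotations maps an index in old_segments to a note.
    Returns (relocated, orphaned): relocated maps an index in new_segments
    to the list of notes now attached there; orphaned lists the notes whose
    segment no longer appears in the new version.
    """
    # Record the first position of each segment text in the new version.
    positions = {}
    for index, segment in enumerate(new_segments):
        positions.setdefault(segment, index)

    relocated = {}
    orphaned = []
    for old_index, note in annotations.items():
        segment = old_segments[old_index]
        if segment in positions:
            relocated.setdefault(positions[segment], []).append(note)
        else:
            orphaned.append(note)
    return relocated, orphaned


if __name__ == "__main__":
    old = ["Intro", "Body", "Footer"]
    new = ["Banner", "Body", "Intro"]
    notes = {0: "note on intro", 2: "note on footer"}
    print(relocate_annotations(old, new, notes))
    print(same_content("Hello   World\n", "hello world"))
```

Exact matching is the simplest possible choice here. A more tolerant variant could compare segments by similarity rather than equality, at the cost of occasionally attaching a note to the wrong segment.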