While most digital collections undergo only limited forms of change (primarily the creation and deletion of resources), there exists a class of digital collections that undergoes additional kinds of change. These collections are made up of resources that are distributed across the Internet and brought together into a collection via hyperlinking. Resources in these collections can be expected to change over time. Part of the difficulty in maintaining these collections is determining whether a changed page is still a valid member of the collection. Others have tried to address this problem by measuring change and defining a maximum allowed threshold of change; however, these methods treat all change as a potential problem and treat web content as a static document despite its intrinsically dynamic nature. Instead, we treat change as a normal part of a web document's life cycle and assess its significance by determining the difference between what a maintainer expects a page to do and what it actually does. In this work we evaluate different options for extractors and analyzers in order to determine the best options from a suite of techniques. The evaluation used a human-generated ground-truth set of blog changes. The results showed a statistically significant improvement over a range of traditional threshold techniques when applied to our collection of tagged blog changes.
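To make the contrast between the two approaches concrete, the following is a minimal sketch, not the extractors or analyzers evaluated in this work. It assumes a hypothetical word-level change score and a hypothetical expectation model based on the mean and spread of a page's past change scores; the function names, the default threshold of 0.5, and the parameters `k` and `min_sigma` are all illustrative assumptions.

```python
from difflib import SequenceMatcher
from statistics import mean, pstdev
from typing import Sequence


def change_magnitude(old: str, new: str) -> float:
    """Hypothetical change score in [0, 1]: 0 = identical word sequences."""
    return 1.0 - SequenceMatcher(None, old.split(), new.split()).ratio()


def exceeds_threshold(change: float, max_change: float = 0.5) -> bool:
    """Threshold baseline: any change above a fixed maximum is a problem."""
    return change > max_change


def is_unexpected(change: float, history: Sequence[float],
                  k: float = 2.0, min_sigma: float = 0.05) -> bool:
    """Expectation-based check (assumed model): flag a change only when it
    deviates from the page's own past change scores by more than k
    standard deviations. min_sigma keeps a very regular history from
    making every small deviation look significant."""
    if len(history) < 2:
        return False  # too little history to form an expectation
    mu = mean(history)
    sigma = max(pstdev(history), min_sigma)
    return abs(change - mu) > k * sigma


# A blog whose front page is mostly replaced on every visit.
history = [0.8, 0.9, 0.85, 0.9]

# A large change is normal for this page, yet the fixed threshold flags it.
print(exceeds_threshold(0.88), is_unexpected(0.88, history))

# A near-standstill passes the threshold but breaks the page's usual pattern.
print(exceeds_threshold(0.10), is_unexpected(0.10, history))
```

Under these assumptions, the fixed threshold raises an alarm on the page's ordinary behavior and misses the unusual standstill, while the expectation-based check does the reverse.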