A surrogate is an object that stands for a document and enables navigation to that document. Hypermedia is often represented with textual surrogates, even though studies have shown that image and text surrogates facilitate the formation of mental models and overall understanding. Surrogates may be formed by breaking a document down into a set of smaller elements, each of which is a surrogate candidate. When these surrogate candidates are extracted from an HTML document, relevant information may appear together with less useful junk material, such as navigation bars and advertisements. This paper develops a pattern-recognition-based approach for eliminating junk while building the set of surrogate candidates. The approach defines features on candidate elements and uses classification algorithms to make selection decisions based on these features. To define features on surrogate candidates, we introduce the Document Surrogate Model (DSM), a streamlined Document Object Model (DOM)-like representation of semantic structure. Using a quadratic classifier, we were able to eliminate junk surrogate candidates with an average classification rate of 80%. With this technique, semiautonomous agents can be developed to generate surrogate collections for users more effectively. We end by describing a new approach for hypermedia and the semantic web, which uses the DSM to define value-added surrogates for a document.
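As a rough illustration of the classification step described above, the following sketch fits one Gaussian per class and labels a candidate by the larger quadratic discriminant score. It is a minimal sketch under stated assumptions, not the method used in the paper: the two features (link density and log word count), the toy training values, the equal priors, and the function names `fit_gaussian`, `discriminant`, and `classify` are all hypothetical, since the abstract does not list the features actually defined on the DSM.

```python
import math

# Hypothetical training data: (link_density, log_word_count) per candidate.
# These features and values are illustrative assumptions, not from the paper.
JUNK = [(0.9, 1.0), (0.8, 1.5), (0.95, 0.7), (0.7, 1.2), (0.85, 2.0)]
CONTENT = [(0.1, 4.0), (0.2, 5.0), (0.05, 3.5), (0.3, 4.5), (0.15, 5.5)]


def fit_gaussian(rows):
    """Estimate the mean and the 2x2 sample covariance of one class."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def discriminant(x, mean, cov, prior):
    """Quadratic discriminant: log prior plus Gaussian log density, up to a constant."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * math.log(det) - 0.5 * maha + math.log(prior)


def classify(x, models):
    """Return the label whose discriminant score is largest for feature vector x."""
    return max(models, key=lambda label: discriminant(x, *models[label]))


# Each class keeps its own covariance, which makes the decision boundary quadratic.
MODELS = {
    "junk": (*fit_gaussian(JUNK), 0.5),        # equal priors are an assumption
    "content": (*fit_gaussian(CONTENT), 0.5),
}

if __name__ == "__main__":
    print(classify((0.88, 1.1), MODELS))   # link-heavy, short element
    print(classify((0.12, 4.8), MODELS))   # text-heavy element with few links
```

Candidates labeled as junk would be dropped before the surrogate collection is assembled. A real system would use many more features and far more training data than this toy example.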