Deriving image-text document surrogates to optimize cognition

  • Authors:
  • Eunyee Koh; Andruid Kerne

  • Affiliations:
  • Adobe Systems Inc., San Jose, CA, USA; Texas A&M University, College Station, TX, USA

  • Venue:
  • Proceedings of the 9th ACM Symposium on Document Engineering (DocEng '09)
  • Year:
  • 2009

Abstract

The representation of information collections needs to be optimized for human cognition. While documents often include rich visual components, collections, including personal collections and those generated by search engines, are typically represented by lists of text-only surrogates. By concurrently invoking complementary components of human cognition, combined image-text surrogates help people more effectively see, understand, think about, and remember an information collection. This research develops algorithmic methods that use the structural context of images in HTML documents to associate meaningful text and thus derive combined image-text surrogates. Our algorithm first recognizes which documents consist essentially of informative and multimedia content. Then, the algorithm recognizes the informative sub-trees within each such document, discards advertisements and navigation, and extracts images with contextual descriptions. Experimental results demonstrate the algorithm's efficacy. An implementation of the algorithm is provided in combinFormation, a creativity support tool for collection authoring. These image-text surrogates improve users' experiences of finding and collecting information as they develop new ideas.
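The pipeline the abstract describes (walk a document's HTML structure, skip advertisement and navigation sub-trees, and pair each remaining image with nearby contextual text) could be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the function name `extract_image_text_surrogates`, the class/id substring heuristic in `AD_NAV_HINTS`, and the "first nearby text block" rule are all assumptions made for this sketch.

```python
from html.parser import HTMLParser

# Hypothetical heuristics: class/id substrings that flag ad/navigation
# sub-trees, and void tags that never get a closing tag in HTML.
AD_NAV_HINTS = ("advert", "nav", "menu", "footer", "sidebar", "banner")
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input", "area", "base",
             "col", "embed", "source", "track", "wbr"}

class SurrogateExtractor(HTMLParser):
    """Pair each <img> in an informative sub-tree with contextual text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_stack = [False]  # top == inside an ad/nav sub-tree?
        self.surrogates = []       # [{"src": ..., "text": ...}, ...]
        self.pending = None        # image still awaiting nearby text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        hint = (attrs.get("class", "") + " " + attrs.get("id", "")).lower()
        skipping = self.skip_stack[-1] or any(h in hint for h in AD_NAV_HINTS)
        if tag in VOID_TAGS:
            if tag == "img" and not skipping and attrs.get("src"):
                # Seed the surrogate with the alt text, if any.
                self.pending = {"src": attrs["src"],
                                "text": attrs.get("alt", "").strip()}
                self.surrogates.append(self.pending)
            return  # void tags are not pushed: they have no end tag
        self.skip_stack.append(skipping)

    def handle_startendtag(self, tag, attrs):
        # Treat <img .../> like <img ...>; never pop for self-closing tags.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if len(self.skip_stack) > 1:
            self.skip_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.pending is not None and not self.skip_stack[-1]:
            # First nearby text block becomes the contextual description.
            self.pending["text"] = (self.pending["text"] + " " + text).strip()
            self.pending = None

def extract_image_text_surrogates(html):
    parser = SurrogateExtractor()
    parser.feed(html)
    parser.close()
    # Keep only images that acquired some descriptive text.
    return [s for s in parser.surrogates if s["text"]]
```

The real system would need a prior document-classification step and richer informativeness features than a substring match on class/id attributes; the sketch only shows the shape of the sub-tree pruning and image-text association.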