The representation of information collections needs to be optimized for human cognition. Documents often include rich visual components, yet collections, including personal collections and those generated by search engines, are typically represented by lists of text-only surrogates. By concurrently invoking complementary components of human cognition, combined image-text surrogates help people to more effectively see, understand, think about, and remember an information collection. This research develops algorithmic methods that use the structural context of images in HTML documents to associate meaningful text with them and thus derive combined image-text surrogates. Our algorithm first recognizes which documents consist essentially of informative and multimedia content. Then, within each such document, it recognizes the informative sub-trees, discards advertisements and navigation, and extracts images with contextual descriptions. Experimental results demonstrate the algorithm's efficacy. An implementation of the algorithm is provided in combinFormation, a creativity support tool for collection authoring. The resulting image-text surrogates improve the experience of users who find and collect information as part of developing new ideas.
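
To make the second stage concrete, the following is a minimal sketch of pruning navigation and advertisement sub-trees and pairing each remaining image with nearby text. It is not the algorithm evaluated in this work or the combinFormation implementation. The noise heuristics (`NOISE_TAGS`, `NOISE_TOKENS`), the caption rule (use `alt` if it is long enough, otherwise climb a few ancestors), and all function names are assumptions chosen for illustration. The first stage, deciding whether a document is informative at all, is not covered.

```python
import re
from html.parser import HTMLParser

# Hypothetical heuristics, not taken from the paper.
VOID = {"img", "br", "hr", "meta", "link", "input", "source", "wbr"}
NOISE_TAGS = {"nav", "aside", "footer", "script", "style"}
NOISE_TOKENS = {"ad", "ads", "advert", "banner", "sponsor",
                "nav", "navigation", "menu", "sidebar"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []  # mix of Node objects and text strings


class TreeBuilder(HTMLParser):
    """Builds a small DOM-like tree from (reasonably well-formed) HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.cur)
        self.cur.children.append(node)
        if tag not in VOID:  # void elements never hold children
            self.cur = node

    def handle_endtag(self, tag):
        node = self.cur
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.cur = node.parent  # unmatched end tags are ignored

    def handle_data(self, data):
        if data.strip():
            self.cur.children.append(data)


def is_noise(node):
    """Guess whether a sub-tree is navigation or advertising."""
    if node.tag in NOISE_TAGS:
        return True
    label = f"{node.attrs.get('class') or ''} {node.attrs.get('id') or ''}"
    return any(t in NOISE_TOKENS for t in re.split(r"[^a-z0-9]+", label.lower()))


def visible_text(node):
    """Whitespace-normalized text of a sub-tree, skipping noise sub-trees."""
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(child)
        elif not is_noise(child):
            parts.append(visible_text(child))
    return " ".join(" ".join(parts).split())


def extract_surrogates(html, min_words=3, max_hops=3):
    """Return (image src, contextual text) pairs from non-noise sub-trees."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    found = []

    def walk(node):
        for child in node.children:
            if isinstance(child, str) or is_noise(child):
                continue
            if child.tag == "img" and child.attrs.get("src"):
                caption = (child.attrs.get("alt") or "").strip()
                anc, hops = node, 0
                # If alt text is too short, borrow text from enclosing elements.
                while len(caption.split()) < min_words and anc and hops < max_hops:
                    text = visible_text(anc)
                    if len(text.split()) >= min_words:
                        caption = text
                    anc, hops = anc.parent, hops + 1
                if caption:
                    found.append((child.attrs["src"], caption))
            else:
                walk(child)

    walk(builder.root)
    return found
```

Under these assumptions, an image inside a `<nav>` element or a `class="ad-banner"` container is dropped, while an image inside a `<figure>` with an empty `alt` picks up its `<figcaption>` text. Token matching on `class` and `id` is used rather than substring matching so that names such as `header` or `loading` are not mistaken for `ad`.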