Clustering and searching WWW images using link and page layout analysis

Authors:
Xiaofei He; Deng Cai; Ji-Rong Wen; Wei-Ying Ma; Hong-Jiang Zhang
Affiliations:
Yahoo! Research Labs, Burbank, CA; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois; Microsoft Research Asia, Beijing; Microsoft Research Asia, Beijing; Microsoft Research Asia, Beijing
Venue:
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Year:
2007

References: 17
Cited by: 15

VisualSEEk: a fully automated content-based image query system

MULTIMEDIA '96 Proceedings of the fourth ACM international conference on Multimedia
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
NeTra: a toolbox for navigating large image databases

Multimedia Systems - Special issue on video content based retrieval
Authoritative sources in a hyperlinked environment

Journal of the ACM (JACM)
Graph Embeddings and Laplacian Eigenvalues

SIAM Journal on Matrix Analysis and Applications
Normalized Cuts and Image Segmentation

IEEE Transactions on Pattern Analysis and Machine Intelligence
PicASHOW: pictorial authority search by hyperlinks on the Web

Proceedings of the 10th international conference on World Wide Web
ImageRover: A Content-Based Image Browser for the World Wide Web

CAIVL '97 Proceedings of the 1997 Workshop on Content-Based Access of Image and Video Libraries (CBAIVL '97)
Texture Features and Learning Similarity

CVPR '96 Proceedings of the 1996 Conference on Computer Vision and Pattern Recognition (CVPR '96)
WebSeer: An Image Search Engine for the World Wide Web

(venue not listed)
Learning block importance models for web pages

Proceedings of the 13th international conference on World Wide Web
Block-level link analysis

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Block-based web search

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Face Recognition Using Laplacianfaces

IEEE Transactions on Pattern Analysis and Machine Intelligence
Spectral clustering for German verbs

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Extracting content structure for web pages based on visual representation

APWeb'03 Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications
Relevance feedback: a power tool for interactive content-based image retrieval

IEEE Transactions on Circuits and Systems for Video Technology

Cited by:
Contextual in-image advertising

MM '08 Proceedings of the 16th ACM international conference on Multimedia
Deriving image-text document surrogates to optimize cognition

Proceedings of the 9th ACM symposium on Document engineering
Webpage segmentation for extracting images and their surrounding contextual information

MM '09 Proceedings of the 17th ACM international conference on Multimedia
Subspace learning-based dimensionality reduction in building recognition

Neurocomputing
A user study to investigate semantically relevant contextual information of WWW images

International Journal of Human-Computer Studies
'Oh web image, where art thou?'

MMM'08 Proceedings of the 14th international conference on Advances in multimedia modeling
Measuring performance of web image context extraction

Proceedings of the Tenth International Workshop on Multimedia Data Mining
GameSense: game-like in-image advertising

Multimedia Tools and Applications
Identifying persons in news article images based on textual analysis

ICADL'10 Proceedings of the role of digital libraries in a time of global change, and 12th international conference on Asia-Pacific digital libraries
Semantic analysis and retrieval in personal and social photo collections

Multimedia Tools and Applications
A versatile model for web page representation, information extraction and content re-packaging

Proceedings of the 11th ACM symposium on Document engineering
ImageSense: Towards contextual image advertising

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Image indexing based on web page segmentation and clustering

ACA'12 Proceedings of the 11th international conference on Applications of Electrical and Computer Engineering
Combination of document structure and links for multimedia object retrieval

Journal of Information Science
Search web images using objects, backgrounds and conditions

Proceedings of the 20th ACM international conference on Multimedia

Abstract

The rapid growth in the number of digital images on the Web has created an increasing demand for effective and efficient methods of organizing and retrieving them. This article describes iFind, a system for clustering and searching WWW images. Using a vision-based page segmentation algorithm, we partition each Web page into blocks, so that the textual and link information of an image can be accurately extracted from the block that contains it. The textual information is used for image indexing. By extracting the page-to-block, block-to-image, and block-to-page relationships through link structure and page layout analysis, we construct an image graph. Our method is less sensitive to noisy links than previous methods such as PageRank, HITS, and PicASHOW, so the image graph can better reflect the semantic relationships between images. Modeling the graph as a Markov chain, we compute the limiting probability distribution over the images; these probabilities, called ImageRanks, characterize the importance of the images. The ImageRanks are combined with the relevance scores to produce the final ranking for image search. The graph models also let us apply techniques from spectral graph theory to image clustering and to embedding for 2-D visualization. The article reports experimental results on 11.6 million images downloaded from the Web.
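The abstract compresses the ranking pipeline into a few sentences. The sketch below walks through it on toy data, under assumptions that are not taken from the article: the page-to-block matrix `Z`, block-to-page matrix `X`, and block-to-image matrix `Y` are hypothetical; the composition `W_I = Y^T (X Z) Y` is one plausible way to chain the three relationships, not necessarily the article's formula; and the PageRank-style damping factor and the linear blend with relevance scores (weight `alpha`) are assumptions added so the example is complete. All function names are illustrative.

```python
# A minimal sketch, assuming toy matrices, a hypothetical graph composition,
# a PageRank-style damping factor, and a hypothetical linear score blend.
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def build_image_graph(z: Matrix, x: Matrix, y: Matrix) -> Matrix:
    """Hypothetical composition W_I = Y^T (X Z) Y.

    z: page-to-block (pages x blocks), e.g. importance of a block within its page
    x: block-to-page (blocks x pages), hyperlinks from a block to a page
    y: block-to-image (blocks x images), images contained in a block
    """
    block_graph = matmul(x, z)  # block -> linked page -> blocks of that page
    return matmul(matmul(transpose(y), block_graph), y)


def image_rank(w: Matrix, damping: float = 0.85, tol: float = 1e-12,
               max_iter: int = 1000) -> list[float]:
    """Power iteration for the limiting distribution of a damped random walk on w."""
    n = len(w)
    # Row-normalize into a transition matrix; rows with no out-links jump uniformly.
    p = []
    for row in w:
        s = sum(row)
        p.append([v / s for v in row] if s > 0 else [1.0 / n] * n)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        # Damped step: r_new[j] = (1 - d)/n + d * sum_i r[i] * P[i][j]
        new = [(1 - damping) / n + damping * sum(r[i] * p[i][j] for i in range(n))
               for j in range(n)]
        done = sum(abs(a - b) for a, b in zip(new, r)) < tol
        r = new
        if done:
            break
    return r


def final_ranking(relevance: list[float], ranks: list[float],
                  alpha: float = 0.5) -> list[int]:
    """Order images by a hypothetical linear blend of relevance and ImageRank."""
    top = max(ranks)  # rescale ImageRank to [0, 1] so both terms are comparable
    scores = [alpha * rel + (1 - alpha) * rk / top for rel, rk in zip(relevance, ranks)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    Z = [[0.7, 0.3, 0.0], [0.0, 0.0, 1.0]]  # page 0 holds blocks 0, 1; page 1 holds block 2
    X = [[0, 1], [0, 1], [1, 0]]            # blocks 0, 1 link to page 1; block 2 links to page 0
    Y = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # one image per block
    ranks = image_rank(build_image_graph(Z, X, Y))
    print([round(v, 3) for v in ranks])     # prints [0.339, 0.174, 0.486]
    print(final_ranking([0.9, 0.2, 0.5], ranks))  # prints [0, 2, 1]
```

The damping term plays the same role it does in PageRank: it makes the chain irreducible and aperiodic, so the limiting distribution exists and is unique. In this toy graph images 0 and 1 link only to image 2, which links back to them, so undamped power iteration would oscillate instead of converging.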
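The abstract says only that spectral graph theory is used for image clustering and 2-D embedding. The sketch below illustrates one standard way to do this on a symmetric image affinity matrix: eigenvectors of the normalized graph Laplacian, the relaxation used in the Normalized Cuts paper listed in the references. It uses plain power iteration so that it needs only the standard library. The toy matrix, the function names, and the sign-based two-way split are illustrative assumptions, not the article's implementation; a directed image graph would first need to be symmetrized, for example as (W + W^T)/2.

```python
import math

Matrix = list[list[float]]


def spectral_embedding(w: Matrix, dims: int = 2, iters: int = 500) -> Matrix:
    """Return `dims` coordinates per node from the normalized graph Laplacian.

    Finds the eigenvectors of M = D^-1/2 W D^-1/2 with the largest eigenvalues
    after the trivial one (equivalently, the smallest nontrivial eigenvectors of
    L_sym = I - M) by power iteration on M + I with Gram-Schmidt deflation,
    then maps them back via y = D^-1/2 v.
    """
    n = len(w)
    deg = [sum(row) for row in w]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]  # assumes no isolated nodes
    m = [[inv_sqrt[i] * w[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

    def normalize(v: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    def orthogonalize(v: list[float], basis: Matrix) -> list[float]:
        # Remove components along each (orthonormal) basis vector.
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        return v

    # The trivial eigenvector of M (eigenvalue 1) is proportional to sqrt(degree).
    basis = [normalize([math.sqrt(d) for d in deg])]
    for _ in range(dims):
        v = normalize(orthogonalize([float(i + 1) for i in range(n)], basis))
        for _ in range(iters):
            # Multiply by (M + I); the shift keeps negative eigenvalues of M from dominating.
            v = [sum(m[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
            v = normalize(orthogonalize(v, basis))
        basis.append(v)
    # Map back to the generalized problem (D - W) y = lambda D y.
    return [[basis[k + 1][i] * inv_sqrt[i] for k in range(dims)] for i in range(n)]


def spectral_bipartition(w: Matrix) -> list[int]:
    """Two-way split by the sign of the first nontrivial coordinate."""
    return [0 if c[0] >= 0 else 1 for c in spectral_embedding(w, dims=1)]


if __name__ == "__main__":
    # Toy symmetric affinities: images 0-2 and 3-5 form two groups joined by one weak edge.
    W = [[0, 1, 1, 0, 0, 0],
         [1, 0, 1, 0, 0, 0],
         [1, 1, 0, 0.1, 0, 0],
         [0, 0, 0.1, 0, 1, 1],
         [0, 0, 0, 1, 0, 1],
         [0, 0, 0, 1, 1, 0]]
    print(spectral_bipartition(W))  # images 0-2 and 3-5 receive different labels
    for i, (x, y) in enumerate(spectral_embedding(W)):
        print(i, round(x, 3), round(y, 3))  # 2-D coordinates for plotting
```

For more than two clusters, a common choice is to run k-means on the rows of the embedding. At the scale of millions of images, a sparse eigensolver would replace the dense power iteration shown here.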