On the image content of a web segment: Chile as a case study

Authors:
A. Jaimes; J. Ruiz-Del-Solar; R. Verschae; R. Baeza-Yates; C. Castillo; D. Yaksic; E. Davis
Affiliations:
A. Jaimes: Center for Web Research, Department of Computer Science, Universidad de Chile, Chile
J. Ruiz-Del-Solar: Department of Electrical Engineering, Universidad de Chile, Chile
R. Verschae: Center for Web Research, Department of Computer Science, Universidad de Chile, Chile and Department of Electrical Engineering, Universidad de Chile, Chile
R. Baeza-Yates: Center for Web Research, Department of Computer Science, Universidad de Chile, Chile
C. Castillo: Center for Web Research, Department of Computer Science, Universidad de Chile, Chile
D. Yaksic: Center for Web Research, Department of Computer Science, Universidad de Chile, Chile and Department of Electrical Engineering, Universidad de Chile, Chile
E. Davis: Center for Web Research, Department of Computer Science, Universidad de Chile, Chile
Venue:
Journal of Web Engineering
Year:
2004

Abstract

We propose a methodology to characterize the image content of a web segment, and we present an analysis of a segment of the Chilean web (.cl domain). Our framework combines an efficient web-crawling architecture, standard content-based analysis tools that extract low-level visual features (color, shape, texture, etc.), and novel skin and face detection algorithms. The process is automated: we start by examining all websites within a domain (e.g., .cl websites), collect links to images, and download a large number of those images (approximately 383,000 images across all of our experiments, corresponding to about 35 billion pixels). Once the images are stored on a local server, the process automatically extracts the low-level features and runs the skin and face detectors. The results of feature extraction, skin detection, and face detection are then used to characterize the contents of the web segment. We tested the methodology on the Chilean web (.cl) by automatically downloading and processing 183,000 images in 2003 and 200,000 images in 2004. We present statistics derived from both image sets, which should be of use to anyone concerned with the image content of the Chilean web. Our study is the first to use content-based tools to determine the image contents of a given web segment.
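The abstract describes the pipeline only at a high level. The following is a minimal Python sketch, under our own assumptions, of three of its stages: extracting image links from a crawled page, computing low-level per-image features (a coarse RGB color histogram and a skin-pixel fraction), and aggregating per-image results into segment-level statistics. The skin test is a well-known fixed RGB threshold rule (in the style of Kovač, Peer and Solina), swapped in purely for illustration; it is not the authors' novel skin detector, and face detection is omitted entirely. All function names and the `skin_threshold` parameter are hypothetical, and images are assumed to be already decoded into lists of `(r, g, b)` tuples.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects absolute URLs from <img src=...> tags on one crawled page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    # Resolve relative paths against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_image_links(page_html, base_url):
    parser = ImageLinkParser(base_url)
    parser.feed(page_html)
    return parser.links


def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram with bins_per_channel**3 bins."""
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index 0..bins_per_channel-1.
        ri, gi, bi = (c * bins_per_channel // 256 for c in (r, g, b))
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
    n = len(pixels)
    return [h / n for h in hist] if n else hist


def is_skin(r, g, b):
    # Illustrative fixed RGB skin rule; NOT the paper's skin detector.
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_fraction(pixels):
    """Fraction of an image's pixels classified as skin."""
    if not pixels:
        return 0.0
    return sum(is_skin(*p) for p in pixels) / len(pixels)


def characterize(images, skin_threshold=0.3):
    """Aggregate per-image results (url -> pixel list) into segment statistics."""
    n = len(images)
    total_pixels = sum(len(px) for px in images.values())
    # Hypothetical criterion: an image is "skin-heavy" above the threshold.
    skin_heavy = sum(1 for px in images.values()
                     if skin_fraction(px) >= skin_threshold)
    return {
        "images": n,
        "total_pixels": total_pixels,
        "mean_pixels_per_image": total_pixels / n if n else 0.0,
        "skin_heavy_fraction": skin_heavy / n if n else 0.0,
    }
```

For scale, the abstract's totals (about 35 billion pixels over 383,000 images, i.e. 183,000 from 2003 plus 200,000 from 2004) correspond to roughly 91,000 pixels per image on average, on the order of 300 × 300, which is the kind of figure `mean_pixels_per_image` would report. Keeping per-image features separate from the aggregation step means a different skin or face detector can be substituted without changing how segment statistics are computed.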