Unsupervised clustering for nontextual web document classification

Authors:
Samuel W. K. Chan;Mickey W. C. Chong
Affiliations:
Department of Decision Sciences and Managerial Economics, The Chinese University of Hong Kong, Hong Kong, China;Department of Decision Sciences and Managerial Economics, The Chinese University of Hong Kong, Hong Kong, China
Venue:
Decision Support Systems
Year:
2004

Abstract

While the breadth of vocabulary used in long documents may mislead traditional keyword-based retrieval systems, demand is mounting for techniques that classify and retrieve nontextual Web content from large document collections. Only a few prototype systems have attempted to classify hypertext on the basis of nontextual elements in order to locate unfamiliar documents. As a result, a large portion of Web documents that are pictorial in nature remains beyond the reach of most current search engines. In this research, we devise a novel quantitative model of nontextual World Wide Web (WWW) classification based on image information. An intelligent, content-sensitive, attribute-rich image classifier is presented. An image similarity measure is used to deduce the likeness among images. Different image feature vectors have been constructed and evaluated. Evaluation shows that images judged to be similar by humans form interesting clusters in our unsupervised learning. Comparison with other clustering techniques, such as Hierarchical Agglomerative Clustering (HAC), demonstrates that our approach is useful in content-based image information retrieval.
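The abstract names three ingredients without specifying them: image feature vectors, an image similarity measure, and HAC as a comparison baseline. The sketch below illustrates how such a pipeline could fit together. It is not the authors' method: the coarse color histogram, the cosine similarity, the average-linkage rule, and every function name (`color_histogram`, `cosine_similarity`, `hac`) are assumptions chosen for illustration, and the four "images" are toy pixel lists.

```python
from itertools import combinations
from math import sqrt


def color_histogram(pixels, bins=4):
    """Assumed feature vector: normalized histogram over bins**3 RGB cells."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        # Clamp so channel values near 255 stay inside the last bin.
        ri, gi, bi = (min(c // step, bins - 1) for c in (r, g, b))
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def cosine_similarity(u, v):
    """Assumed similarity measure between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hac(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of item indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        sims = [cosine_similarity(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the highest average similarity.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Toy data: each "image" is a list of (R, G, B) pixels.
images = [
    [(250, 10, 10)] * 8,                       # all red
    [(240, 20, 5)] * 6 + [(10, 10, 250)] * 2,  # mostly red
    [(10, 10, 250)] * 8,                       # all blue
    [(5, 20, 240)] * 7 + [(250, 10, 10)] * 1,  # mostly blue
]
features = [color_histogram(img) for img in images]
clusters = hac(features, n_clusters=2)
print(clusters)
```

On this toy input the reddish and bluish images end up in separate clusters. An evaluation of the kind the abstract describes would compare such machine-formed clusters against groupings judged similar by humans.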