Most existing techniques for characterizing Web documents are based on term-frequency analysis. In such models, each document in a given set is represented by a feature vector in a vector space. However, because Web documents written in HTML are semi-structured by their tags, traditional techniques that assign term weights solely by frequency of occurrence may not represent the contents of such documents satisfactorily. Some recent studies have shown that a fuzzy representation (FR) of WWW information, based on the significance of HTML tags, is an effective alternative for characterizing Web documents. In this paper, the FR is used to generate a feature vector for each Web document, and the Hierarchical Agglomerative Clustering (HAC) algorithm is applied to these vectors to investigate the efficiency and effectiveness of automatically categorizing Web documents with similar contents. The experiments conducted suggest several benefits of this approach.
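The pipeline described above, tag-weighted feature vectors followed by HAC, can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the tag weights in `TAG_WEIGHTS`, the normalization by the largest score, the choice of cosine similarity with average linkage, and the `threshold` stopping rule are all assumptions made for illustration, and the names `TagWeightedTerms`, `feature_vector`, `cosine`, and `hac` are hypothetical.

```python
import math
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical tag weights, chosen only for illustration (not from the paper).
TAG_WEIGHTS = {"title": 1.0, "h1": 0.8, "h2": 0.6, "b": 0.4}
DEFAULT_WEIGHT = 0.2  # assumed weight for text outside any listed tag


class TagWeightedTerms(HTMLParser):
    """Accumulate, for each term, the weight of the tags it occurs in."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.scores = defaultdict(float)

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Use the most significant enclosing tag as the occurrence weight.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.stack),
            default=DEFAULT_WEIGHT,
        )
        for term in re.findall(r"[a-z]+", data.lower()):
            self.scores[term] += weight


def feature_vector(html):
    """Map each term to a degree in [0, 1] by dividing by the top score."""
    parser = TagWeightedTerms()
    parser.feed(html)
    parser.close()
    top = max(parser.scores.values(), default=1.0)
    return {term: score / top for term, score in parser.scores.items()}


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def hac(vectors, threshold):
    """Average-linkage HAC: merge the most similar pair of clusters
    until no pair reaches the similarity threshold."""
    clusters = [[i] for i in range(len(vectors))]

    def sim(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best, x, y = max(
            (sim(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


docs = [
    "<html><title>Neural networks</title><body>Training neural networks</body></html>",
    "<html><title>Neural network tutorial</title><body>Backpropagation for neural networks</body></html>",
    "<html><title>Cooking pasta</title><body>Boil pasta in salted water</body></html>",
]
clusters = hac([feature_vector(d) for d in docs], threshold=0.3)
print(clusters)
```

In this toy input the two pages about neural networks share highly weighted title terms and are merged, while the unrelated page stays in its own cluster. A plain term-frequency vector would treat a word in the title and a word in the body identically, which is the limitation the tag-based weighting is meant to address.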