The classification of textual documents has been widely studied. Most classification approaches use supervised learning, which is acceptable for relatively small corpora, where experts can produce representative training sets, but is not feasible for large data flows. Unsupervised classification methods discover latent (hidden) classes automatically while minimizing human intervention. Many such methods exist, among them Kohonen self-organizing maps (SOMs), which group similar objects without prior information. In this paper, we evaluate and compare the use of SOMs for the classification of textual documents under two representations: a conceptual representation of texts and a representation based on n-grams.
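To make the n-gram variant concrete, the following is a minimal sketch of how documents might be turned into character n-gram vectors and grouped by a small self-organizing map. It is an illustration under stated assumptions, not the setup evaluated in the paper: the abstract does not give the n-gram length, map size, learning schedule, or preprocessing, so the function names (char_ngrams, vectorize, train_som, best_matching_unit) and every parameter value below are hypothetical choices.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased, whitespace-normalised text."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def vectorize(docs, n=3):
    """Map each document to an L2-normalised n-gram frequency vector over a shared vocabulary."""
    counts = [char_ngrams(d, n) for d in docs]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        v = [float(c[g]) for g in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors, vocab


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, v):
    """Grid position (row, col) of the unit whose weight vector is closest to v."""
    return min(weights, key=lambda rc: sq_dist(weights[rc], v))


def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols map; all hyperparameters are assumed, illustrative values."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3    # neighbourhood radius shrinks
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            v = vectors[i]
            br, bc = best_matching_unit(weights, v)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2 * sigma * sigma))  # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])  # pull unit toward the document
    return weights
```

After training, calling best_matching_unit(weights, v) for each document vector assigns it to a map unit, and documents that land on the same or neighbouring units can be read as one latent class. A conceptual representation would replace vectorize with a mapping from texts to concept features, leaving the map training unchanged.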