Concept-based clustering of textual documents using SOM

Authors:
Abdelmalek Amine;Zakaria Elberrichi;Ladjel Bellatreche;Michel Simonet;Mimoun Malki
Affiliations:
EEDIS Laboratory, Department of computer science, Djillali Liabes University, Sidi Belabbes - Algeria;EEDIS Laboratory, Department of computer science, Djillali Liabes University, Sidi Belabbes - Algeria;LISI/ENSMA University of Poitiers, Futuroscope 86960 France;TIMC-IMAG Laboratory, IN3S, University Joseph Fourier, Grenoble - France;EEDIS Laboratory, Department of computer science, Djillali Liabes University, Sidi Belabbes - Algeria
Venue:
AICCSA '08 Proceedings of the 2008 IEEE/ACS International Conference on Computer Systems and Applications
Year:
2008

Cited by (3):

Towards design principles for effective context- and perspective-based web mining
Proceedings of the 4th International Conference on Design Science Research in Information Systems and Technology

Visual integration tool for heterogeneous data type by unified vectorization
IRI'09 Proceedings of the 10th IEEE international conference on Information Reuse & Integration

Dynamically generating context-relevant sub-webs
DESRIST'10 Proceedings of the 5th international conference on Global Perspectives on Design Science Research

Abstract

The classification of textual documents has been widely studied. Most classification approaches use supervised learning, which works for relatively small corpora where experts can produce representative training sets, but is not feasible for large flows of data. Unsupervised classification methods discover latent (hidden) classes automatically while minimizing human intervention. Many such methods exist, among them Kohonen self-organizing maps (SOMs), which group similar objects without prior information. In this paper, we evaluate and compare the use of SOMs for the classification of textual documents under two representations: a conceptual representation of texts and a representation based on n-grams.
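
To make the n-gram setting concrete, the following is a minimal sketch of a SOM trained on character n-gram vectors, using only the Python standard library. It is not the authors' implementation: the function names (`char_ngrams`, `vectorize`, `train_som`, `best_matching_unit`), the trigram size, the 2x2 grid, the Gaussian neighbourhood, and the linear decay schedules are all assumptions chosen for illustration. A concept-based variant would presumably replace `vectorize` with a mapping from documents to concept frequencies, which the abstract does not detail.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Count the overlapping character n-grams of a whitespace-normalised text."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def vectorize(docs, n=3):
    """Map each document to an L2-normalised n-gram frequency vector."""
    counts = [char_ngrams(d, n) for d in docs]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        v = [float(c[g]) for g in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, v):
    """Return the grid coordinate of the unit whose weights are closest to v."""
    return min(weights, key=lambda pos: sq_dist(weights[pos], v))


def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM (hypothetical parameters, for illustration)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.01   # neighbourhood radius shrinks
        for v in rng.sample(vectors, len(vectors)):
            br, bc = best_matching_unit(weights, v)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Pull each unit toward the input, scaled by grid proximity.
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights
```

After training, each document can be assigned to the grid cell of its best matching unit, for example `[best_matching_unit(som, v) for v in vectors]`, and documents sharing a cell are read as one cluster. No labels are used at any point, which is the property the abstract emphasises.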