Text Clustering by 2D Cellular Automata Based on the N-Grams

Authors:
Reda Mohamed Hamou; Ahmed Lehireche; Ahmed Chaouki Lokbani; Mohamed Rahmani
Venue:
CDEE '10 Proceedings of the 2010 First ACIS International Symposium on Cryptography, and Network Security, Data Mining and Knowledge Discovery, E-Commerce and Its Applications, and Embedded Systems
Year:
2010

Cited by (2):

The Social Spiders in the Clustering of Texts: Towards an Aspect of Visual Classification
International Journal of Artificial Life Research

The Impact of the Mode of Data Representation for the Result Quality of the Detection and Filtering of Spam
International Journal of Information Retrieval Research

Abstract

In this article we present a 2D cellular automaton (Class_AC) to solve a text-mining problem in the setting of unsupervised classification (clustering). Before experimenting with the cellular automaton, we vectorized our data by indexing textual documents from the Reuters-21578 collection using the N-gram approach. The proposed cellular automaton is a grid of cells with a flat (planar) neighborhood derived from this grid structure. Three transition functions were used to vary the automaton, with four states per cell. The results show that this parallel-computing virtual machine (Class_AC) effectively groups similar documents that lie within a proximity threshold. Section 1 gives an introduction, Section 2 presents the N-gram-based representation of texts, Section 3 describes the cellular automaton for clustering, Section 4 reports the experimental and comparison results, and Section 5 gives a conclusion and perspectives.
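The abstract does not specify how the N-gram vectors are built or compared, so the following is only a minimal sketch of a common variant, not the authors' method. It assumes character trigrams (n = 3), lowercased text, raw N-gram counts as vector weights, and cosine similarity as the document proximity measure; the function names `char_ngrams` and `cosine` are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lowercased text (assumed n = 3)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    # Counter returns 0 for missing keys without inserting them.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = ["oil prices rise", "oil prices fall", "wheat harvest grows"]
vecs = [char_ngrams(d) for d in docs]
print(round(cosine(vecs[0], vecs[1]), 3))  # 9 shared trigrams out of 13 each
print(cosine(vecs[0], vecs[2]))            # no shared trigrams
```

Character N-grams make the representation tolerant of inflection and minor spelling variation, since related word forms share most of their N-grams. A similarity score of this kind is one plausible input for deciding whether two documents count as "near" under a threshold, but the exact weighting and threshold used by Class_AC are not given here.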