Text Clustering by 2D Cellular Automata Based on the N-Grams

  • Authors:
  • Reda Mohamed Hamou;Ahmed Lehireche;Ahmed Chaouki Lokbani;Mohamed Rahmani

  • Venue:
  • CDEE '10 Proceedings of the 2010 First ACIS International Symposium on Cryptography, and Network Security, Data Mining and Knowledge Discovery, E-Commerce and Its Applications, and Embedded Systems
  • Year:
  • 2010

Abstract

In this article we present a 2D cellular automaton (Class_AC) for a text-mining problem: unsupervised classification (clustering). Before experimenting with the cellular automaton, we vectorized our data by indexing textual documents from the Reuters-21578 collection using the N-gram approach. The proposed cellular automaton is a grid of cells with a planar neighborhood that arises from this structure. Three transition functions drive the automaton, and each cell can take one of four states. The results show that Class_AC, which acts as a virtual parallel computing machine, effectively groups similar documents, to within a threshold. Section 1 gives an introduction, Section 2 presents the N-gram-based representation of texts, Section 3 describes the cellular automaton for clustering, Section 4 reports the experiments and comparative results, and Section 5 gives conclusions and perspectives.
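
As an illustration of the N-gram representation mentioned in the abstract, the following is a minimal Python sketch that turns a document into a bag of character N-grams and compares two documents. The abstract does not give the paper's exact settings, so the choice of character trigrams (n = 3), the whitespace and case normalization, and the cosine measure are all assumptions made here for illustration; the function names `char_ngrams` and `cosine_similarity` are hypothetical and do not come from the paper.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in text.

    Assumption: text is lowercased and runs of whitespace are collapsed
    to a single space before the n-grams are extracted.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc_a = char_ngrams("Oil prices rose sharply on Monday")
    doc_b = char_ngrams("Oil prices fell sharply on Tuesday")
    print(cosine_similarity(doc_a, doc_b))
```

A clustering procedure could then compare such a similarity (or the corresponding distance) against a threshold to decide whether two documents belong together; how Class_AC does this through its transition functions is not specified in the abstract.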