Data mining for grammatical inference with bioinformatics criteria

  • Authors:
  • Vivian F. López;Ramiro Aguilar;Luis Alonso;María N. Moreno

  • Affiliations:
  • Departamento de Informática y Automática, University of Salamanca, Plaza de la Merced S/N, 37008 Salamanca, Spain (all four authors)

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Quantified Score

Hi-index 12.05

Abstract

This work describes a novel data mining process that combines hybrid association-analysis techniques with classical sequencing algorithms from genomics to generate the grammatical structures of a specific language. These structures are subsequently converted to context-free grammars. The method initially targets context-free languages, with the possibility of being applied to other languages: structured programming languages, the language of the "book of life" expressed in the genome and proteome, and even natural languages. A compiler generator system was used to develop a practical application within the area of grammarware, where language analysis concepts are applied to other disciplines such as bioinformatics. The tool automatically measures the complexity of the grammar obtained from textual data.
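
The abstract does not give the algorithm itself, so the following is only an illustrative sketch of the general idea of deriving context-free rules from repeated patterns in a token sequence and then measuring the size of the result. It uses a Re-Pair-style substitution of the most frequent adjacent pair as a stand-in for the association and sequencing steps; it is not the authors' method. The function names (`infer_grammar`, `expand`, `grammar_size`), the `min_support` threshold, the nonterminal naming scheme, and the size measure (total number of right-hand-side symbols) are all assumptions made for this example.

```python
from collections import Counter


def infer_grammar(tokens, min_support=2):
    """Build a small context-free grammar from a token sequence.

    Re-Pair-style sketch (an assumption, not the paper's algorithm):
    repeatedly replace the most frequent adjacent pair of symbols with a
    fresh nonterminal until no pair occurs at least `min_support` times.
    Terminal tokens must not be named "S" or "N<number>".
    """
    seq = list(tokens)
    rules = {}
    next_id = 1
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_support:
            break
        name = f"N{next_id}"
        next_id += 1
        rules[name] = list(pair)
        # Rewrite the sequence left to right, replacing non-overlapping
        # occurrences of the chosen pair with the new nonterminal.
        rewritten, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                rewritten.append(name)
                i += 2
            else:
                rewritten.append(seq[i])
                i += 1
        seq = rewritten
    rules["S"] = seq  # start symbol derives the compressed sequence
    return rules


def expand(symbol, rules):
    """Expand a symbol back into the list of terminal tokens it derives."""
    if symbol not in rules:
        return [symbol]
    out = []
    for part in rules[symbol]:
        out.extend(expand(part, rules))
    return out


def grammar_size(rules):
    """A simple complexity measure: total symbols on all right-hand sides."""
    return sum(len(rhs) for rhs in rules.values())


tokens = "a b c a b c a b".split()
grammar = infer_grammar(tokens)
for head, body in grammar.items():
    print(head, "->", " ".join(body))
print("size:", grammar_size(grammar))
```

The resulting grammar derives exactly the input sequence, which `expand("S", grammar)` can confirm. A grammar inferred this way does not generalize beyond its input; producing rules that cover unseen sentences would need the additional analysis the abstract refers to.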