What about sequential data mining techniques to identify linguistic patterns for stylistics?

  • Authors:
  • Solen Quiniou;Peggy Cellier;Thierry Charnois;Dominique Legallois

  • Affiliations:
  • GREYC Université de Caen Basse-Normandie, Caen, France and CRISCO Université de Caen Basse-Normandie, Caen, France;IRISA-INSA de Rennes, Rennes Cedex, France;GREYC Université de Caen Basse-Normandie, Caen, France;CRISCO Université de Caen Basse-Normandie, Caen, France

  • Venue:
  • CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
  • Year:
  • 2012

Abstract

In this paper, we study the use of data mining techniques for stylistic analysis, from a linguistic point of view, by considering emerging sequential patterns. First, we show that mining sequential patterns of words with gap constraints yields relevant linguistic patterns that patterns built on n-grams do not capture. Then, we investigate how sequential patterns of itemsets can provide more generic linguistic patterns. We validate our approach from a linguistic point of view by conducting experiments on three corpora of different types of French texts (Poetry, Letters, and Fiction). Focusing on poetic texts, we show that characteristic linguistic patterns can be identified using data mining techniques. We also discuss how our approach could be improved so that it can be used more efficiently for linguistic analyses.
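
To make the two central notions of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a common definition of a gap constraint (at most `max_gap` skipped tokens between consecutive pattern items) and of an emerging pattern (the ratio of a pattern's support in a target corpus to its support in a contrast corpus, often called the growth rate). The function names, the brute-force matching, and the toy French sentences are all hypothetical and chosen only for illustration; an actual miner would enumerate candidate patterns rather than test a given one.

```python
from typing import List, Sequence

Sentence = Sequence[str]


def matches(pattern: Sequence[str], sentence: Sentence, max_gap: int) -> bool:
    """Return True if `pattern` occurs in `sentence` in order, with at most
    `max_gap` skipped tokens between two consecutive pattern items."""

    def search(p_idx: int, start: int, end: int) -> bool:
        if p_idx == len(pattern):
            return True  # every item of the pattern has been placed
        for pos in range(start, min(end, len(sentence))):
            if sentence[pos] == pattern[p_idx]:
                # The next item may sit at pos+1 .. pos+1+max_gap (inclusive).
                if search(p_idx + 1, pos + 1, pos + 2 + max_gap):
                    return True
        return False

    # The first item may start anywhere in the sentence.
    return search(0, 0, len(sentence))


def support(pattern: Sequence[str], corpus: List[Sentence], max_gap: int) -> float:
    """Fraction of sentences in `corpus` that contain `pattern`."""
    if not corpus:
        return 0.0
    return sum(matches(pattern, s, max_gap) for s in corpus) / len(corpus)


def growth_rate(pattern: Sequence[str], target: List[Sentence],
                contrast: List[Sentence], max_gap: int) -> float:
    """Support in `target` divided by support in `contrast`.
    Assumed convention: infinity if the pattern is absent from `contrast`
    only, and 0.0 if it is absent from both corpora."""
    s_target = support(pattern, target, max_gap)
    s_contrast = support(pattern, contrast, max_gap)
    if s_contrast == 0.0:
        return float("inf") if s_target > 0.0 else 0.0
    return s_target / s_contrast


# Invented toy data, not taken from the paper's corpora.
poetry = [
    ["le", "coeur", "de", "la", "nuit"],
    ["le", "silence", "de", "la", "mer"],
    ["un", "chant", "dans", "la", "nuit"],
]
fiction = [
    ["il", "ouvre", "la", "porte"],
    ["le", "chef", "de", "la", "gare", "arrive"],
    ["elle", "ferme", "la", "porte"],
]

pattern = ["le", "de", "la"]
print(support(pattern, poetry, max_gap=0))   # contiguous (n-gram-like) matching
print(support(pattern, poetry, max_gap=1))   # one skipped token allowed
print(growth_rate(pattern, poetry, fiction, max_gap=1))
```

With `max_gap=0` the pattern behaves like a contiguous trigram and is not found in the toy poetry sentences, whereas allowing one skipped token lets it match "le coeur de la" and "le silence de la". This illustrates, on invented data only, how gapped patterns can surface regularities that n-grams miss.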