Classifying XML tags through "reading contexts"

  • Authors:
  • Xavier Tannier; Jean-Jacques Girardot; Mihaela Mathieu

  • Affiliations:
  • École Nationale Supérieure des Mines, France; École Nationale Supérieure des Mines, France; École Nationale Supérieure des Mines, France

  • Venue:
  • Proceedings of the 2005 ACM symposium on Document engineering
  • Year:
  • 2005

Abstract

Some tags used in XML documents create arbitrary breaks in the natural flow of the text. These breaks can impede the application of some document engineering methods. This article introduces the concept of "reading contexts" and gives clues for handling it both theoretically and in practice. This work should notably make it possible to recognize emphasis tags in a text, to define a new concept of term proximity in structured documents, to improve indexing techniques, and to open the way to advanced linguistic analyses of XML corpora.