A first approach to the automatic recognition of structural patterns in XML documents
Proceedings of the 2012 ACM symposium on Document engineering
Some tags used in XML documents create arbitrary breaks in the natural flow of the text, which can impede the application of some document engineering methods. This article introduces the concept of "reading contexts" and gives clues for handling it both theoretically and in practice. This work should notably make it possible to recognize emphasis tags in a text, to define a new notion of term proximity in structured documents, to improve indexing techniques, and to open the way to advanced linguistic analyses of XML corpora.
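As a rough illustration of the problem described above, the following Python sketch shows how treating every tag as a text boundary can split a word in two, while ignoring selected inline tags restores the natural flow. It is not the article's method: the function name `reading_segments`, the set `TRANSPARENT_TAGS`, and the sample document are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of tags assumed NOT to interrupt the reading flow
# (chosen for illustration only; the article's own classification may differ).
TRANSPARENT_TAGS = {"b", "i", "em"}


def reading_segments(elem, transparent=TRANSPARENT_TAGS):
    """Return the text segments of `elem`, breaking only at non-transparent tags."""
    segments = [""]

    def visit(node):
        # Text appearing before the first child of this node.
        segments[-1] += node.text or ""
        for child in node:
            if child.tag in transparent:
                # Inline tag: its content continues the current segment.
                visit(child)
            else:
                # Any other tag: start a new segment before and after it.
                segments.append("")
                visit(child)
                segments.append("")
            # Text that follows the child's closing tag.
            segments[-1] += child.tail or ""

    visit(elem)
    return [s.strip() for s in segments if s.strip()]


doc = ET.fromstring(
    "<sec><title>Intro</title><p>A <b>cru</b>cial point.</p></sec>"
)

# Every tag treated as a break: the word "crucial" is cut in two.
print(reading_segments(doc, transparent=set()))

# The <b> tag treated as transparent: the sentence reads naturally.
print(reading_segments(doc))
```

With every tag treated as a boundary, the emphasized fragment "cru" is separated from "cial", which would mislead a tokenizer or an indexer. Treating `b` as transparent keeps the sentence whole while still separating the title from the paragraph.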