XML is among the preferred formats for storing the structure of documents such as scientific articles, manuals, documentation, and literary works. Publishers sometimes adopt established, well-known vocabularies such as DocBook and TEI; at other times they create partially or entirely new ones that better address the particular requirements of their documents. The explicit and implicit usage requirements of these vocabularies often follow well-established patterns, creating meta-structures (the block, the container, the inline element, etc.) that persist across vocabularies and authors and that describe a truer and more general conceptualization of the documents' building blocks. Addressing such meta-structures not only gives better insight into what documents are really composed of, but also provides abstract, more general mechanisms for working on documents regardless of the availability of specific schemas, tools, and presentation stylesheets. In this paper we introduce a schema-independent theory based on eleven structural patterns. We define these patterns and show how they synthesize characteristics emerging from real markup documents. Additionally, we propose an algorithm that identifies the pattern of each element in a set of homogeneous markup documents.
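As a rough illustration of what schema-independent analysis can look like, the sketch below inspects a small XML sample without any schema and assigns each element name a coarse content category. The four categories ("mixed", "text-only", "elements-only", "empty"), the sample document, and every function name are assumptions made for this example; they are not the eleven patterns or the recognition algorithm the paper proposes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<doc>
  <sec><title>Intro</title><p>Some <b>bold</b> text.</p><p>Plain text.</p></sec>
  <sec><title>Next</title><p>More <b>bold</b> here.</p></sec>
</doc>"""


def observe(root):
    """Record, per element name, whether any instance holds text or child elements."""
    seen = defaultdict(lambda: {"text": False, "children": False})
    for el in root.iter():
        kids = list(el)
        # Non-whitespace text directly inside the element: either before the
        # first child (el.text) or after any child (that child's tail).
        has_text = bool((el.text or "").strip()) or any(
            (k.tail or "").strip() for k in kids
        )
        flags = seen[el.tag]
        flags["text"] = flags["text"] or has_text
        flags["children"] = flags["children"] or bool(kids)
    return seen


def coarse_category(flags):
    """Map the observed flags to one of four illustrative categories (an assumption)."""
    if flags["text"] and flags["children"]:
        return "mixed"
    if flags["text"]:
        return "text-only"
    if flags["children"]:
        return "elements-only"
    return "empty"


def classify(xml_text):
    """Return a dict from element name to its coarse content category."""
    root = ET.fromstring(xml_text)
    return {tag: coarse_category(f) for tag, f in observe(root).items()}


result = classify(SAMPLE)
for tag, category in result.items():
    print(f"{tag}: {category}")
```

Aggregating over all instances of an element name, rather than judging each instance alone, reflects the idea of working on a set of homogeneous documents: a `p` that is plain text in one place and mixed content in another is treated as one kind of element.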