Methods for the semantic analysis of document markup

Authors:
Petra Saskia Bayerl;Harald Lüngen;Daniela Goecke;Andreas Witt;Daniel Naber
Affiliations:
Justus-Liebig-Universität, Gießen, Germany;Justus-Liebig-Universität, Gießen, Germany;Universität Bielefeld, Bielefeld, Germany;Universität Bielefeld, Bielefeld, Germany;Universität Bielefeld, Bielefeld, Germany
Venue:
Proceedings of the 2003 ACM symposium on Document engineering
Year:
2003



Abstract

We present an approach for investigating what kind of semantic information is regularly associated with the structural markup of scientific articles. This approach addresses the need for an explicit formal description of the semantics of text-oriented XML documents. Our domain of investigation is a corpus of scientific articles in psychology and linguistics, drawn from English and German journals that are available online.

For our analyses, we provide XML markup representing two semantic levels: the thematic level (i.e. the topics in the text world that the article is about) and the functional or rhetorical level. Our hypothesis is that these semantic levels correlate with the articles' document structure, which is also represented in XML. The articles have been annotated with the corresponding information. Each of the three informational levels is modelled in a separate XML document because, in our domain, the description levels may conflict, making it impossible to model them within a single XML document.

To compare and mine the resulting multi-layered XML annotations of an article, we use a Prolog-based approach. It focuses on comparing XML markup that is distributed among different documents. Prolog predicates have been defined for inferring relations between levels of information that are modelled in separate XML documents. We demonstrate how the Prolog tool is applied in our corpus analyses.
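
The idea of comparing annotation layers that live in separate XML documents can be illustrated with a small sketch. It is not the authors' Prolog tool: Python stands in for Prolog here, and the element names (`para`, `background`, `method`, `results`), the sample text, and the function names `element_spans`, `relation`, and `compare_layers` are all assumptions made for illustration. The sketch assumes that both layers annotate exactly the same character sequence, computes the character span of every element, and classifies how a span from one layer relates to a span from the other.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Span = Tuple[str, int, int]  # (element name, start offset, end offset)

# Two hypothetical annotation layers over the same text.
STRUCTURE = "<doc><para>Aim and method. </para><para>Results follow.</para></doc>"
RHETORIC = (
    "<doc><background>Aim </background>"
    "<method>and method. Results </method>"
    "<results>follow.</results></doc>"
)


def element_spans(xml_text: str) -> List[Span]:
    """Return (tag, start, end) character offsets for every element."""
    spans: List[Span] = []

    def walk(node: ET.Element, pos: int) -> int:
        start = pos
        pos += len(node.text or "")        # text before the first child
        for child in node:
            pos = walk(child, pos)
            pos += len(child.tail or "")   # text between this child and the next
        spans.append((node.tag, start, pos))
        return pos

    walk(ET.fromstring(xml_text), 0)
    return spans


def relation(a: Span, b: Span) -> str:
    """Classify how span a relates to span b."""
    _, a_start, a_end = a
    _, b_start, b_end = b
    if (a_start, a_end) == (b_start, b_end):
        return "identical"
    if a_end <= b_start or b_end <= a_start:
        return "disjoint"
    if b_start <= a_start and a_end <= b_end:
        return "included_in"
    if a_start <= b_start and b_end <= a_end:
        return "includes"
    return "overlaps"  # spans cross: cannot be nested in a single XML tree


def compare_layers(layer_a: str, layer_b: str) -> List[Tuple[Span, str, Span]]:
    """Relate every element of layer_a to every element of layer_b."""
    text_a = "".join(ET.fromstring(layer_a).itertext())
    text_b = "".join(ET.fromstring(layer_b).itertext())
    if text_a != text_b:
        raise ValueError("layers must annotate the same text")
    return [
        (a, relation(a, b), b)
        for a in element_spans(layer_a)
        for b in element_spans(layer_b)
    ]


if __name__ == "__main__":
    for a, rel, b in compare_layers(RHETORIC, STRUCTURE):
        if rel != "disjoint":
            print(a, rel, b)
```

In this invented example the `method` segment crosses the boundary between the two `para` elements, so the two layers could not be merged into one well-formed XML tree. This is the kind of conflict that motivates keeping each level in a separate document.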