A pilot investigation of information extraction in the semantic annotation of archaeological reports

Authors:
Andreas Vlachidis;Douglas Tudhope
Affiliations:
Hypermedia Research Unit, Faculty of Advanced Technology, University of Glamorgan, Pontypridd, CF37 1DL, Wales, UK;Hypermedia Research Unit, Faculty of Advanced Technology, University of Glamorgan, Pontypridd, CF37 1DL, Wales, UK
Venue:
International Journal of Metadata, Semantics and Ontologies
Year:
2012

Abstract

This paper discusses a prototype investigation of semantic annotation, a form of metadata that assigns conceptual entities to textual instances, applied to archaeological grey literature. Information Extraction (IE), a Natural Language Processing (NLP) technique, is central to the annotation process, while Knowledge Organization Systems (KOS) are explored as a means of associating semantic annotations with both ontological and terminological references. The annotation process follows a rule-based information extraction approach using the GATE NLP toolkit, together with the CIDOC CRM ontology, its CRM-EH archaeological extension, and English Heritage thesauri and glossaries. Results from an initial evaluation suggest that these information extraction techniques can be applied to archaeological grey literature reports. Further work is discussed, drawing on the evaluation and on the characteristics of the archaeology domain.