Citations and annotations in classics: old problems and new perspectives
Proceedings of the 1st International Workshop on Collaborative Annotations in Shared Environment: metadata, vocabularies and techniques in the Digital Humanities
This paper describes the creation of an annotated corpus supporting the task of extracting information, particularly canonical citations (i.e., references to ancient sources), from Classics-related texts. The corpus is multilingual and contains approximately 30,000 tokens of POS-tagged, cleanly transcribed text drawn from L'Année Philologique. The named entities needed to capture such citations were annotated using an annotation scheme devised specifically for this task. The contribution of the paper is twofold: first, it describes how the corpus was created using Active Annotation, an approach that combines automatic and manual annotation to optimize the human effort required to create a corpus. Second, the performance of an NER classifier based on Conditional Random Fields is evaluated using the created corpus as training and test set: the results obtained with three different feature sets are compared and discussed.
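To make the CRF setup concrete, the sketch below shows the kind of token-level feature extraction typically fed to a CRF for canonical-citation NER. The specific features here (word shape, digit patterns, context words) are illustrative assumptions, not the paper's actual feature sets, and the sample tokens are a hypothetical abbreviated citation.

```python
import re

def token_features(tokens, i):
    """Extract token-level features of the kind commonly used in
    CRF-based NER (illustrative; not the paper's actual feature set)."""
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),                 # lowercased surface form
        "is_upper_initial": tok[:1].isupper(),  # capitalized, e.g. an author abbreviation
        "is_digit": tok.isdigit(),            # pure number
        "has_period": "." in tok,             # abbreviation marker
        # book/line references such as "1.1" or "1,1-5"
        "is_num_range": bool(re.fullmatch(r"\d+([.,-]\d+)*", tok)),
        "suffix3": tok[-3:],                  # character suffix
    }
    # Context features from neighboring tokens
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of sequence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of sequence
    return feats

# Hypothetical canonical citation: Homer, Iliad, book 1, line 1
tokens = ["Hom.", "Il.", "1.1"]
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Each token's feature dictionary would then be paired with a BIO-style label and passed to a CRF trainer; varying which features are included is how different feature sets, such as the three compared in the paper, are evaluated.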