Tagging of name records for genealogical data browsing

Authors:
Mike Perrow;David Barber
Affiliations:
IDIAP Research Institute, Martigny, Switzerland;IDIAP Research Institute, Martigny, Switzerland
Venue:
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Year:
2006

Citing 12
Cited 4

Probabilistic reasoning in intelligent systems: networks of plausible inference

Probabilistic reasoning in intelligent systems: networks of plausible inference
A tutorial on hidden Markov models and selected applications in speech recognition

Readings in speech recognition
An Algorithm that Learns What‘s in a Name

Machine Learning - Special issue on natural language learning
Automatic segmentation of text into structured records

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
TnT: a statistical part-of-speech tagger

ANLC '00 Proceedings of the sixth conference on Applied natural language processing
A simple rule-based part of speech tagger

ANLC '92 Proceedings of the third conference on Applied natural language processing
Two supervised learning approaches for name disambiguation in author citations

Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
prefuse: a toolkit for interactive information visualization

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
On the revision of probabilistic beliefs using uncertain evidence

Artificial Intelligence
Name disambiguation in author citations using a K-way spectral clustering method

Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
A Network Analysis Model for Disambiguation of Names in Lists

Computational & Mathematical Organization Theory

A multilingual approach to technical manuscripts: 16th and 17th-century Portuguese shipbuilding treatises

Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
A simple method for citation metadata extraction using hidden markov models

Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
Combining proper name-coreference with conditional random fields for semi-supervised named entity recognition in Vietnamese text

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Texts, illustrations, and physical objects: the case of ancient shipbuilding treatises

ECDL'07 Proceedings of the 11th European conference on Research and Advanced Technology for Digital Libraries

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper we present a method of parsing unstructured textual records briefly describing a person and their direct relatives, which we use in the construction of a browsing tool for genealogical data. The records have been created by researchers who are currently digitising a collection of historical archives stored at the Abbaye de Saint-Maurice, Switzerland. The string 'Beatrix, daughter of Johannes Trona, of Saillon' is a typical example of a record. We wish to annotate every term (word and symbol) in our records with a label which describes whether the term is a name (e.g. 'Beatrix'), a place (e.g. 'Saillon'), or a relationship (e.g. 'daughter'). Using this information, we are able to derive both a canonical form for each name (e.g. 'Beatrix Trona'), and the relationships between people. We build upon work developed for the cleaning and standardization of names for record linkage corpora, adding several enhancements to deal with our more difficult data, which contains common name structures of French, Italian and Latin, over hundreds of years. We present an approach to this problem that works interactively with a user to annotate the data set accurately, greatly reducing the human effort required. We do this by learning a Hidden Markov Model representing a record structure, and finding structural patterns in new records. Finally, we present a brief overview of a tool we are developing to help genealogical researchers browse and search the data.