In this paper we present a method for parsing unstructured textual records, each of which briefly describes a person and their direct relatives, and we use it in the construction of a browsing tool for genealogical data. The records have been created by researchers who are currently digitising a collection of historical archives stored at the Abbaye de Saint-Maurice, Switzerland. The string 'Beatrix, daughter of Johannes Trona, of Saillon' is a typical example of a record. We wish to annotate every term (word or symbol) in our records with a label indicating whether the term is a name (e.g. 'Beatrix'), a place (e.g. 'Saillon'), or a relationship (e.g. 'daughter'). Using this information, we are able to derive both a canonical form for each name (e.g. 'Beatrix Trona') and the relationships between people. We build upon work developed for the cleaning and standardization of names in record linkage corpora, adding several enhancements to deal with our more difficult data, which contains common name structures from French, Italian and Latin and spans hundreds of years. We present an approach to this problem that works interactively with a user to annotate the data set accurately while greatly reducing the human effort required. We do this by learning a hidden Markov model that represents the structure of a record and by finding structural patterns in new records. Finally, we present a brief overview of a tool we are developing to help genealogical researchers browse and search the data.
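To make the labelling step concrete, the following is a minimal sketch of how a hidden Markov model could assign a label to each term of the example record, using standard Viterbi decoding. It is not the implementation described in the paper. The label set, the small lexicon, every probability, and the helper names (`tokenize`, `emission`, `viterbi`, `segments`, `canonical_name`) are assumptions chosen for illustration; in the approach described above the model would be learned from annotated records rather than written by hand. The canonical-name rule at the end is likewise a hypothetical simplification that only covers 'daughter of' and 'son of' records.

```python
import math
import re

STATES = ["NAME", "REL", "PLACE", "OTHER"]

# Hypothetical lookup table; a real system would learn or curate this.
LEXICON = {"daughter": "REL", "son": "REL", "wife": "REL",
           "of": "OTHER", ",": "OTHER", "saillon": "PLACE"}

# Hand-set (assumed) start and transition probabilities.
START = {"NAME": 0.7, "REL": 0.1, "PLACE": 0.1, "OTHER": 0.1}
TRANS = {
    "NAME":  {"NAME": 0.4, "REL": 0.05, "PLACE": 0.05, "OTHER": 0.5},
    "REL":   {"NAME": 0.1, "REL": 0.05, "PLACE": 0.05, "OTHER": 0.8},
    "PLACE": {"NAME": 0.1, "REL": 0.1,  "PLACE": 0.3,  "OTHER": 0.5},
    "OTHER": {"NAME": 0.4, "REL": 0.3,  "PLACE": 0.2,  "OTHER": 0.1},
}

def tokenize(record):
    """Split a record into terms: words and individual symbols."""
    return re.findall(r"\w+|[^\w\s]", record)

def emission(state, token):
    """Toy emission score standing in for a learned P(token | state)."""
    known = LEXICON.get(token.lower())
    if known is not None:
        return 0.97 if known == state else 0.01
    if token[0].isupper():  # unknown capitalised term: probably a name or place
        return {"NAME": 0.6, "PLACE": 0.3, "REL": 0.05, "OTHER": 0.05}[state]
    return {"NAME": 0.1, "PLACE": 0.1, "REL": 0.3, "OTHER": 0.5}[state]

def viterbi(tokens):
    """Return the most probable label sequence for the tokens."""
    if not tokens:
        return []
    score = {s: math.log(START[s]) + math.log(emission(s, tokens[0]))
             for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for tok in tokens[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[best] + math.log(TRANS[best][s])
                            + math.log(emission(s, tok)))
            ptr[s] = best
        score = new_score
        back.append(ptr)
    path = [max(STATES, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def segments(tokens, labels):
    """Group consecutive tokens that share a label into (label, text) runs."""
    runs = []
    for tok, lab in zip(tokens, labels):
        if runs and runs[-1][0] == lab:
            runs[-1][1].append(tok)
        else:
            runs.append((lab, [tok]))
    return [(lab, " ".join(toks)) for lab, toks in runs]

def canonical_name(runs):
    """Assumed rule: a child with a single given name takes the last
    term of the parent's name as a surname."""
    names = [text for lab, text in runs if lab == "NAME"]
    rels = [text.lower() for lab, text in runs if lab == "REL"]
    if not names:
        return None
    subject = names[0]
    if (len(names) > 1 and len(subject.split()) == 1
            and rels and rels[0] in {"daughter", "son"}):
        subject += " " + names[1].split()[-1]
    return subject

if __name__ == "__main__":
    tokens = tokenize("Beatrix, daughter of Johannes Trona, of Saillon")
    labels = viterbi(tokens)
    print(list(zip(tokens, labels)))
    print(canonical_name(segments(tokens, labels)))  # Beatrix Trona
```

With these assumed parameters, the decoder labels 'Beatrix', 'Johannes' and 'Trona' as names from context alone, since none of them appears in the lexicon; only the place and relationship terms are looked up. Working in log space avoids numerical underflow on longer records.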