C4.5: programs for machine learning
C4.5: programs for machine learning
Machine Learning
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Shallow parsing with conditional random fields
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
The WEKA data mining software: an update
ACM SIGKDD Explorations Newsletter
Multiscale conditional random fields for image labeling
CVPR'04 Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition
Hi-index | 0.00 |
The pervasive nature of the internet has caused a significant transformation in the field of genealogical research. This has impacted not only how research is conducted, but has also dramatically increased the number of people discovering their family history. Recent market research (Maritz Marketing 2000, Harris Interactive 2009) indicates that general interest in the United States has increased from 45% in 1996, to 60% in 2000, and 87% in 2009. Increased popularity has caused a dramatic need for improvements in algorithms related to extracting, accessing, and processing genealogical data for use in building family trees. This paper presents one approach to algorithmic improvement in the family history domain, where we infer the familial relationships of households found in human transcribed United States census data. By applying advances made in natural language processing, exploiting the sequential nature of the census, and using state of the art machine learning algorithms, we were able to decrease the error by 35% over a hand coded baseline system. The resulting system is immediately applicable to hundreds of millions of other genealogical records where families are represented, but the familial relationships are missing.