Named-entity recognition systems extract entities such as people, organizations, and locations from unstructured text. Rather than extracting these mentions in isolation, this paper presents a record extraction system that assembles mentions into records (i.e., database tuples). We construct a probabilistic model of the compatibility between field values, then employ graph partitioning algorithms to cluster fields into cohesive records. We also investigate compatibility functions over sets of fields, rather than only pairs of fields, to examine how greater representational power affects performance. We apply our techniques to the task of extracting contact records from faculty and student homepages, demonstrating a 53% error reduction over baseline approaches.
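
The idea of scoring pairs of field values and then partitioning them into records can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not give the model or the partitioning algorithm, so the hand-set `compatibility` score stands in for the learned probabilistic model, and a greedy agglomerative merge stands in for the graph partitioning step. The mention tuples `(type, value, position)` and the example data are hypothetical.

```python
from itertools import combinations

# Hypothetical mentions: (field type, value, line position on the page).
mentions = [
    ("name", "Ann Lee", 0),
    ("email", "ann@example.edu", 1),
    ("name", "Bob Roy", 5),
    ("phone", "555-0100", 6),
]

def compatibility(a, b):
    """Toy pairwise score; positive means 'likely the same record'.

    Assumption: hand-set weights. A real system would learn this
    function from labeled data.
    """
    type_a, _, pos_a = a
    type_b, _, pos_b = b
    score = 1.0 - 0.5 * abs(pos_a - pos_b)  # nearby mentions attract
    if type_a == type_b == "name":
        score -= 2.0  # two names rarely belong to one contact record
    return score

def partition(items):
    """Greedy agglomerative partitioning (a simple stand-in heuristic).

    Repeatedly merges the two clusters with the highest average
    pairwise compatibility, stopping when no average is positive.
    """
    clusters = [[m] for m in items]
    while len(clusters) > 1:
        best_score, best_pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            scores = [compatibility(a, b)
                      for a in clusters[i] for b in clusters[j]]
            avg = sum(scores) / len(scores)
            if avg > best_score:
                best_score, best_pair = avg, (i, j)
        if best_pair is None:
            break  # every remaining merge would be incompatible
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

records = partition(mentions)
for record in records:
    print([value for _, value, _ in record])
```

Because the score here is defined only over pairs, a cluster's quality is just an average of pairwise terms. A compatibility function over sets of fields, as the abstract describes, would instead score a whole candidate cluster at once.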