In this paper, we present a system for automatically integrating unstructured text into a multi-relational database using state-of-the-art statistical models for structure extraction and matching. We show how to extend current high-performing models, Conditional Random Fields and their semi-Markov counterparts, to exploit the variety of recognition clues available in a database of entities, thereby significantly reducing the dependence on manually labeled training data. Our system is designed to load unstructured records into columns spread across multiple tables in the database, resolving the relationship of the extracted text to existing column values while preserving the cardinality and link constraints of the database. We show how to combine the inference algorithms of statistical models with the database-imposed constraints for optimal data integration.
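
To make the segment-level idea concrete, the following is a minimal sketch of semi-Markov style inference in which existing database values act as recognition clues. It is not the system described above: the column contents, the hand-set scores, the `Other` label, and the names `DB_COLUMNS`, `MAX_LEN`, `segment_score`, and `segment` are all assumptions made for illustration, and label-transition scores, learned weights, and the cardinality and link constraints are left out.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical database columns: label -> known values (lowercased).
DB_COLUMNS: Dict[str, Set[str]] = {
    "Author": {"ada lovelace", "alan turing"},
    "Venue": {"vldb", "acm sigmod record"},
}
OTHER = "Other"
MAX_LEN = 3  # longest segment considered, in tokens (an assumption)


def segment_score(tokens: List[str], start: int, end: int, label: str) -> float:
    """Hand-set score for labelling tokens[start:end] as a single segment."""
    length = end - start
    if label == OTHER:
        # Neutral score; only single-token Other segments are allowed.
        return 0.0 if length == 1 else float("-inf")
    text = " ".join(tokens[start:end]).lower()
    # Reward an exact match against a database column, penalise otherwise.
    return 2.0 * length if text in DB_COLUMNS[label] else -1.0 * length


def segment(tokens: List[str]) -> List[Tuple[str, str]]:
    """Viterbi over segments: best-scoring split into labelled segments."""
    n = len(tokens)
    labels = list(DB_COLUMNS) + [OTHER]
    # best[i] is the best total score of any segmentation of tokens[:i].
    best: List[float] = [float("-inf")] * (n + 1)
    # back[i] stores (segment start, label) of the last segment ending at i.
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_LEN), end):
            for label in labels:
                score = best[start] + segment_score(tokens, start, end, label)
                if score > best[end]:
                    best[end] = score
                    back[end] = (start, label)
    # Follow the back-pointers from the end of the sequence.
    result: List[Tuple[str, str]] = []
    end = n
    while end > 0:
        start, label = back[end]
        result.append((" ".join(tokens[start:end]), label))
        end = start
    return result[::-1]


if __name__ == "__main__":
    print(segment("Ada Lovelace published in VLDB".split()))
```

Scoring whole segments rather than single tokens is what lets a multi-word database value such as an author name contribute as one clue; a token-level model would have to score each word separately.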