Integrating Unstructured Data into Relational Databases

Authors:
Imran R. Mansuri; Sunita Sarawagi
Affiliations:
IIT Bombay; IIT Bombay
Venue:
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Year:
2006

Abstract

In this paper we present a system for automatically integrating unstructured text into a multi-relational database using state-of-the-art statistical models for structure extraction and matching. We show how to extend current high-performing models, Conditional Random Fields and their semi-Markov counterparts, to effectively exploit a variety of recognition clues available in a database of entities, thereby significantly reducing the dependence on manually labeled training data. Our system is designed to load unstructured records into columns spread across multiple tables in the database while resolving the relationship between the extracted text and existing column values, and while preserving the cardinality and link constraints of the database. We show how to combine the inference algorithms of statistical models with the database-imposed constraints for optimal data integration.
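The idea of letting a semi-Markov (segment-level) model score whole candidate fields against values already stored in the database can be illustrated with a toy Viterbi decoder. This is a minimal sketch under stated assumptions, not the authors' system. The column contents in `DB_COLUMNS`, the label set, the hand-set weights in `segment_score`, the same-label penalty in `transition`, and `MAX_SEG_LEN` are all hypothetical. A real semi-Markov CRF would learn its weights from data and define a normalized distribution over segmentations. The sketch also leaves out the matching step and the cardinality and link constraints described above.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical database: values already stored in each column serve as
# recognition clues for the extractor.
DB_COLUMNS: Dict[str, Set[str]] = {
    "author": {"sunita sarawagi", "imran r. mansuri"},
    "venue": {"icde", "vldb"},
}
LABELS: List[str] = ["author", "title", "venue"]  # "title" has no dictionary
MAX_SEG_LEN = 4  # longest segment the decoder considers (assumption)


def segment_score(tokens: Sequence[str], start: int, end: int, label: str) -> float:
    """Hand-set (not learned) score for labeling tokens[start:end] as one field."""
    values = DB_COLUMNS.get(label, set())
    text = " ".join(tokens[start:end]).lower()
    score = 0.0
    if text in values:
        # Whole-segment match with an existing column value: a segment-level
        # clue that a token-level model cannot express directly.
        score += 3.0 * (end - start)
    elif any(tok.lower() in v.split() for tok in tokens[start:end] for v in values):
        # Weaker clue: some token occurs inside a stored value.
        score += 0.5
    if label == "title":
        # Small per-token prior so text unseen in the database lands in "title".
        score += 0.2 * (end - start)
    return score


def transition(prev: str, cur: str) -> float:
    """Discourage splitting one field into adjacent same-label segments."""
    return -1.0 if prev == cur else 0.0


def semi_markov_viterbi(tokens: Sequence[str]) -> List[Tuple[str, str]]:
    """Return the highest-scoring segmentation as (label, text) pairs."""
    n = len(tokens)
    neg = float("-inf")
    # best[j][y]: best score for tokens[:j] whose last segment is labeled y.
    best = [{y: neg for y in LABELS} for _ in range(n + 1)]
    # back[j][y]: (start of that last segment, label of the segment before it).
    back: List[Dict[str, Tuple[int, Optional[str]]]] = [{} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for y in LABELS:
            for d in range(1, min(MAX_SEG_LEN, j) + 1):
                i = j - d
                seg = segment_score(tokens, i, j, y)
                if i == 0:
                    candidates = [(seg, None)]
                else:
                    candidates = [
                        (best[i][p] + transition(p, y) + seg, p)
                        for p in LABELS
                        if best[i][p] > neg
                    ]
                for score, prev in candidates:
                    if score > best[j][y]:
                        best[j][y] = score
                        back[j][y] = (i, prev)
    # Backtrack from the best-scoring final label.
    label = max(LABELS, key=lambda lab: best[n][lab])
    segments: List[Tuple[str, str]] = []
    j = n
    while j > 0:
        i, prev = back[j][label]
        segments.append((label, " ".join(tokens[i:j])))
        j, label = i, prev
    return segments[::-1]


if __name__ == "__main__":
    tokens = "Sunita Sarawagi Integrating Unstructured Data ICDE".split()
    print(semi_markov_viterbi(tokens))
    # → [('author', 'Sunita Sarawagi'), ('title', 'Integrating Unstructured Data'), ('venue', 'ICDE')]
```

Scoring segments rather than single tokens is what lets an exact match against a stored value (here `sunita sarawagi`) count as one strong piece of evidence for the whole span. A token-level model would see only the two weaker per-token matches.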