One platform for mining structured and unstructured data: dream or reality?
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Example-driven design of efficient record matching queries
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
EntityRank: searching entities directly and holistically
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Building structured web community portals: a top-down, compositional, and incremental approach
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Declarative information extraction using datalog with embedded extraction predicates
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
A relational approach to incrementally extracting and querying structure in unstructured data
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Toward best-effort information extraction
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
On the provenance of non-answers to queries over extracted data
Proceedings of the VLDB Endowment
High-performance information extraction with AliBaba
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
SystemT: a system for declarative information extraction
ACM SIGMOD Record
Purple SOX extraction management system
ACM SIGMOD Record
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Optimizing complex extraction programs over evolving text data
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Digital weight watching: reconstruction of scanned documents
Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
RankIE: document retrieval on ranked entity graphs
Proceedings of the VLDB Endowment
Enterprise information extraction: recent developments and open challenges
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Focused retrieval and result aggregation with political data
Information Retrieval
Querying probabilistic information extraction
Proceedings of the VLDB Endowment
Attribute domain discovery for hidden web databases
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
SystemT: a declarative information extraction system
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Systems Demonstrations
Facilitating pattern discovery for relation extraction with semantic-signature-based clustering
Proceedings of the 20th ACM international conference on Information and knowledge management
Beauty and the beast: the theory and practice of information integration
ICDT'07 Proceedings of the 11th international conference on Database Theory
Knowledge harvesting in the big-data era
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Information extraction as a filtering task
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Towards generic framework for tabular data extraction and management in documents
Proceedings of the sixth workshop on Ph.D. students in information and knowledge management
Hi-index | 0.00 |
This tutorial makes the case for developing a unified framework that manages information extraction from unstructured data (focusing in particular on text). We first survey research on information extraction in the database, AI, NLP, IR, and Web communities in recent years. Then we discuss why this is the right time for the database community to actively participate and address the problem of managing information extraction (including in particular the challenges of maintaining and querying the extracted information, and accounting for the imprecision and uncertainty inherent in the extraction process). Finally, we show how interested researchers can take the next step, by pointing to open problems, available datasets, applicable standards, and software tools. We do not assume prior knowledge of text management, NLP, extraction techniques, or machine learning.