As applications within and outside the enterprise encounter increasing volumes of unstructured data, there has been renewed interest in information extraction (IE), the discipline concerned with extracting structured information from unstructured text. Classical IE techniques developed by the NLP community were based on cascading grammars and regular expressions. However, due to the inherent limitations of grammar-based extraction, these techniques are unable to (i) scale to large data sets and (ii) support the expressivity requirements of complex information tasks. At the IBM Almaden Research Center, we are developing SystemT, an IE system that addresses these limitations by adopting an algebraic approach. By leveraging well-understood database concepts such as declarative queries and cost-based optimization, SystemT enables scalable execution of complex information extraction tasks. In this paper, we motivate the SystemT approach to information extraction. We describe our extraction algebra and demonstrate the effectiveness of our optimization techniques in providing orders-of-magnitude reductions in the running time of complex extraction tasks.