InfoXtract: a customizable intermediate level information extraction engine

Authors:
Rohini K. Srihari;Wei Li;Cheng Niu;Thomas Cornell
Affiliations:
State University of New York at Buffalo;Cymfony Inc., Williamsville, NY;Cymfony Inc., Williamsville, NY;Cymfony Inc., Williamsville, NY
Venue:
SEALTS '03 Proceedings of the HLT-NAACL 2003 workshop on Software engineering and architecture of language technology systems - Volume 8
Year:
2003

Abstract

Information extraction (IE) systems help analysts assimilate information from electronic documents. This paper focuses on IE tasks designed to support information discovery applications. Because information discovery involves examining large volumes of documents drawn from various sources for situations that cannot be anticipated a priori, these applications require IE systems to have breadth as well as depth. This implies the need for a domain-independent IE system that can easily be customized for specific domains: end users must be given tools to customize the system on their own. It also implies the need to define new intermediate level IE tasks that are richer than the subject-verb-object (SVO) triples produced by shallow systems, yet not as complex as the domain-specific scenarios defined by the Message Understanding Conference (MUC). This paper describes a robust, scalable IE engine designed for such purposes. It describes new IE tasks, such as entity profiles and concept-based general events, which represent realistic near-term goals while providing useful, actionable information. These new tasks also facilitate the correlation of output from an IE engine with existing structured data. Benchmarking results for the core engine and applications utilizing the engine are presented.