Information Extraction (IE) systems help analysts assimilate information from electronic documents. This paper focuses on IE tasks designed to support information discovery applications. Because information discovery involves examining large volumes of heterogeneous documents for situations that cannot be anticipated a priori, it requires IE systems with breadth as well as depth. This implies the need for a domain-independent IE system that can easily be customized for specific domains: end users must be given tools to customize the system on their own. It also implies the need for new intermediate-level IE tasks that are richer than the subject-verb-object (SVO) triples produced by shallow systems, yet not as complex as the domain-specific scenarios defined by the Message Understanding Conference (MUC). This paper describes InfoXtract, a robust, scalable, intermediate-level IE engine that can be ported to various domains. It describes new IE tasks, such as the synthesis of entity profiles and the extraction of concept-based general events, which represent realistic near-term goals focused on deriving useful, actionable information. Entity profiles consolidate information about an entity (a person, organization, location, etc.) within a document and across documents into a single template; the template takes into account aliases and anaphoric references as well as key relationships and events pertaining to that entity. Concept-based events attempt to normalize information such as time expressions (e.g., yesterday) and ambiguous location references (e.g., Buffalo). These new tasks facilitate the correlation of output from an IE engine with structured data to enable text mining. InfoXtract's hybrid architecture, which combines grammatical processing and machine learning, is described in detail. Benchmarking results for the core engine and for applications that use the engine are presented.
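
As a rough illustration of the two tasks named above, the following Python sketch merges mentions into one profile per entity and resolves a relative time expression against a document date. It is not InfoXtract's implementation or API: the names Mention, EntityProfile, build_profiles, normalize_time and RELATIVE_DAYS are hypothetical, and the sketch assumes that alias and anaphora resolution has already assigned each mention a canonical name.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Mention:
    """One extracted reference to an entity (hypothetical structure)."""
    doc_id: str
    surface: str      # text as it appeared, e.g. "Smith" or "he"
    canonical: str    # assumed output of an earlier alias/coreference step
    relations: dict = field(default_factory=dict)


@dataclass
class EntityProfile:
    """A single template consolidating everything known about one entity."""
    name: str
    surface_forms: set = field(default_factory=set)
    documents: set = field(default_factory=set)
    relations: dict = field(default_factory=dict)


def build_profiles(mentions):
    """Merge mentions that share a canonical name into one profile each."""
    profiles = {}
    for m in mentions:
        profile = profiles.setdefault(m.canonical, EntityProfile(name=m.canonical))
        if m.surface != m.canonical:
            profile.surface_forms.add(m.surface)
        profile.documents.add(m.doc_id)
        for key, value in m.relations.items():
            profile.relations.setdefault(key, set()).add(value)
    return profiles


# Toy lookup table; a real system would cover far more expressions.
RELATIVE_DAYS = {"today": 0, "yesterday": -1, "tomorrow": 1}


def normalize_time(expression, document_date):
    """Resolve a relative time expression against the document's date.

    Returns None when the expression is not in the toy table.
    """
    offset = RELATIVE_DAYS.get(expression.lower())
    if offset is None:
        return None
    return document_date + timedelta(days=offset)


mentions = [
    Mention("doc1", "John Smith", "John Smith", {"employer": "Acme"}),
    Mention("doc1", "he", "John Smith"),
    Mention("doc2", "Smith", "John Smith", {"location": "Buffalo"}),
]
profiles = build_profiles(mentions)
resolved = normalize_time("yesterday", date(2003, 5, 31))
```

Here the three mentions from two documents collapse into a single profile keyed by "John Smith", and "yesterday" is resolved relative to the assumed document date. Disambiguating a location such as Buffalo would need additional context and is left out of the sketch.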