Automatically constructing a dictionary for information extraction tasks

Authors:
Ellen Riloff
Affiliations:
Department of Computer Science, University of Massachusetts, Amherst, MA
Venue:
AAAI'93 Proceedings of the eleventh national conference on Artificial intelligence
Year:
1993


Abstract

Knowledge-based natural language processing systems have achieved good success with certain tasks, but they are often criticized because they depend on a domain-specific dictionary that requires a great deal of manual knowledge engineering. This knowledge engineering bottleneck makes knowledge-based NLP systems impractical for real-world applications because they cannot be easily scaled up or ported to new domains. In response to this problem, we developed a system called AutoSlog that automatically builds a domain-specific dictionary of concepts for extracting information from text. Using AutoSlog, we constructed a dictionary for the domain of terrorist event descriptions in only 5 person-hours. We then compared the AutoSlog dictionary with a hand-crafted dictionary that was built by two highly skilled graduate students and required approximately 1500 person-hours of effort. We evaluated the two dictionaries using two blind test sets of 100 texts each. Overall, the AutoSlog dictionary achieved 98% of the performance of the hand-crafted dictionary. On the first test set, the AutoSlog dictionary obtained 96.3% of the performance of the hand-crafted dictionary. On the second test set, the overall scores were virtually indistinguishable, with the AutoSlog dictionary achieving 99.7% of the performance of the hand-crafted dictionary.
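
The figures quoted in the abstract can be checked with a little arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, and the abstract does not say how the overall 98% figure was computed. An unweighted mean of the two per-test-set percentages is assumed here because both test sets contain 100 texts, and it happens to agree with the reported value.

```python
# Figures taken directly from the abstract.
AUTOSLOG_HOURS = 5            # person-hours to build the AutoSlog dictionary
HAND_CRAFTED_HOURS = 1500     # approximate person-hours for the hand-crafted one
RELATIVE_SCORES = {           # AutoSlog score as a % of the hand-crafted score
    "test set 1": 96.3,
    "test set 2": 99.7,
}


def effort_ratio(manual_hours, automated_hours):
    """How many times more effort the manual dictionary required."""
    return manual_hours / automated_hours


def mean_relative_score(scores):
    """Unweighted mean of the per-test-set percentages.

    Assumption: the abstract does not state how 'overall' was derived;
    a simple mean is used because the test sets are the same size.
    """
    return round(sum(scores.values()) / len(scores), 1)


if __name__ == "__main__":
    ratio = effort_ratio(HAND_CRAFTED_HOURS, AUTOSLOG_HOURS)
    overall = mean_relative_score(RELATIVE_SCORES)
    print(f"Manual effort was about {ratio:.0f}x the AutoSlog effort")
    print(f"Mean relative performance: {overall}%")
```

Under that assumption, the hand-crafted dictionary took roughly 300 times the effort of the AutoSlog dictionary, and the mean of the two relative scores matches the overall 98% reported above.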