Entity annotation based on inverse index operations

Authors:
Ganesh Ramakrishnan;Sreeram Balakrishnan;Sachindra Joshi
Affiliations:
IIT Delhi, Hauz Khas, New Delhi, India;IIT Delhi, Hauz Khas, New Delhi, India;IIT Delhi, Hauz Khas, New Delhi, India
Venue:
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Year:
2006

Abstract

Entity annotation involves attaching a label such as 'name' or 'organization' to a sequence of tokens in a document. Current rule-based and machine-learning-based approaches to this task all operate at the document level. We present a new and generic approach to entity annotation that uses the inverse index typically created for rapid keyword-based searching of a document collection. We define a set of operations on the inverse index that allows us to create annotations defined by cascading regular expressions. The entity annotations for an entire document corpus can be created purely from the index, with no need to access the original documents. Experiments on two publicly available data sets show significant performance improvements over document-based annotators.
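The idea of annotating from the index alone can be illustrated with a minimal sketch. It is not the authors' operator set, which the abstract does not spell out. It assumes a positional index that maps each token to (document, start, end) spans, and the names `build_index`, `union`, and `concat` are hypothetical. Alternation becomes a union of posting lists, and sequence becomes a join on adjacent positions. The result is itself a posting list, so it can feed a further rule, which is what makes the expressions cascade.

```python
from collections import defaultdict


def build_index(docs):
    """Map each token to a list of (doc_id, start, end) spans (hypothetical layout)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, token in enumerate(text.split()):
            index[token].append((doc_id, pos, pos + 1))
    return index


def union(*postings):
    """Alternation (a|b): spans present in any of the posting lists."""
    return sorted(set().union(*postings))


def concat(left, right):
    """Sequence (a b): join each left span with a right span that starts where it ends."""
    ends_by_start = defaultdict(list)
    for doc_id, start, end in right:
        ends_by_start[(doc_id, start)].append(end)
    return sorted(
        (doc_id, start, new_end)
        for doc_id, start, end in left
        for new_end in ends_by_start.get((doc_id, end), [])
    )


docs = ["Dr. Smith met Mr. Jones", "Smith called Dr. Lee"]
index = build_index(docs)

# Rule: name := (Dr.|Mr.) Capitalized
titles = union(index["Dr."], index["Mr."])
capitalized = union(*(spans for tok, spans in index.items() if tok[:1].isupper()))
names = concat(titles, capitalized)

# Store the derived annotation as a new index entry so later rules can reuse it.
index["<name>"] = names
```

Once the index is built, the rule is evaluated only against posting lists. The document text is never read again, which is the property the abstract highlights.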