Efficient Batch Top-k Search for Dictionary-based Entity Recognition

Authors:
Amit Chandel;P. C. Nagesh;Sunita Sarawagi
Affiliations:
IIT Bombay;IIT Bombay;IIT Bombay
Venue:
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Year:
2006

Citing: 0
Cited by: 28

Efficiently linking text documents with relevant structured information

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Building structured web community portals: a top-down, compositional, and incremental approach

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Randomized algorithms for data reconciliation in wide area aggregate query processing

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Declarative information extraction using datalog with embedded extraction predicates

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
An efficient filter for approximate membership checking

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Entity categorization over large document collections

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
On the provenance of non-answers to queries over extracted data

Proceedings of the VLDB Endowment
Scalable ad-hoc entity extraction from text collections

Proceedings of the VLDB Endowment
Efficient techniques for document sanitization

Proceedings of the 17th ACM conference on Information and knowledge management
Information Extraction

Foundations and Trends in Databases
Exploiting web search to generate synonyms for entities

Proceedings of the 18th international conference on World wide web
Efficient approximate entity extraction with edit distance constraints

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Query-time entity resolution

Journal of Artificial Intelligence Research
Efficient algorithms for approximate member extraction using signature-based inverted lists

Proceedings of the 18th ACM conference on Information and knowledge management
Answering table augmentation queries from unstructured lists on the web

Proceedings of the VLDB Endowment
Mining document collections to facilitate accurate approximate entity matching

Proceedings of the VLDB Endowment
Graph-based concept identification and disambiguation for enterprise search

Proceedings of the 19th international conference on World wide web
SystemT: an algebraic approach to declarative information extraction

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Online annotation of text streams with structured entities

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Simple and efficient algorithm for approximate dictionary matching

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Faerie: efficient filtering algorithms for approximate dictionary-based entity extraction

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
A metadata geoparsing system for place name recognition and resolution in metadata records

Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Compressed data structures for annotated web search

Proceedings of the 21st international conference on World Wide Web
Extending enterprise service design knowledge using clustering

ICSOC'12 Proceedings of the 10th international conference on Service-Oriented Computing
Sponsored search ad selection by keyword structure analysis

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
PartSS: an efficient partition-based filtering for edit distance constraints

ADC '11 Proceedings of the Twenty-Second Australasian Database Conference - Volume 115
Graph-based reference table construction to facilitate entity matching

Journal of Systems and Software
Efficient parsing-based search over structured data

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management

Abstract

We consider the problem of speeding up Entity Recognition systems that exploit existing large databases of structured entities to improve extraction accuracy. These systems must compute the maximum similarity score of each of several overlapping segments of the input text against the entity database. We formulate a Batch-Top-K problem whose goal is to share computation across overlapping segments. Our proposed algorithm runs a factor of three faster than independent Top-K queries and only a factor of two slower than an unachievable lower bound on total cost. We then propose a novel modification of the popular Viterbi algorithm for recognizing entities so that it works with easily computable bounds on match scores, thereby reducing the total inference time by a factor of eight compared to state-of-the-art methods.
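
The idea of sharing work across overlapping segments can be illustrated with a minimal sketch. This is not the paper's Batch-Top-K algorithm: the similarity measure (Jaccard over token sets), the inverted index, the function names `build_index` and `batch_max_similarity`, and the `max_len` parameter are all assumptions chosen for illustration. The sketch looks up each token position in the index once and reuses that lookup for every segment covering the position, and it extends the overlap counts incrementally for segments that share a start position.

```python
from collections import defaultdict


def build_index(entities):
    """Map each lowercased token to the ids of the entities containing it."""
    index = defaultdict(set)
    for eid, name in enumerate(entities):
        for tok in set(name.lower().split()):
            index[tok].add(eid)
    return index


def batch_max_similarity(tokens, entities, max_len=3):
    """Best Jaccard match for every segment tokens[start:end] of <= max_len tokens.

    Returns {(start, end): (score, entity_or_None)}.
    Illustrative only; the similarity function and index are assumptions.
    """
    index = build_index(entities)
    entity_sets = [set(e.lower().split()) for e in entities]
    # Shared step: one index lookup per token position, reused by all
    # segments that cover that position.
    postings = [index.get(t.lower(), set()) for t in tokens]

    results = {}
    for start in range(len(tokens)):
        overlap = defaultdict(int)  # entity id -> number of shared distinct tokens
        seen = set()                # distinct tokens in the current segment
        last = min(start + max_len, len(tokens))
        for end in range(start + 1, last + 1):
            tok = tokens[end - 1].lower()
            if tok not in seen:
                # Extend the previous segment by one token instead of rescoring.
                seen.add(tok)
                for eid in postings[end - 1]:
                    overlap[eid] += 1
            best = (0.0, None)
            for eid, inter in overlap.items():
                union = len(seen) + len(entity_sets[eid]) - inter
                score = inter / union
                if score > best[0]:
                    best = (score, entities[eid])
            results[(start, end)] = best
    return results


if __name__ == "__main__":
    text = "paper by Sunita Sarawagi at IIT Bombay".split()
    dictionary = ["Sunita Sarawagi", "IIT Bombay", "Amit Chandel"]
    scores = batch_max_similarity(text, dictionary)
    print(scores[(2, 4)])  # (1.0, 'Sunita Sarawagi')
    print(scores[(2, 3)])  # (0.5, 'Sunita Sarawagi')
```

Issuing an independent query per segment would repeat the index lookup for a token once for every segment that contains it; the sketch performs it once per position. The bounds-based Viterbi modification mentioned in the abstract is not covered here.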