Efficiently linking text documents with relevant structured information
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Building structured web community portals: a top-down, compositional, and incremental approach
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Randomized algorithms for data reconciliation in wide area aggregate query processing
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Declarative information extraction using datalog with embedded extraction predicates
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
An efficient filter for approximate membership checking
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Entity categorization over large document collections
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
On the provenance of non-answers to queries over extracted data
Proceedings of the VLDB Endowment
Scalable ad-hoc entity extraction from text collections
Proceedings of the VLDB Endowment
Efficient techniques for document sanitization
Proceedings of the 17th ACM conference on Information and knowledge management
Foundations and Trends in Databases
Exploiting web search to generate synonyms for entities
Proceedings of the 18th international conference on World wide web
Efficient approximate entity extraction with edit distance constraints
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Journal of Artificial Intelligence Research
Efficient algorithms for approximate member extraction using signature-based inverted lists
Proceedings of the 18th ACM conference on Information and knowledge management
Answering table augmentation queries from unstructured lists on the web
Proceedings of the VLDB Endowment
Mining document collections to facilitate accurate approximate entity matching
Proceedings of the VLDB Endowment
Graph-based concept identification and disambiguation for enterprise search
Proceedings of the 19th international conference on World wide web
SystemT: an algebraic approach to declarative information extraction
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Online annotation of text streams with structured entities
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Simple and efficient algorithm for approximate dictionary matching
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Faerie: efficient filtering algorithms for approximate dictionary-based entity extraction
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
A metadata geoparsing system for place name recognition and resolution in metadata records
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Compressed data structures for annotated web search
Proceedings of the 21st international conference on World Wide Web
Extending enterprise service design knowledge using clustering
ICSOC'12 Proceedings of the 10th international conference on Service-Oriented Computing
Sponsored search ad selection by keyword structure analysis
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
PartSS: an efficient partition-based filtering for edit distance constraints
ADC '11 Proceedings of the Twenty-Second Australasian Database Conference - Volume 115
Graph-based reference table construction to facilitate entity matching
Journal of Systems and Software
Efficient parsing-based search over structured data
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
We consider the problem of speeding up Entity Recognition systems that exploit existing large databases of structured entities to improve extraction accuracy. These systems require the computation of the maximum similarity scores of several overlapping segments of the input text with the entity database. We formulate a Batch-Top-K problem with the goal of sharing computations across overlapping segments. Our proposed algorithm runs a factor of three faster than independent Top-K queries and only a factor of two slower than an unachievable lower bound on total cost. We then propose a novel modification of the popular Viterbi algorithm for recognizing entities so that it works with easily computable bounds on match scores, thereby reducing the total inference time by a factor of eight compared to state-of-the-art methods.
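
To make the cost being attacked concrete, the following is a minimal sketch of the naive baseline the abstract compares against: scoring every overlapping segment independently against the entity database. It is not the Batch-Top-K algorithm itself. The similarity function (token-set Jaccard), the function names `jaccard` and `max_segment_scores`, and the `max_len` parameter are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import Dict, List, Tuple


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-set Jaccard similarity (an assumed, illustrative similarity function)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def max_segment_scores(
    tokens: List[str], entities: List[str], max_len: int
) -> Dict[Tuple[int, int], float]:
    """Score each segment tokens[start:end] of up to max_len tokens
    against every entity and keep the maximum similarity.

    Every segment is handled as a separate query, so work on tokens
    shared by overlapping segments is repeated rather than reused.
    """
    scores: Dict[Tuple[int, int], float] = {}
    n = len(tokens)
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            segment = tokens[start:end]
            scores[(start, end)] = max(
                (jaccard(segment, entity.split()) for entity in entities),
                default=0.0,
            )
    return scores
```

With `n` tokens, segments of up to `max_len` tokens, and `m` entities, this loop performs roughly `n * max_len * m` similarity computations. Adjacent segments differ by a single token, which is the redundancy that sharing computation across overlapping segments is meant to exploit.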