Online annotation of text streams with structured entities

Authors:
Ken Q. Pu;Oktie Hassanzadeh;Richard Drake;Renée J. Miller
Affiliations:
UOIT, Oshawa, ON, Canada;University of Toronto, Toronto, ON, Canada;UOIT, Oshawa, ON, Canada;University of Toronto, Toronto, ON, Canada
Venue:
CIKM '10: Proceedings of the 19th ACM International Conference on Information and Knowledge Management
Year:
2010

Abstract

We propose a framework and an algorithm for annotating unbounded text streams with entities from a structured database. The algorithm correlates unstructured, dirty text streams from sources such as emails, chats, and blogs with entities stored in structured databases. In contrast to previous work on entity extraction, our emphasis is on performing entity annotation in a completely online fashion. The algorithm continuously extracts important phrases and assigns the top-k relevant entities to each of them. It does so with guaranteed constant time and space complexity for each additional word in the text stream, so infinite text streams can be annotated. Our framework allows the online annotation algorithm to adapt to a changing stream rate by self-adjusting multiple run-time parameters, reducing the quality of annotation for fast streams and improving it for slow ones. The framework also allows the online annotation algorithm to incorporate query feedback in order to learn user preferences and personalize the annotation for individual users.
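
The idea of bounded per-word work can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a plain sliding-window dictionary lookup, and the names `StreamAnnotator`, `index`, `max_phrase_len`, and `k` are hypothetical. It assumes a pre-built in-memory index from lowercase phrases to scored entity candidates, and it assumes each candidate list is small, so the work done for each incoming word is bounded by the window size.

```python
from collections import deque
import heapq


class StreamAnnotator:
    """Illustrative sliding-window annotator (hypothetical, not the paper's method)."""

    def __init__(self, index, max_phrase_len=3, k=2):
        # index: dict mapping a lowercase phrase to a list of (entity, score) pairs.
        self.index = index
        self.k = k
        # Only the last max_phrase_len words are kept, so memory stays bounded
        # no matter how long the stream runs.
        self.window = deque(maxlen=max_phrase_len)

    def push(self, word):
        """Consume one word and return annotations for phrases ending at it."""
        self.window.append(word.lower())
        words = list(self.window)
        annotations = []
        # At most max_phrase_len candidate phrases end at the newest word.
        for start in range(len(words)):
            phrase = " ".join(words[start:])
            candidates = self.index.get(phrase)
            if candidates:
                # Keep the k highest-scoring entities for this phrase.
                top = heapq.nlargest(self.k, candidates, key=lambda c: c[1])
                annotations.append((phrase, [entity for entity, _ in top]))
        return annotations
```

Feeding the stream one word at a time through `push` yields annotations as soon as a known phrase is completed. Adapting to the stream rate, as the abstract describes, could in this sketch amount to changing `max_phrase_len` or `k` at run time, though how the paper adjusts its parameters is not stated here.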