We propose a framework and algorithm for annotating unbounded text streams with entities from a structured database. The algorithm correlates unstructured, dirty text streams from sources such as emails, chats, and blogs with entities stored in structured databases. In contrast to previous work on entity extraction, we emphasize performing entity annotation in a completely online fashion. The algorithm continuously extracts important phrases and assigns to each of them the top-k relevant entities. It does so with a guarantee of constant time and space complexity for each additional word in the text stream, so text streams of arbitrary length can be annotated. The framework lets the online annotation algorithm adapt to a changing stream rate by self-adjusting multiple run-time parameters, lowering annotation quality for fast streams and raising it for slow ones. The framework also lets the algorithm incorporate query feedback to learn user preferences and personalize the annotation for individual users.
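The abstract does not spell out the algorithm, so the following is only a minimal sketch of how constant per-word cost and rate-driven parameter adjustment can fit together, not the authors' method. It assumes a dictionary-style phrase index (a plain `dict` from lower-cased phrase to a short, bounded list of `(entity_id, score)` candidates), a sliding window of the last `max_phrase_len` words, and a fixed `k`. The names `StreamAnnotator`, `push`, `adapt`, `max_phrase_len`, and `budget` are hypothetical. Because each new word triggers at most `max_phrase_len` lookups and keeps at most `k` entities per phrase, the work per word does not grow with the length of the stream.

```python
import heapq
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical entity index: lower-cased phrase -> bounded list of (entity_id, score).
EntityIndex = Dict[str, List[Tuple[str, float]]]


class StreamAnnotator:
    """Sketch of an online annotator with bounded work per incoming word.

    Assumes each index entry holds at most a fixed number of candidates, so the
    top-k selection below costs a constant amount regardless of stream length.
    """

    def __init__(self, index: EntityIndex, max_phrase_len: int = 3, k: int = 2):
        self.index = index
        self.max_phrase_len = max_phrase_len  # hypothetical run-time parameter L
        self.k = k                            # number of entities kept per phrase
        # Only the last max_phrase_len words are ever stored (constant space).
        self.window: Deque[str] = deque(maxlen=max_phrase_len)

    def push(self, word: str) -> List[Tuple[str, List[str]]]:
        """Consume one word; return (phrase, top-k entity ids) for phrases ending at it."""
        self.window.append(word.lower())
        words = list(self.window)
        results = []
        # Check every phrase that ends at the new word: lengths 1..len(window).
        for n in range(1, len(words) + 1):
            phrase = " ".join(words[-n:])
            candidates = self.index.get(phrase)
            if candidates:
                top = heapq.nlargest(self.k, candidates, key=lambda c: c[1])
                results.append((phrase, [entity for entity, _ in top]))
        return results

    def adapt(self, words_per_sec: float, budget: float = 1000.0) -> None:
        """Hypothetical rate adaptation: do less work on fast streams, more on slow ones."""
        if words_per_sec > budget:
            self.max_phrase_len = max(1, self.max_phrase_len - 1)
            self.k = max(1, self.k - 1)
        elif words_per_sec < budget / 2:
            self.max_phrase_len += 1
            self.k += 1
        # A deque's maxlen is fixed, so rebuild it; shrinking keeps the newest words.
        self.window = deque(self.window, maxlen=self.max_phrase_len)


if __name__ == "__main__":
    index = {
        "new york": [("NYC", 0.9), ("NY_State", 0.6), ("NY_Times", 0.3)],
        "york": [("York_UK", 0.7)],
    }
    ann = StreamAnnotator(index, max_phrase_len=3, k=2)
    for w in "I moved to New York".split():
        for phrase, entities in ann.push(w):
            print(phrase, entities)
```

Run as a script, the demo reports `york` with `['York_UK']` and then `new york` with `['NYC', 'NY_State']` once the word "York" arrives. Shrinking both the window and `k` under high load is one simple way to trade annotation quality for throughput; a real system would likely tune these knobs more carefully and would also need phrase-importance filtering and the query-feedback personalization the abstract mentions, which this sketch omits.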