In a document streaming environment, online detection of the first documents that mention previously unseen events is an open challenge. For this online new event detection (ONED) task, existing studies usually assume that enough resources are always available and focus entirely on detection accuracy without considering efficiency. Moreover, none of the existing work addresses the issue of providing an effective and friendly user interface. As a result, there is a significant gap between the existing systems and a system that can be used in practice. In this paper, we propose an ONED framework with the following prominent features. First, a combination of indexing and compression methods is used to improve the document processing rate by orders of magnitude without sacrificing much detection accuracy. Second, when resources are tight, a resource-adaptive computation method is used to maximize the benefit that can be gained from the limited resources. Third, when the new event arrival rate is beyond the processing capability of the consumer of the ONED system, new events are further filtered and prioritized before they are presented to the consumer. Fourth, implicit citation relationships are created among all the documents and used to compute the importance of document sources. This importance information can guide the selection of document sources. We implemented a prototype of our framework on top of IBM's Stream Processing Core middleware. We also evaluated the effectiveness of our techniques on the standard TDT5 benchmark. To the best of our knowledge, this is the first implementation of a real application in a large-scale stream processing system.
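As a point of reference for the ONED task itself, the following is a minimal sketch of a common baseline formulation: each incoming document is compared against all previously seen documents, and it is reported as a first story when its highest similarity falls below a threshold. The abstract does not specify the similarity model, so the whitespace tokenizer, the raw term-frequency cosine similarity, the threshold value of 0.3, and the names `tf_vector`, `cosine`, and `detect_new_events` are all illustrative assumptions, not the framework's actual method. The indexing, compression, resource-adaptive computation, and source-ranking features described above are not modeled here.

```python
import math
from collections import Counter


def tf_vector(text):
    """Turn text into a term-frequency vector (assumed: lowercase whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    if not a or not b:
        return 0.0
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)


def detect_new_events(stream, threshold=0.3):
    """Return ids of documents whose best match among earlier documents is below threshold.

    `stream` is an iterable of (doc_id, text) pairs in arrival order.
    The threshold is a hypothetical value chosen only for illustration.
    """
    seen = []            # vectors of every document processed so far
    first_stories = []   # ids flagged as mentioning a previously unseen event
    for doc_id, text in stream:
        vec = tf_vector(text)
        best = max((cosine(vec, old) for old in seen), default=0.0)
        if best < threshold:
            first_stories.append(doc_id)
        seen.append(vec)
    return first_stories
```

This exhaustive comparison costs time linear in the number of stored documents for every arrival, which is the kind of per-document cost that motivates the indexing and compression methods mentioned in the abstract.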