We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of "big data". We tell the story of how our system was built twice: our first implementation was built on a typical Hadoop-based analytics stack, but was later replaced because it did not meet the latency requirements necessary to generate meaningful real-time results. The second implementation, which is the system deployed in production today, is a custom in-memory processing engine specifically designed for the task. This experience taught us that the current typical usage of Hadoop as a "big data" platform, while great for experimentation, is not well suited to low-latency processing, and points the way to future work on data analytics platforms that can handle "big" as well as "fast" data.