"Big Data" in map-reduce (M-R) clusters is often fundamentally temporal in nature, as are many analytics tasks over such data. For instance, display advertising uses Behavioral Targeting (BT) to select ads for users based on prior searches, page views, etc. Previous work on BT has focused on techniques that scale well for offline data using M-R. However, this approach has limitations for BT-style applications that deal with temporal data: (1) many queries are temporal and not easily expressible in M-R, and the set-oriented nature of M-R front-ends such as SCOPE is not suitable for temporal processing; (2) as commercial systems mature, they may also need to analyze and react directly to real-time data feeds, since a high turnaround time can result in missed opportunities, yet it is difficult for current solutions to also operate naturally over real-time streams. Our contributions are twofold. First, we propose a novel framework called TiMR (pronounced "timer") that combines a time-oriented data processing system with an M-R framework. Users write and submit analysis algorithms as temporal queries; these queries are succinct, scale-out-agnostic, and easy to write. They scale well on large-scale offline data using TiMR, and can work unmodified over real-time streams. We also propose new cost-based query fragmentation and temporal partitioning schemes for improving efficiency with TiMR. Second, we show the feasibility of this approach for BT, with new temporal algorithms that exploit new targeting opportunities. Experiments using real data from a commercial ad platform show that TiMR is very efficient and requires orders-of-magnitude less development effort. Our BT solution is easy and succinct, and performs up to several times better than current schemes in terms of memory, learning time, and click-through-rate/coverage.
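The idea of running one temporal query unchanged over both partitioned offline data and an ordered stream can be illustrated with a small sketch. This is not TiMR's API or its cost-based partitioning scheme, neither of which the abstract specifies. All names here (`windowed_counts`, `temporal_partitions`, `partitioned_counts`) and the `(timestamp, user)` event shape are assumptions made for illustration. The sketch assumes a sliding-window count per user with `window > 0`, and splits the time axis into fixed spans that each carry one window of preceding history, so every span can be processed independently.

```python
from collections import defaultdict, deque


def windowed_counts(events, window):
    """Hypothetical temporal query: for each (ts, user) event, count that
    user's events with timestamps in the half-open interval (ts - window, ts]."""
    recent = defaultdict(deque)  # per-user timestamps still inside the window
    out = []
    for ts, user in sorted(events):
        q = recent[user]
        q.append(ts)
        # Evict timestamps that have fallen out of the window.
        while q[0] <= ts - window:
            q.popleft()
        out.append((ts, user, len(q)))
    return out


def temporal_partitions(events, span, window):
    """Yield (start, end, chunk) for consecutive time spans of length `span`.
    Each chunk also holds the events from the `window` before `start`,
    so the query can be evaluated on the chunk in isolation."""
    if not events:
        return
    t_min = min(ts for ts, _ in events)
    t_max = max(ts for ts, _ in events)
    start = t_min
    while start <= t_max:
        end = start + span
        chunk = [e for e in events if start - window < e[0] < end]
        yield start, end, chunk
        start = end


def partitioned_counts(events, span, window):
    """Run the unmodified query on each partition (the 'map' side) and keep
    only results whose timestamp belongs to that partition's own span."""
    out = []
    for start, end, chunk in temporal_partitions(events, span, window):
        out.extend(r for r in windowed_counts(chunk, window) if start <= r[0] < end)
    return sorted(out)


events = [(1, "a"), (2, "a"), (4, "b"), (6, "a"), (7, "a"), (11, "a"), (12, "b")]
single_pass = sorted(windowed_counts(events, window=5))
per_partition = partitioned_counts(events, span=4, window=5)
print(single_pass == per_partition)
```

In this sketch the overlap between neighbouring partitions is what lets `windowed_counts` stay unaware of how the data was split. The cost is that events near span boundaries are read by more than one partition.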