How can we maintain a dynamic profile capturing a user's reading interest against the common interest? What are the queries that have been asked 1,000 times more frequently to a search engine from users in Asia than in North America? What are the keywords (or tags) that are 1,000 times more frequent in the blog stream on computer games than in the blog stream on Hollywood movies? To answer such questions, we need to find discriminative items in multiple data streams. Each data source, such as Web search queries in a region or blog postings on a topic, can be modeled as a data stream because of its fast-growing volume. Motivated by these extensive applications, in this paper we study the problem of mining discriminative items in multiple data streams. We show that, to exactly find all discriminative items in stream $S_1$ against stream $S_2$ in one scan, the space lower bound is $\Omega(|\Sigma| \log \frac{n_1}{|\Sigma|})$, where $\Sigma$ is the alphabet of items and $n_1$ is the current size of $S_1$. To tackle the space challenge, we develop three heuristic algorithms that achieve high precision and recall using sub-linear space and sub-linear processing time per item with respect to $|\Sigma|$. The complexity of each algorithm is independent of the sizes of the two streams. An extensive empirical study using both real and synthetic data sets verifies our design.
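To make the problem statement concrete, the following is a minimal sketch of the exact, naive baseline that the space lower bound speaks to: it keeps one counter per distinct item, so its space grows linearly with $|\Sigma|$. It is not one of the paper's three heuristic algorithms. The function name `discriminative_items`, the threshold parameter `theta`, and the precise criterion (the relative frequency in $S_1$ is at least `theta` times the relative frequency in $S_2$, with items absent from $S_2$ counted as discriminative) are assumptions made for illustration; the abstract does not spell out the formal definition.

```python
from collections import Counter


def discriminative_items(stream1, stream2, theta):
    """Exact baseline: items whose relative frequency in stream1 is at
    least `theta` times their relative frequency in stream2.

    Assumed definition (not taken from the paper): an item that never
    appears in stream2 is treated as discriminative.
    """
    counts1 = Counter(stream1)  # one counter per distinct item in S1
    counts2 = Counter(stream2)  # one counter per distinct item in S2
    n1 = sum(counts1.values())  # current size of S1
    n2 = sum(counts2.values())  # current size of S2

    result = set()
    for item, c1 in counts1.items():
        c2 = counts2.get(item, 0)
        # c1/n1 >= theta * c2/n2, cross-multiplied to avoid division by zero
        if c1 * n2 >= theta * c2 * n1:
            result.add(item)
    return result


s1 = ["a"] * 8 + ["b"] * 2
s2 = ["a"] * 1 + ["b"] * 9
print(discriminative_items(s1, s2, theta=4))  # {'a'}
```

In the toy input, item `a` has relative frequency 0.8 in the first stream and 0.1 in the second, a ratio of 8, so it passes a threshold of 4, while `b` does not. The per-item counters are exactly what becomes infeasible for a large alphabet, which is the motivation for sub-linear heuristics.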