We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of storage space. Our main result is a randomized algorithm for the k-Median problem which produces a constant factor approximation in one pass using storage space O(k polylog n). This is a significant improvement over the previous best algorithm, which yielded a 2^O(1/ε) approximation using O(n^ε) space. Next we give a streaming algorithm for the k-Median problem with an arbitrary distance function. We also study algorithms for clustering problems with outliers in the streaming model. Here, we give bicriterion guarantees, producing constant factor approximations by increasing the allowed fraction of outliers slightly.
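To make the objective concrete, the following is a minimal sketch of the k-Median cost and its outlier variant. It is not the streaming algorithm described above; it only evaluates a given set of centers offline. The function name `kmedian_cost`, the `outlier_fraction` parameter, the use of Euclidean distance, and the sample points are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmedian_cost(points, centers, outlier_fraction=0.0):
    """Sum of distances from each point to its nearest center.

    Hypothetical helper: if outlier_fraction > 0, the farthest
    floor(outlier_fraction * n) points are discarded before summing,
    which is the usual way the outlier variant is stated.
    """
    # Distance from every point to its closest center, smallest first.
    nearest = sorted(min(dist(p, c) for c in centers) for p in points)
    # Number of points that must still be served by the centers.
    keep = len(nearest) - int(outlier_fraction * len(nearest))
    return sum(nearest[:keep])


# Illustrative data: two tight groups plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (100.0, 100.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]

cost_all = kmedian_cost(points, centers)
cost_trim = kmedian_cost(points, centers, outlier_fraction=0.2)
```

With these sample points, allowing one of the five points to be treated as an outlier removes the far-away point from the sum, so `cost_trim` is much smaller than `cost_all`. A bicriterion guarantee of the kind mentioned above compares the cost of a solution that may discard slightly more than the allowed fraction of points against the optimum that discards exactly the allowed fraction.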