Approximate counting of inversions in a data stream

Authors:
Miklós Ajtai;T. S. Jayram;Ravi Kumar;D. Sivakumar
Affiliations:
IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA
Venue:
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Year:
2002

Citing 18

Approximate counting: a detailed analysis
BIT - Ellis Horwood series in artificial intelligence

A fast and simple randomized parallel algorithm for the maximal independent set problem
Journal of Algorithms

The probabilistic communication complexity of set intersection
SIAM Journal on Discrete Mathematics

A survey of adaptive sorting algorithms
ACM Computing Surveys (CSUR)

Exploiting few inversions when sorting: sequential and parallel algorithms
Theoretical Computer Science

Communication complexity
Communication complexity

The art of computer programming, volume 3: (2nd ed.) sorting and searching
The art of computer programming, volume 3: (2nd ed.) sorting and searching

The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7

Approximate indexed lists
Journal of Algorithms - Special issue on SODA '95 papers

Sorting Permutations by Reversals and Eulerian Cycle Decompositions
SIAM Journal on Discrete Mathematics

The space complexity of approximating the frequency moments
Journal of Computer and System Sciences

Computing on data streams
External memory algorithms

Spot-checkers
Journal of Computer and System Sciences - 30th annual ACM symposium on theory of computing

Counting large numbers of events in small registers
Communications of the ACM

Rank aggregation methods for the Web
Proceedings of the 10th international conference on World Wide Web

Data-streams and histograms
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing

Optimal Algorithms for List Indexing and Subset Rank
WADS '89 Proceedings of the Workshop on Algorithms and Data Structures

An Approximate L1-Difference Algorithm for Massive Data Streams
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science

Cited 19

Pass efficient algorithms for approximating large matrices
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms

Counting inversions in lists
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms

Better streaming algorithms for clustering problems
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing

Exploiting k-constraints to reduce memory overhead in continuous queries over data streams
ACM Transactions on Database Systems (TODS)

An information statistics approach to data stream and communication complexity
Journal of Computer and System Sciences - Special issue on FOCS 2002

Range-Efficient Computation of F0 over Massive Data Streams
ICDE '05 Proceedings of the 21st International Conference on Data Engineering

Data streams: algorithms and applications
Foundations and Trends® in Theoretical Computer Science

Continuously maintaining order statistics over data streams: extended abstract
ADC '07 Proceedings of the eighteenth conference on Australasian database - Volume 63

Estimating the sortedness of a data stream
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms

On distance to monotonicity and longest increasing subsequence of a data stream
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms

Overcoming the l1 non-embeddability barrier: algorithms for product metrics
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms

Partition Arguments in Multiparty Communication Complexity
ICALP '09 Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part I

Aggregate computation over data streams
APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development

Near-optimal sublinear time algorithms for Ulam distance
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms

Counting inversions, offline orthogonal range counting, and related problems
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms

Partition arguments in multiparty communication complexity
Theoretical Computer Science

Lower Bounds on Streaming Algorithms for Approximating the Length of the Longest Increasing Subsequence
SIAM Journal on Computing

Finding longest increasing and common subsequences in streaming data
COCOON'05 Proceedings of the 11th annual international conference on Computing and Combinatorics

Edit distance to monotonicity in sliding windows
ISAAC'11 Proceedings of the 22nd international conference on Algorithms and Computation

Abstract

Inversions are used as a fundamental quantity to measure the sortedness of data, to evaluate different ranking methods for databases, and in the context of rank aggregation. Considering the volume of the data sets in these applications, the data stream model [14, 2] is a natural setting to design efficient algorithms. We obtain a suite of space-efficient streaming algorithms for approximating the number of inversions in a permutation. The best space bound we achieve is $O(\log n \log \log n)$ through a deterministic algorithm. In contrast, we derive an $\Omega(n)$ lower bound for randomized exact computation for this problem; thus approximation is essential.

We also consider two generalizations of this problem: (1) approximating the number of inversions between two permutations, for which we obtain a randomized $O(\sqrt{n} \log n)$-space algorithm, and (2) approximating the number of inversions in a general list, for which we obtain a randomized $O(\sqrt{n} \log^2 n)$-space two-pass algorithm. In contrast, we derive $\Omega(n)$-space lower bounds for deterministic approximate computation for these problems; thus both randomization and approximation are essential. All our algorithms use only $O(\log n)$ time per data item.
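
For readers unfamiliar with the quantity being approximated, the following is a minimal sketch of exact inversion counting by merge sort, where an inversion is a pair of positions i < j whose values appear out of order. It is not one of the streaming algorithms described in the abstract: it stores the whole input and so uses linear space, which is the regime the stated lower bound for exact computation concerns. The function name `count_inversions` and the sample input are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def count_inversions(seq: Sequence[int]) -> int:
    """Exact number of pairs (i, j) with i < j and seq[i] > seq[j].

    Offline merge-sort counting: O(n log n) time, O(n) space.
    Illustrative baseline only, not a small-space streaming algorithm.
    """

    def sort_and_count(items: List[int]) -> Tuple[List[int], int]:
        if len(items) <= 1:
            return items, 0
        mid = len(items) // 2
        left, left_inv = sort_and_count(items[:mid])
        right, right_inv = sort_and_count(items[mid:])

        merged: List[int] = []
        cross_inv = 0
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] precedes every remaining element of left,
                # so it forms an inversion with each of them.
                cross_inv += len(left) - i
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, left_inv + right_inv + cross_inv

    return sort_and_count(list(seq))[1]


if __name__ == "__main__":
    # The permutation (2, 0, 3, 1) has inversions (2,0), (2,1), (3,1).
    print(count_inversions([2, 0, 3, 1]))  # 3
```

A sorted sequence has zero inversions and a reversed sequence of length n has n(n-1)/2, which is why the count serves as a measure of sortedness.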