Similarity in languages and programs

Authors:
Cewei Cui;Zhe Dang;Thomas R. Fischer;Oscar H. Ibarra
Affiliations:
School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA;School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA;School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA;Department of Computer Science, University of California, Santa Barbara, CA 93106, USA
Venue:
Theoretical Computer Science
Year:
2013

Abstract

We use an information-theoretic notion, namely (Shannon) information rate, to generalize common syntactic similarity metrics between strings (such as Hamming distance and longest common subsequence) to metrics between languages. We show that these similarity metrics are computable for two regular languages. We further study the self-similarity of a regular language under various similarity metrics. For semantic similarity, we study the amplitude of an automaton, which intuitively characterizes how much a typical execution of the automaton fluctuates. Finally, we investigate, through experiments, how to measure similarity between two real-world programs using Lempel-Ziv compression on their runs at the assembly level.
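As a rough illustration of the information rate mentioned in the abstract, the sketch below estimates the rate of a regular language as log2(S_n)/n, where S_n is the number of accepted words of length n. This uses the standard textbook definition of information rate, not the paper's similarity metrics themselves, which the abstract does not spell out. The DFA encoding, the function names `count_accepted` and `rate_estimate`, and the example language (binary words with no two consecutive 1s) are assumptions made for this example only.

```python
import math


def count_accepted(delta, start, accepting, n):
    """Count the words of length n accepted by a DFA.

    delta maps (state, symbol) -> state; missing entries mean "reject".
    This encoding is an assumption of this sketch, not taken from the paper.
    """
    counts = {start: 1}  # number of length-k prefixes reaching each state
    for _ in range(n):
        nxt = {}
        for (state, _symbol), target in delta.items():
            if state in counts:
                nxt[target] = nxt.get(target, 0) + counts[state]
        counts = nxt
    return sum(c for s, c in counts.items() if s in accepting)


def rate_estimate(delta, start, accepting, n):
    """Finite-n estimate log2(S_n) / n of the information rate."""
    total = count_accepted(delta, start, accepting, n)
    return math.log2(total) / n if total > 0 else 0.0


# Hypothetical example language: binary words with no two consecutive 1s.
# State "a": last symbol was not 1; state "b": last symbol was 1.
NO_11 = {("a", "0"): "a", ("a", "1"): "b", ("b", "0"): "a"}

if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, rate_estimate(NO_11, "a", {"a", "b"}, n))
```

For this example language the word counts follow the Fibonacci sequence, so the estimate approaches log2 of the golden ratio (about 0.694) as n grows; the language of all binary words has rate exactly 1.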