In-memory hash tables provide fast access to large numbers of strings, with less space overhead than sorted structures such as tries and binary trees. If chains are used for collision resolution, hash tables scale well, particularly if the pattern of access to the stored strings is skewed. However, typical implementations of string hash tables, with lists of nodes, are not cache-efficient. In this paper, we explore two alternatives to the standard representation: the simple expedient of including the string in its node, and the more drastic step of replacing each list of nodes by a contiguous array of characters. Our experiments show that, for large sets of strings, the improvement is dramatic. In all cases, the new structures give substantial savings in space at no cost in time. In the best case, the overhead space required for pointers is reduced by a factor of around 50, to less than two bits per string (with total space required, including 5.68 megabytes of strings, falling from 20.42 megabytes to 5.81 megabytes), while access times are also reduced.
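The second alternative, replacing each list of nodes by a contiguous array of characters, can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `ArrayHashTable`, the FNV-1a hash function, and the 2-byte length prefix in front of each string are all assumptions made for illustration, since the abstract does not say how strings are hashed or delimited within a slot. Python is used only to show the layout; it cannot reproduce the cache behaviour or the space figures reported above.

```python
class ArrayHashTable:
    """Hash table in which each slot is one contiguous byte array.

    Illustrative layout (an assumption, not taken from the paper):
    every stored string is written as a 2-byte little-endian length
    followed by its UTF-8 bytes, and the strings that hash to the same
    slot are packed back to back with no per-string node or pointer.
    """

    def __init__(self, num_slots=1024):
        self.slots = [bytearray() for _ in range(num_slots)]

    def _slot(self, key: bytes) -> bytearray:
        # FNV-1a, 32-bit. Chosen only as a simple, well-known string
        # hash; any reasonable string hash function would do here.
        h = 2166136261
        for b in key:
            h = ((h ^ b) * 16777619) & 0xFFFFFFFF
        return self.slots[h % len(self.slots)]

    def contains(self, s: str) -> bool:
        key = s.encode("utf-8")
        buf = self._slot(key)
        i = 0
        # Scan the slot sequentially, hopping from one length prefix
        # to the next instead of following a pointer per string.
        while i < len(buf):
            n = int.from_bytes(buf[i:i + 2], "little")
            if n == len(key) and buf[i + 2:i + 2 + n] == key:
                return True
            i += 2 + n
        return False

    def insert(self, s: str) -> bool:
        """Append s to its slot; return False if it is already stored."""
        key = s.encode("utf-8")
        if len(key) > 0xFFFF:
            raise ValueError("string too long for a 2-byte length prefix")
        if self.contains(s):
            return False
        buf = self._slot(key)
        buf += len(key).to_bytes(2, "little")  # extends the slot in place
        buf += key
        return True


table = ArrayHashTable(num_slots=4)
for word in ["cache", "hash", "trie", "cache"]:
    table.insert(word)
print(table.contains("hash"), table.contains("heap"))
```

In this layout a lookup touches one growing block of memory per slot, where a conventional chain would dereference one pointer per node and, if the string is stored outside its node, another per string.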