GLIMPSE: a tool to search through entire file systems

Authors:
Udi Manber; Sun Wu
Affiliations:
Department of Computer Science, University of Arizona, Tucson, AZ; Department of Computer Science, National Chung-Cheng University, Ming-Shong, Chia-Yi, Taiwan
Venue:
WTEC'94: Proceedings of the USENIX Winter 1994 Technical Conference
Year:
1994

Abstract

GLIMPSE, which stands for GLobal IMPlicit SEarch, provides indexing and query schemes for file systems. The novelty of glimpse is that it uses a very small index (in most cases 2-4% of the size of the text) and still allows very flexible full-text retrieval, including Boolean queries, approximate matching (i.e., allowing misspellings), and even searching for regular expressions. In a sense, glimpse extends agrep to entire file systems while preserving most of its functionality and simplicity. Query times are typically slower than with inverted indexes, but they are still fast enough for many applications. For example, it took 5 seconds of CPU time to find all 19 occurrences of Usenix AND Winter in a file system containing 69 MB of text spanning 4,300 files. Glimpse is designed particularly for personal information, such as one's own file system. The main characteristic of personal information is that it is non-uniform and includes many types of documents. An information retrieval system for personal information should support many types of queries, flexible interaction, low overhead, and customization. All of these are important features of glimpse.
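The abstract does not spell out how so small an index can still support Boolean and approximate queries. The Python sketch below assumes the two-level, block-addressing design glimpse is generally described as using. Files are grouped into blocks, and the index records only which blocks contain each word. A query first runs an agrep-style approximate match over the word list, then scans only the candidate blocks. The matcher is a bit-parallel shift-and with errors in the style of the Wu-Manber algorithm behind agrep. All names (`build_block_index`, `search`, `bitap_match`) are hypothetical choices for illustration, as are the contiguous block partitioning, the lowercase word tokenization, and the one-record-per-line convention. None of this is glimpse's actual code or command-line interface.

```python
# Hypothetical sketch of a block-addressing index with an agrep-style
# approximate scan; an illustration, not glimpse's implementation.
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")  # assumed tokenization: lowercase alphanumerics


def bitap_match(text: str, pattern: str, k: int) -> bool:
    """True if `pattern` occurs somewhere in `text` with at most k edits
    (insertions, deletions, substitutions), via bit-parallel shift-and."""
    m = len(pattern)
    if k >= m:
        return True  # deleting the whole pattern is within the error budget
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    masks = defaultdict(int)  # bit i set in masks[c] iff pattern[i] == c
    for i, ch in enumerate(pattern):
        masks[ch] |= 1 << i
    # R[d]: bit i set iff pattern[:i+1] matches a suffix of the text read so
    # far with at most d edits; initially reachable only by leading deletions.
    R = [(1 << d) - 1 for d in range(k + 1)]
    for ch in text:
        mask = masks.get(ch, 0)
        prev = R[0]  # R[d-1] as it was before this character
        R[0] = ((R[0] << 1) | 1) & mask
        for d in range(1, k + 1):
            cur = R[d]
            R[d] = ((((cur << 1) | 1) & mask)    # pattern char matches text char
                    | prev                        # extra character in the text
                    | (prev << 1) | 1             # substitution
                    | (R[d - 1] << 1)) & full     # pattern character skipped
            prev = cur
        if R[k] & accept:
            return True
    return False


def build_block_index(files: dict, num_blocks: int):
    """Split files into at most `num_blocks` contiguous groups and map each
    word to the set of block ids containing it. Only block ids are stored
    (no file names or offsets), which is what keeps the index small."""
    names = sorted(files)
    size = max(1, -(-len(names) // num_blocks))  # ceiling division
    blocks = [names[i:i + size] for i in range(0, len(names), size)]
    index = defaultdict(set)
    for block_id, block in enumerate(blocks):
        for name in block:
            for word in WORD.findall(files[name].lower()):
                index[word].add(block_id)
    return dict(index), blocks


def search(files: dict, index: dict, blocks: list, terms: list, k: int = 0):
    """Boolean AND of single-word terms, each allowed up to k errors.
    Returns (file name, line) pairs, treating each line as one record."""
    terms = [t.lower() for t in terms]
    candidates = None
    for term in terms:
        # Stage 1: approximate search over the word list, not the text.
        hits = set()
        for word, block_ids in index.items():
            if bitap_match(word, term, k):
                hits |= block_ids
        candidates = hits if candidates is None else candidates & hits
    # Stage 2: sequentially scan only the candidate blocks, line by line.
    results = []
    for block_id in sorted(candidates or ()):
        for name in blocks[block_id]:
            for line in files[name].splitlines():
                if all(bitap_match(line.lower(), t, k) for t in terms):
                    results.append((name, line))
    return results
```

Because the index stores block ids rather than word positions, it stays small, but every query pays for a sequential scan of the candidate blocks. Fewer, larger blocks shrink the index further and lengthen that scan. In this sketch, stage 1 is only a filter: once errors are allowed, a match that spans a word boundary can be missed if no single indexed word matches the term.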