Indexing file contents is a powerful means of helping users locate documents, software, and other types of data among large repositories. In environments that contain many different types of data, content indexing requires type-specific processing to extract information effectively. We present a model for type-specific, user-customizable information extraction, and a system implementation called Essence. This software structure allows users to associate specialized extraction methods with ordinary files, providing the illusion of an object-oriented file system that encapsulates indexing methods within files. By exploiting the semantics of common file types, Essence generates compact yet representative file summaries that can be used to improve both browsing and indexing in resource discovery systems. Essence can extract information from most file types found in common file systems, including files with nested structure (such as compressed “tar” files). As part of the Harvest system, Essence interoperates with several search and indexing systems, such as WAIS and Glimpse.
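The extraction model described above can be illustrated with a small Python sketch. It assumes a pipeline of type recognition, per-type summarizing, and recursive unpacking of nested archives. The names `recognize`, `summarize`, `summarizer`, and `SUMMARIZERS`, the magic-number and file-name rules, and the summary fields are all hypothetical and do not reflect Essence's actual interface or output format. The registry stands in for the user-customizable association of extraction methods with file types.

```python
import gzip
import io
import posixpath
import re
import tarfile
from typing import Callable, Dict, List

# A summary is a small set of attribute/value pairs describing one file.
Summary = Dict[str, str]
Summarizer = Callable[[str, bytes], Summary]

# Hypothetical registry: file type name -> type-specific extraction method.
SUMMARIZERS: Dict[str, Summarizer] = {}


def summarizer(type_name: str) -> Callable[[Summarizer], Summarizer]:
    """Decorator that registers a (possibly user-supplied) extraction method."""
    def register(fn: Summarizer) -> Summarizer:
        SUMMARIZERS[type_name] = fn
        return fn
    return register


def recognize(name: str, data: bytes) -> str:
    """Guess a file's type from magic numbers first, then from its name."""
    if data[:2] == b"\x1f\x8b":                          # gzip magic number
        return "gzip"
    if len(data) >= 262 and data[257:262] == b"ustar":  # POSIX tar magic
        return "tar"
    base = posixpath.basename(name)
    if base.endswith((".c", ".h")):
        return "c-source"
    if base.endswith(".txt") or base.upper().startswith("README"):
        return "text"
    return "unknown"


@summarizer("text")
def summarize_text(name: str, data: bytes) -> Summary:
    # Use the first non-blank line as a title.
    lines = [ln.strip() for ln in data.decode("utf-8", "replace").splitlines()]
    title = next((ln for ln in lines if ln), "")
    return {"file": name, "type": "text", "title": title}


@summarizer("c-source")
def summarize_c(name: str, data: bytes) -> Summary:
    # Crude heuristic: capture the last identifier before "(" on lines that
    # begin with a word character, which in conventional C layout are
    # usually function definitions.
    text = data.decode("utf-8", "replace")
    funcs = re.findall(r"^\w[\w \t*]*?\b(\w+)[ \t]*\(", text, flags=re.M)
    return {"file": name, "type": "c-source", "functions": " ".join(funcs)}


def summarize(name: str, data: bytes, depth: int = 0) -> List[Summary]:
    """Return one summary per leaf file, unnesting archives recursively."""
    kind = recognize(name, data)
    if kind == "gzip" and depth < 8:
        inner = name[:-3] if name.endswith(".gz") else name
        return summarize(inner, gzip.decompress(data), depth + 1)
    if kind == "tar" and depth < 8:
        results: List[Summary] = []
        with tarfile.open(fileobj=io.BytesIO(data)) as tar:
            for member in tar.getmembers():
                extracted = tar.extractfile(member) if member.isfile() else None
                if extracted is not None:
                    results.extend(
                        summarize(f"{name}/{member.name}", extracted.read(), depth + 1))
        return results
    fn = SUMMARIZERS.get(kind)
    # Types without a registered summarizer still get a minimal summary.
    return [fn(name, data)] if fn else [{"file": name, "type": kind}]
```

Calling `summarize("adder.tar.gz", data)` on a gzip-compressed tar file containing a `README` and an `add.c` would yield one summary per member, for example a title for the README and the function name `add` for the C file. To support a new type in this sketch, a user registers a function with `@summarizer("...")` and teaches `recognize` to return that type name; the recursion on `gzip` and `tar` shows how nested structure is unpacked before type-specific summarizing.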