Although Information Retrieval (IR) systems, including search engines, have been effective at locating documents in large repositories that contain specified patterns, they support only keyword searches and queries (patterns) built with Boolean operators. Expressive search for complex text patterns is important in many domains, such as patent search, search over incoming news, and search over web repositories. In this paper, we first present the operators, and their semantics, for specifying an expressive search. We then investigate how to detect complex patterns, which current search engines do not support, using a pre-computed index, and what information the index must contain to detect such patterns efficiently. We use an expressive pattern specification language and a pattern detection graph mechanism that allows common sub-patterns to be shared. We have developed algorithms for all the pattern operators that use the index to detect complex patterns efficiently. Experiments illustrate the scalability of the proposed approach and its efficiency compared with a streaming approach.
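The abstract does not name the operators, the index layout, or the structure of the detection graph, so the following is only a minimal sketch under stated assumptions, not the paper's algorithms. It assumes a positional inverted index (term, then document, then word offsets), since proximity and ordering operators need positions rather than just document IDs. It also assumes three illustrative binary operators, `AND`, `NEAR`, and `FOLLOWED_BY`, the last two bounded by a word distance `k`. Sub-pattern sharing is imitated by memoizing results on structurally identical pattern nodes. All names (`build_positional_index`, `Detector`, `_match`) are hypothetical.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Hypothetical index: term -> {doc_id -> list of word offsets}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def _follows(a, b, k):
    # Span b starts after span a ends, at most k word positions later (1 = adjacent).
    return b[0] > a[1] and b[0] - a[1] <= k

def _match(op, a, b, k):
    # Illustrative operator semantics (assumed, not taken from the paper).
    if op == "AND":
        return True
    if op == "FOLLOWED_BY":
        return _follows(a, b, k)
    if op == "NEAR":
        return _follows(a, b, k) or _follows(b, a, k)
    raise ValueError(f"unknown operator {op!r}")

class Detector:
    """Evaluates pattern trees against the index; identical sub-patterns are computed once."""

    def __init__(self, index):
        self.index = index
        self.cache = {}        # pattern node -> {doc_id -> sorted list of (start, end) spans}
        self.evaluations = 0   # number of distinct nodes actually computed

    def eval(self, node):
        if node in self.cache:  # shared sub-pattern: reuse the earlier result
            return self.cache[node]
        self.evaluations += 1
        op = node[0]
        if op == "TERM":
            postings = self.index.get(node[1].lower(), {})
            result = {d: [(p, p) for p in ps] for d, ps in postings.items()}
        else:
            left, right = self.eval(node[1]), self.eval(node[2])
            k = node[3] if len(node) > 3 else None
            result = {}
            # Only documents containing both operands can match.
            for d in left.keys() & right.keys():
                # Pairwise comparison for clarity; each match yields the covering span.
                spans = sorted({(min(a[0], b[0]), max(a[1], b[1]))
                                for a in left[d] for b in right[d]
                                if _match(op, a, b, k)})
                if spans:
                    result[d] = spans
        self.cache[node] = result
        return result

docs = {
    1: "patent filed for a new solar cell design",
    2: "new design for a solar powered car",
    3: "solar cell efficiency news",
}
index = build_positional_index(docs)
solar_cell = ("FOLLOWED_BY", ("TERM", "solar"), ("TERM", "cell"), 1)
q1 = ("NEAR", solar_cell, ("TERM", "patent"), 5)
q2 = ("FOLLOWED_BY", solar_cell, ("TERM", "design"), 2)

det = Detector(index)
print(det.eval(q1))      # {1: [(0, 6)]}
print(det.eval(q2))      # {1: [(5, 7)]}
print(det.evaluations)   # 7
```

Both queries contain the sub-pattern `solar_cell`, so it and its two terms are computed once; evaluating the queries independently would take 10 node evaluations instead of 7. The pairwise span comparison is kept for readability; an index-based implementation would more likely merge the sorted position lists in a single pass.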