Searching for complex patterns over large stored information repositories

  • Authors:
  • Nikhil Deshpande; Sharma Chakravarthy; Raman Adaikkalavan

  • Affiliations:
  • CSE Department, The University of Texas at Arlington; CSE Department, The University of Texas at Arlington; CIS Department, Indiana University South Bend

  • Venue:
  • BNCOD'11 Proceedings of the 28th British National Conference on Advances in Databases
  • Year:
  • 2011

Abstract

Although Information Retrieval (IR) systems, including search engines, are effective at locating documents that contain specified patterns in large repositories, they support only keyword searches and queries/patterns that use Boolean operators. Expressive search for complex text patterns is important in many domains, such as patent search, search on incoming news, and web repositories. In this paper, we first present the operators, and their semantics, for specifying an expressive search. We then investigate how complex patterns, which search engines do not currently support, can be detected using a pre-computed index, and what information the index must contain to detect such patterns efficiently. We use an expressive pattern specification language and a pattern detection graph mechanism that allows common sub-patterns to be shared. Algorithms that use the index to detect complex patterns efficiently have been developed for all the pattern operators. Experiments illustrate the scalability of the proposed approach and its efficiency compared with a streaming approach.
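
To make the idea of index-based pattern detection concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a positional inverted index (term to document to word offsets) and a hypothetical ordered-proximity operator, here called `near`; the abstract does not name the paper's operators or describe its index layout, so the function names, the sample documents, and the operator semantics are all illustrative assumptions. The sketch shows why a plain Boolean index is not enough: the operator needs word positions, not just document membership.

```python
from collections import defaultdict


def build_index(docs):
    """Build a positional inverted index: term -> {doc_id: [word offsets]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token][doc_id].append(pos)
    return index


def boolean_and(index, left, right):
    """Boolean AND: ids of documents containing both terms, ignoring position."""
    return set(index.get(left, {})) & set(index.get(right, {}))


def near(index, left, right, max_gap):
    """Hypothetical operator: `right` occurs after `left`, at most max_gap words later.

    Returns {doc_id: [(left_pos, right_pos), ...]} for matching documents.
    """
    hits = {}
    left_postings = index.get(left, {})
    right_postings = index.get(right, {})
    # Only documents containing both terms can match.
    for doc_id in left_postings.keys() & right_postings.keys():
        pairs = [
            (p, q)
            for p in left_postings[doc_id]
            for q in right_postings[doc_id]
            if 0 < q - p <= max_gap
        ]
        if pairs:
            hits[doc_id] = pairs
    return hits


# Illustrative documents (made up for this sketch).
docs = {
    1: "the patent describes a wireless sensor network",
    2: "sensor data arrives before the wireless link fails",
    3: "wireless charging",
}
index = build_index(docs)

both = boolean_and(index, "wireless", "sensor")
ordered = near(index, "wireless", "sensor", 2)
print(sorted(both))
print(ordered)
```

In this sketch the Boolean query matches documents 1 and 2, while the ordered-proximity query keeps only document 1, because in document 2 "sensor" precedes "wireless". A fuller design along the lines the abstract suggests would evaluate each shared sub-pattern once and reuse its result across queries; that sharing is not modeled here.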