Managing gigabytes (2nd ed.): compressing and indexing documents and images
Shortest-substring retrieval and ranking
ACM Transactions on Information Systems (TOIS)
Plane-sweep algorithms for intersecting geometric figures
Communications of the ACM
One-pass evaluation of region algebra expressions
Information Systems
Minimal-interval semantics [3] associates with each query over a document a set of intervals, called witnesses, that are incomparable with respect to inclusion (i.e., they form an antichain): witnesses define the minimal regions of the document satisfying the query. Minimal-interval semantics makes it easy to define and compute several sophisticated proximity operators, provides snippets for user presentation, and can be used to rank documents; efficiently computing the antichains obtained by operations such as logical conjunction and disjunction is therefore a basic issue. In this paper we provide the first algorithms for computing such operators whose running time is linear in the number of intervals and logarithmic in the number of input antichains. The space used is linear in the number of antichains. Moreover, the algorithms are lazy: they do not assume random access to the input antichains. These properties make our algorithms practical for large-scale web search engines.
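To make the conjunction operator concrete, the following is a minimal Python sketch, not the paper's own algorithm. It assumes each input antichain is supplied as an iterable of closed intervals `(left, right)` sorted by left endpoint (which, for an antichain, also sorts it by right endpoint). The function name `and_antichains` and the heap-based sweep are illustrative choices. The sketch reads each input strictly front to back, keeps one head interval per input in a binary heap, and yields the minimal intervals that contain at least one interval from every input.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Interval = Tuple[int, int]  # closed interval [left, right] of token positions


def and_antichains(inputs: List[Iterable[Interval]]) -> Iterator[Interval]:
    """Lazily yield the minimal intervals containing one interval per input.

    Assumption: every input is an antichain sorted by left endpoint.
    Inputs are consumed only through next(), never by random access.
    """
    iters = [iter(a) for a in inputs]
    heap = []  # entries: (left, right, index of the source antichain)
    max_right = None
    for i, it in enumerate(iters):
        head = next(it, None)
        if head is None:
            return  # an empty input means the conjunction has no witness
        heap.append((head[0], head[1], i))
        max_right = head[1] if max_right is None else max(max_right, head[1])
    if not heap:
        return
    heapq.heapify(heap)

    pending = None  # latest candidate not yet known to be minimal
    while True:
        left, _, i = heap[0]
        cand = (left, max_right)  # smallest span covering all current heads
        if pending is None:
            pending = cand
        elif cand[0] == pending[0]:
            pass  # same left, right not smaller: cand contains pending
        elif cand[1] == pending[1]:
            pending = cand  # same right, larger left: cand lies inside pending
        else:
            yield pending  # later candidates can no longer lie inside it
            pending = cand
        nxt = next(iters[i], None)  # advance the input with the leftmost head
        if nxt is None:
            break  # that input is exhausted: no further witnesses exist
        heapq.heapreplace(heap, (nxt[0], nxt[1], i))
        max_right = max(max_right, nxt[1])
    yield pending


# Positions of two query terms in a document, as one-point intervals.
term_a = [(1, 1), (5, 5), (9, 9)]
term_b = [(3, 3), (6, 6)]
print(list(and_antichains([term_a, term_b])))
# prints [(1, 3), (3, 5), (5, 6), (6, 9)]
```

In this sketch each input interval enters the heap once, so the work is one heap operation per interval, and the heap holds one entry per input antichain. That is in the spirit of the bounds stated above, but the code is only an illustration under the stated assumptions.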