Summarizing highly structured documents for effective search interaction

  • Authors:
  • Lanbo Zhang; Yi Zhang; Yunfei Chen

  • Affiliations:
  • School of Engineering, UC Santa Cruz, Santa Cruz, CA, USA; School of Engineering, UC Santa Cruz, Santa Cruz, CA, USA; School of Engineering, UC Santa Cruz, Santa Cruz, CA, USA

  • Venue:
  • SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2012

Abstract

As highly structured documents with rich metadata (such as products and movies) become increasingly prevalent, searching them has become an important IR problem. Unfortunately, existing work on document summarization, especially in the context of search, has focused mainly on unstructured documents and paid little attention to highly structured ones. Because structured and unstructured documents have different characteristics, the ideal summarization approaches for them may differ. In this paper, we study the problem of summarizing highly structured documents in a search context. We propose a new summarization approach based on query-specific facet selection: it uses machine learning to discover the important facets behind a query and summarizes the retrieved documents based on those facets. In addition, we propose evaluating summarization approaches with a utility function that measures how well the summaries assist users in interacting with the search results. We also develop a game on Mechanical Turk to evaluate different summarization approaches. The experimental results show that the new approach significantly outperforms two existing ones.
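
As a rough illustration of what query-specific facet selection could look like, the sketch below ranks a document's facets by a per-query importance score and renders a summary from the top few. It is not the method from the paper: the abstract does not describe the learning model or its features, so the scores here are simply assumed to come from some trained model, and the names `select_facets`, `summarize`, and the example facets are all hypothetical.

```python
from typing import Dict, List


def select_facets(facet_scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring facet names for one query.

    facet_scores is assumed to hold per-query importance scores produced
    by some learned model (hypothetical; not specified in the abstract).
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(facet_scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def summarize(document: Dict[str, str], facets: List[str]) -> str:
    """Render 'facet: value' pairs for the selected facets the document has."""
    return "; ".join(f"{f}: {document[f]}" for f in facets if f in document)


# Hypothetical scores for a single query and a hypothetical product record.
example_scores = {"price": 0.9, "brand": 0.7, "color": 0.4, "weight": 0.2}
example_doc = {"brand": "Acme", "price": "$499", "color": "black", "weight": "1.2 kg"}

top_facets = select_facets(example_scores, k=2)
summary = summarize(example_doc, top_facets)
```

Because the scores depend on the query, the same product record would be summarized by different facets for different queries, which is the idea the abstract describes.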