Focused retrieval and result aggregation with political data

Authors:
Rianne Kaptein;Maarten Marx
Affiliations:
Archives and Information Studies, University of Amsterdam, Amsterdam, The Netherlands;ISLA, University of Amsterdam, Amsterdam, The Netherlands
Venue:
Information Retrieval
Year:
2010


Abstract

This paper presents a case study in which we use a large semi-structured data set, consisting of official transcripts of meetings of the Dutch parliament, for focused retrieval and result aggregation. Transcripts of meetings are a document genre characterized by a complex narrative structure: what matters is not only what is said, but also by whom and to whom. Our collection covers more than 40 years of Dutch parliamentary debates, and we exploit this structure to generate semantic annotations automatically. These annotations yield numerous new ways of searching, browsing, mining and summarizing the documents. For result aggregation, we summarize and visualize the structure of meetings as tables of contents and interruption graphs. The contents of meetings, or parts of meetings, are condensed into word clouds that are created using a parsimonious language model. Furthermore, we have developed a search engine that exploits the structure and annotations of our data, making it possible to provide entry points, to group search results, and to use faceted search techniques for data exploration. Evaluation shows that our content and structure summarization tools provide a good first impression of a debate. Users reported that, compared to a standard document retrieval system, our search engine gives a better overview of the data. Search tasks were performed faster, and users felt more certain of their answers.
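
To make the word-cloud step more concrete, the following is a minimal sketch of how a parsimonious language model could select terms for a cloud. It uses the EM-style estimation generally associated with parsimonious models: the expected count of a term is its frequency scaled by how much better the document model explains it than the background collection model. The function names, the mixing weight `lam`, the pruning threshold, and the toy data are all assumptions made for illustration; the abstract does not specify the settings used in the paper.

```python
from collections import Counter


def parsimonious_model(doc_tokens, background_prob, lam=0.1,
                       iterations=20, threshold=1e-4):
    """Estimate a parsimonious term distribution for one document.

    doc_tokens      -- list of tokens from the debate (or part of it)
    background_prob -- dict mapping term -> P(term | collection)
    lam             -- weight of the document model (assumed value)
    """
    tf = Counter(doc_tokens)
    total = sum(tf.values())
    # Start from the maximum-likelihood document model.
    p_doc = {t: c / total for t, c in tf.items()}

    for _ in range(iterations):
        # E-step: expected count of each term attributed to the document model.
        expected = {}
        for t, p in p_doc.items():
            p_bg = background_prob.get(t, 1e-9)  # tiny floor for unseen terms
            expected[t] = tf[t] * (lam * p) / ((1 - lam) * p_bg + lam * p)
        norm = sum(expected.values())
        if norm == 0:
            break
        # M-step: renormalize the expected counts into probabilities.
        p_doc = {t: e / norm for t, e in expected.items()}
        # Prune terms the background already explains well, then renormalize.
        kept = {t: p for t, p in p_doc.items() if p >= threshold}
        if not kept:
            break
        kept_norm = sum(kept.values())
        p_doc = {t: p / kept_norm for t, p in kept.items()}
    return p_doc


def top_terms(model, k=10):
    """Return the k most probable terms, e.g. to size words in a cloud."""
    return sorted(model, key=model.get, reverse=True)[:k]


# Toy example (invented data): common words are frequent in the background.
tokens = ["the"] * 5 + ["of"] * 3 + ["pension"] * 3 + ["mortgage"] * 2
background = {"the": 0.05, "of": 0.04, "pension": 0.0001, "mortgage": 0.0001}
model = parsimonious_model(tokens, background)
print(top_terms(model, k=2))
```

In this toy run, the topical words outrank the more frequent function words because the background model already accounts for the latter, which is the property that makes the resulting clouds more descriptive than raw term-frequency clouds.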