Utilizing sub-topical structure of documents for information retrieval

Authors:
Debasis Ganguly; Johannes Leveling; Gareth J.F. Jones
Affiliations:
Dublin City University, Dublin, Ireland (all authors)
Venue:
Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management
Year:
2011

Abstract

Text segmentation in natural language processing typically refers to decomposing a document into its constituent subtopics. Our work centers on applying text segmentation techniques to information retrieval (IR) tasks. Examples include scoring a document by combining the retrieval scores of its constituent segments, exploiting the proximity of query terms in documents for ad-hoc search, and question answering (QA), where passages retrieved from multiple documents are aggregated and presented to a searcher as a single document. For ad-hoc IR, pseudo-relevance feedback is shown to benefit from expanding the query with sentences extracted from the pseudo-relevant documents rather than with individual terms. For the patent prior-art search task, retrieval effectiveness is improved by applying text segmentation to the patent queries. Another aspect of our work is augmenting text segmentation techniques to produce segments that are more readable, with fewer unresolved anaphora. This is particularly useful for QA and snippet generation, where the objective is twofold: to aggregate relevant and novel information from multiple documents so as to satisfy the user's information need, and to ensure that the automatically generated content presented to the user is easily readable without reference to the original source documents.
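The abstract does not specify how segment scores are combined into a document score. Purely as an illustration, the following sketch uses two stand-in techniques that are assumptions, not the authors' method. The first is a simplified TextTiling-style segmenter, which places a boundary wherever the lexical cosine similarity across a sentence gap falls below a threshold. The second is Dirichlet-smoothed query likelihood, where a document's whole-text score is interpolated with its best segment score. All function names and the parameters `window`, `threshold`, `lam` and `mu` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; no stemming or stopword removal."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def segment(sentences, window=2, threshold=0.1):
    """Group consecutive sentences into subtopic segments.

    Simplified TextTiling-style rule (an assumption, not the paper's method):
    start a new segment before sentence i when the cosine similarity of the
    `window` sentences on either side of the gap drops below `threshold`.
    """
    if not sentences:
        return []
    bags = [Counter(tokenize(s)) for s in sentences]
    segments, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        left = sum(bags[max(0, i - window):i], Counter())
        right = sum(bags[i:i + window], Counter())
        if cosine(left, right) < threshold:
            segments.append(current)
            current = []
        current.append(sentences[i])
    segments.append(current)
    return [" ".join(seg) for seg in segments]


def ql_score(query_terms, text_terms, coll, coll_len, mu=100.0):
    """Dirichlet-smoothed query log-likelihood of a text."""
    tf = Counter(text_terms)
    n = len(text_terms)
    score = 0.0
    for t in query_terms:
        # Add-one collection model so unseen query terms never give log(0).
        p_c = (coll[t] + 1) / (coll_len + len(coll))
        score += math.log((tf[t] + mu * p_c) / (n + mu))
    return score


def score_document(query, sentences, coll, coll_len, lam=0.5, mu=100.0):
    """Interpolate whole-document and best-segment scores.

    Hypothetical combination rule chosen for illustration only.
    """
    if not sentences:
        raise ValueError("document must contain at least one sentence")
    q = tokenize(query)
    whole = ql_score(q, tokenize(" ".join(sentences)), coll, coll_len, mu)
    best = max(ql_score(q, tokenize(seg), coll, coll_len, mu)
               for seg in segment(sentences))
    return lam * whole + (1 - lam) * best


if __name__ == "__main__":
    doc_a = ["Patent search needs long queries.",
             "Patent queries contain many claims.",
             "Cats sleep on warm sofas.",
             "Cats chase small mice."]
    doc_b = ["Cats sleep on warm sofas.", "Cats chase small mice."]
    coll = Counter(tokenize(" ".join(doc_a + doc_b)))
    coll_len = sum(coll.values())
    # Two segments: the patent sentences, then the cat sentences.
    print(segment(doc_a))
    for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
        print(name, score_document("patent claims", doc, coll, coll_len))
```

Taking the maximum over segments lets a long document that covers the query topic in only one of its subtopics still rank well, while `lam` controls how much whole-document evidence continues to count.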