As more structured documents, such as SGML and XML documents, become available on the Web, there is a growing demand for effective structured document retrieval that exploits both the content and the hierarchical structure of documents and returns document elements of appropriate granularity. Previous work on partial retrieval of structured documents has limited applicability because it requires structured queries and because the document structure cannot be traversed according to the query. In this paper, we put forward a method for flexible element retrieval that can retrieve relevant document elements of arbitrary granularity in response to natural language queries. The proposed techniques comprise a novel hierarchical index propagation and pruning mechanism and an algorithm for ranking document elements based on the hierarchical index. The experimental results show that our method significantly outperforms other existing methods. Our method is also robust to the long-standing problems of text length normalization and threshold setting in structured document retrieval.