In document information retrieval, the unit of retrieval returned by an IR system is either a whole document or a document fragment, such as a paragraph in passage retrieval. IR systems based on the vector space model compute a feature vector for each unit and calculate the similarity between each unit and the query. However, these units are not suitable for document information retrieval, because they do not necessarily correspond to the information that users are searching for. The unit of retrieval should instead be a portion of the XML document, such as a chapter, section, or subsection. In other words, we consider the most important concern in document information retrieval to be defining a unit of retrieval that is meaningful to users. Appropriate portions of XML documents are easy to construct as retrieval results, because XML is a standard document format on the Internet and because XML documents consist of both content and document structure. In this paper, we propose an effective IR system for XML documents that automatically determines an appropriate unit of retrieval by analyzing the XML document structure. We evaluated the system experimentally and verified its effectiveness. For this evaluation, we also defined new recall and precision measures for XML information retrieval.
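
To make the idea of structural retrieval units concrete, the following is a minimal sketch and not the system proposed in the paper. It assumes that candidate units are simply the elements tagged chapter, section, or subsection, and it ranks them against a query with smoothed TF-IDF weights and cosine similarity in the vector space model. The sample document, the tag set, and the names `UNIT_TAGS`, `candidate_units`, and `rank_units` are all hypothetical and chosen only for illustration.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document; the tag names are assumptions.
SAMPLE = """<book>
  <chapter><title>Retrieval models</title>
    <section><title>Vector space</title>
      <p>Documents and queries are term vectors.</p></section>
    <section><title>Boolean</title>
      <p>Queries combine terms with logical operators.</p></section>
  </chapter>
  <chapter><title>XML</title>
    <section><title>Structure</title>
      <p>Elements nest to form a tree.</p></section>
  </chapter>
</book>"""

# Assumption: only these element types may serve as retrieval units.
UNIT_TAGS = {"chapter", "section", "subsection"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_units(elem, path):
    """Yield (path, text) for every descendant whose tag is in UNIT_TAGS."""
    seen = Counter()
    for child in elem:
        seen[child.tag] += 1
        child_path = f"{path}/{child.tag}[{seen[child.tag]}]"
        if child.tag in UNIT_TAGS:
            yield child_path, " ".join(child.itertext())
        yield from candidate_units(child, child_path)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_units(xml_text, query):
    """Return (score, path) pairs for all candidate units, best first."""
    root = ET.fromstring(xml_text)
    units = list(candidate_units(root, "/" + root.tag))
    tfs = [Counter(tokenize(text)) for _, text in units]
    # Document frequency is counted over units, not whole documents.
    df = Counter(term for tf in tfs for term in tf)
    n = len(units)
    idf = {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}

    def weight(tf):
        # Terms never seen in any unit get weight 0.
        return {t: c * idf.get(t, 0.0) for t, c in tf.items()}

    q = weight(Counter(tokenize(query)))
    scored = [(cosine(weight(tf), q), path)
              for (path, _), tf in zip(units, tfs)]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, path in rank_units(SAMPLE, "term vectors"):
        print(f"{score:.3f}  {path}")
```

In this sketch, the section that contains the query terms scores higher than its enclosing chapter, because the chapter's vector is diluted by the text of its other sections. This is one simple way to see why the choice of retrieval unit affects ranking; how the proposed system actually selects units from the document structure is not specified in this abstract.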