Implementation of a high-speed and high-precision XML information retrieval system on relational databases

Authors:
Kei Fujimoto;Toshiyuki Shimizu;Norimasa Terada;Kenji Hatano;Yu Suzuki;Toshiyuki Amagasa;Hiroko Kinutani;Masatoshi Yoshikawa
Affiliations:
Graduate School of Information Science, Nagoya University, Nagoya, Japan;Graduate School of Information Science, Nagoya University, Nagoya, Japan;Graduate School of Information Science, Nagoya University, Nagoya, Japan;Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Japan;College of Information Science and Technology, Ritsumeikan University, Kusatsu, Japan;Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan;Information Media and Education Square, Ochanomizu University, Bunkyo, Japan;Graduate School of Information Science, Nagoya University, Nagoya, Japan
Venue:
INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval
Year:
2005

Citing 4

XRel: a path-based approach to storage and retrieval of XML documents using relational databases
ACM Transactions on Internet Technology (TOIT)

TeXQuery: a full-text search extension to XQuery
Proceedings of the 13th international conference on World Wide Web

FleXPath: flexible structure and full-text querying for XML
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data

Analyzing the properties of XML fragments decomposed from the INEX document collection
INEX'04 Proceedings of the Third international conference on Initiative for the Evaluation of XML Retrieval

Cited 1

Kikori-KS: an effective and efficient keyword search system for digital libraries in XML
ICADL'06 Proceedings of the 9th international conference on Asian Digital Libraries: Achievements, Challenges and Opportunities

Abstract

This paper describes an XML information retrieval system that we have developed. It is based on the vector space model and is implemented on top of XRel, a relational XML database system developed in our research group. Because a single XML document usually contains many XML fragments, processing a query retrieves a large number of fragments. Keeping all of them as retrieval targets degrades retrieval precision and increases query processing time, because some XML fragments are not appropriate as query targets. In existing methods, human experts manually select the retrieval targets when an XML collection is stored in the system. Such manual selection is not feasible when many kinds of XML documents are stored. To cope with this problem, we propose a method for automatically selecting document-centric fragments using three measures: period ratio, number of different words, and empirical rules. By removing inappropriate data-centric fragments from the results of keyword queries, we improve both the accuracy and the performance of our system. Performance evaluations confirmed the improvement in retrieval precision and query processing speed.
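
The fragment selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not define the measures or give thresholds, so everything below is an assumption made for illustration: period ratio is taken to be the number of periods per word token, the threshold constants are hypothetical, and the empirical rules are omitted because the abstract does not state them. The function names are likewise invented here and are not the system's API.

```python
import re

# Hypothetical thresholds: the abstract gives no values.
MIN_PERIOD_RATIO = 0.02
MIN_DISTINCT_WORDS = 5

WORD_RE = re.compile(r"[A-Za-z0-9']+")


def period_ratio(text):
    """Assumed definition: number of '.' characters per word token."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    return text.count(".") / len(words)


def distinct_words(text):
    """Number of different words, compared case-insensitively."""
    return len({w.lower() for w in WORD_RE.findall(text)})


def is_document_centric(text):
    """Treat a fragment as document-centric if it passes both measures."""
    return (period_ratio(text) >= MIN_PERIOD_RATIO
            and distinct_words(text) >= MIN_DISTINCT_WORDS)


def filter_fragments(fragments):
    """Keep only (path, text) pairs whose text looks document-centric."""
    return [(path, text) for path, text in fragments
            if is_document_centric(text)]


fragments = [
    ("/article/fm/yr", "2005"),
    ("/article/bdy/sec/p",
     "XML retrieval returns fragments. "
     "Some fragments are too small to be useful."),
]
kept = filter_fragments(fragments)
```

With these assumed thresholds, the short year fragment is dropped as data-centric and the paragraph fragment is kept as a retrieval target. In a real system such a filter could be applied either when fragments are stored or to the results of a keyword query, as the abstract describes.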