A co-training framework for searching XML documents

Authors:
Wilfred Ng;Ho Lam Lau
Affiliations:
Department of Computer Science, The Hong Kong University of Science and Technology, Hong Kong;Department of Computer Science, The Hong Kong University of Science and Technology, Hong Kong
Venue:
Information Systems
Year:
2007

Abstract

In this paper, we study the use of XML-tagged keywords (or simply key-tags) to search for XML fragments in a collection of XML documents. We present techniques that employ users' evaluations as feedback to generate an adaptive ranked list of XML fragments as the search result. First, we extend the vector space model as a basis for searching XML fragments. The model examines the relevance between the imposed key-tags and the fragments identified in XML documents, and outputs a ranked result. Second, to deal with the diversified nature of XML documents, we present four XML Rankers (XRs), which have different strengths in terms of similarity, granularity, and ranking features. The XRs are specially tailored to diversified XML documents. We then evaluate the XML search effectiveness and quality of each tailored XR and propose a meta-XML ranker (MXR) comprising the four XRs. The MXR is trained via a machine learning scheme that we term the ranking support vector machine (RSVM) in a co-training framework (RSCF). The RSCF takes as input two sets of labelled fragments and feature vectors and generates adaptive rankers as output in a learning process. We show empirically that, with only a small set of training XML fragments, the RSCF improves after a few iterations of the learning process. Finally, we demonstrate that the RSCF-based MXR is able to bring out the strengths of the underlying XRs in order to adapt to the users' perspectives on the returned search results. Using a set of key-tag queries on a variety of XML documents, we show that the RSCF-based MXR is effective in terms of the precision of its results.
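
To make the vector-space step concrete, the following is a minimal sketch of ranking fragments against a key-tag query. It is not the paper's model: the abstract does not give the weighting scheme, so the sketch assumes that each distinct (tag, keyword) pair is one dimension, that weights are raw counts, and that relevance is cosine similarity. The names `key_tag_vector`, `cosine`, `rank_fragments`, and the sample fragments are hypothetical.

```python
import math
from collections import Counter


def key_tag_vector(pairs):
    """Count (tag, keyword) pairs; each distinct pair is one dimension (assumption)."""
    return Counter(pairs)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_fragments(query_pairs, fragments):
    """Return (fragment name, score) tuples, highest score first."""
    q = key_tag_vector(query_pairs)
    scored = [
        (name, cosine(q, key_tag_vector(pairs)))
        for name, pairs in fragments.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fragments, each reduced to its (tag, keyword) pairs.
fragments = {
    "frag_a": [("title", "xml"), ("title", "search"), ("author", "ng")],
    "frag_b": [("title", "database"), ("author", "ng")],
    "frag_c": [("abstract", "ranking")],
}
query = [("title", "xml"), ("author", "ng")]

ranking = rank_fragments(query, fragments)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this sketch a keyword only matches when it appears under the same tag, which is one simple way to let the tag constrain the match. A meta-ranker in the spirit of the MXR would combine several such scorers, but how the four XRs and the RSCF do so is not specified in the abstract.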