UJM at INEX 2007: Document Model Integrating XML Tags

Authors:
Mathias Géry; Christine Largeron; Franck Thollard
Affiliations:
Hubert Curien Lab, Jean Monnet University, Saint-Étienne, France (all three authors)
Venue:
Focused Access to XML Documents
Year:
2008

Abstract

Different approaches have been used to represent textual documents, based on the Boolean model, the vector space model, or probabilistic models. In text mining, as in information retrieval (IR), these models have given good results for modeling textual documents. However, they do not take document structure into account, whereas in many applications documents are inherently structured (e.g., XML documents). In this article, we propose an extended probabilistic representation of documents that takes into account a certain kind of structural information: logical tags, which represent the different parts of a document, and formatting tags, which are used to emphasize text. Our approach includes a learning step that estimates the weight of each tag. This weight is related to the probability that a given tag distinguishes relevant terms.
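The abstract does not give the model's formulas, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that (a) a tag's weight is the ratio P(relevant | tag) / P(relevant), estimated with Laplace smoothing from term occurrences labeled as lying inside or outside relevance-assessed passages; (b) each term occurrence counts as the product of the weights of its enclosing tags; and (c) the resulting weighted term frequencies feed a standard BM25 ranking function. The function names `estimate_tag_weights`, `weighted_tf`, and `bm25_scores`, the BM25 parameters, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical representation: a document is a list of (term, tags) pairs,
# where tags is the set of XML tags enclosing that occurrence
# (logical tags such as "title", formatting tags such as "b").


def estimate_tag_weights(training, smoothing=1.0):
    """Assumed learning step: weight(tag) = P(relevant | tag) / P(relevant).

    `training` is a list of (tags, is_relevant) pairs, one per term occurrence.
    Laplace smoothing pulls rarely seen tags toward the neutral weight 1.
    """
    n_total = len(training)
    n_rel = sum(1 for _, rel in training if rel)
    tag_total, tag_rel = Counter(), Counter()
    for tags, rel in training:
        for tag in tags:
            tag_total[tag] += 1
            tag_rel[tag] += int(rel)
    p_rel = (n_rel + smoothing) / (n_total + 2 * smoothing)
    return {
        tag: ((tag_rel[tag] + smoothing) / (tag_total[tag] + 2 * smoothing)) / p_rel
        for tag in tag_total
    }


def weighted_tf(doc, tag_weights):
    """Each occurrence adds the product of its enclosing tags' weights
    (tags without a learned weight count as 1)."""
    tf = defaultdict(float)
    for term, tags in doc:
        w = 1.0
        for tag in tags:
            w *= tag_weights.get(tag, 1.0)
        tf[term] += w
    return tf


def bm25_scores(query, docs, tag_weights, k1=1.2, b=0.75):
    """BM25 over tag-weighted frequencies; document length is the raw token count."""
    tfs = [weighted_tf(doc, tag_weights) for doc in docs]
    lengths = [len(doc) for doc in docs]
    n_docs = len(docs)
    avgdl = sum(lengths) / n_docs
    df = Counter(term for tf in tfs for term in tf)  # one count per document
    scores = []
    for tf, dl in zip(tfs, lengths):
        score = 0.0
        for q in query:
            f = tf.get(q, 0.0)
            if f == 0.0:
                continue
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1.0)
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


# Toy training data: occurrences in <title> are mostly relevant, in <p> mostly not.
training = [
    ({"article", "title"}, True),
    ({"article", "title"}, True),
    ({"article", "p"}, True),
    ({"article", "p"}, False),
    ({"article", "p"}, False),
    ({"article", "p"}, False),
]
weights = estimate_tag_weights(training)  # title 1.5, p about 0.67, article 1.0

docs = [
    [("xml", {"article", "title"}), ("retrieval", {"article", "p"})],
    [("retrieval", {"article", "title"}), ("xml", {"article", "p"})],
]
scores = bm25_scores(["xml"], docs, weights)
print(weights, scores)
```

In this toy run, the first document, where "xml" appears inside a `title` tag, scores higher than the second, where it appears inside `p`, even though both contain the term once. Tag weights above 1 boost terms that such tags tend to mark as relevant, and weights below 1 damp them.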