Effective retrieval of structured documents
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Modern Information Retrieval
W3QS: A Query System for the World-Wide Web
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Focused Access to XML Documents
XFIRM at INEX 2005: ad-hoc and relevance feedback tracks
INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval
Hi-index | 0.00 |
Different approaches have been used to represent textual documents, based on boolean model, vector space model or probabilistic models. In text mining as in information retrieval (IR), these models have shown good results about textual documents modeling. They nevertheless do not take into account documents structure. In many applications however, documents are inherently structured (e.g. XML documents).In this article, we propose an extended probabilistic representation of documents in order to take into account a certain kind of structural information: logical tags that represent the different parts of the document and formatting tags used to emphasized text. Our approach includes a learning step that estimates the weight of each tag. This weight is related to the probability for a given tag to distinguish the relevant terms.