XML retrieval faces new challenges when applied to heterogeneous XML documents, where next to nothing about the document structure can be taken for granted. We have developed solutions that address some of these heterogeneity issues. Our fragment selection algorithm divides a heterogeneous document collection into equally sized fragments with full-text content; content that is considered too data-oriented is rejected. The algorithm needs no information about element names. In addition, we present three techniques for fragment expansion, all of which yield a 13-17% average improvement in average precision. These techniques and algorithms are among the first steps in developing document-type-independent indexing methods for the full text in heterogeneous XML collections.
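The fragment selection idea can be illustrated with a minimal sketch. This is not the algorithm from the paper: the function names, the size bounds `min_size` and `max_size`, and the "average characters per text node" test for data-oriented content are all assumptions made for illustration. The sketch only mirrors the stated properties: fragments fall within a size range, data-oriented content is rejected, and element names are never inspected.

```python
import xml.etree.ElementTree as ET


def text_nodes(elem):
    """Non-blank text nodes in elem's subtree, whitespace-normalized."""
    return [" ".join(t.split()) for t in elem.itertext() if t.strip()]


def text_size(elem):
    """Total number of characters of text content below elem."""
    return sum(len(t) for t in text_nodes(elem))


def looks_like_full_text(elem, min_avg_chars):
    """Hypothetical stand-in for a data-orientation test.

    Assumption: data-oriented content consists of many short values,
    so its average text node is short. The paper's actual criterion
    is not given in the abstract.
    """
    nodes = text_nodes(elem)
    if not nodes:
        return False
    return sum(len(t) for t in nodes) / len(nodes) >= min_avg_chars


def select_fragments(elem, min_size, max_size, min_avg_chars=20):
    """Top-down selection of subtrees whose text size is in range.

    Element names are never consulted; only sizes and text nodes are.
    """
    size = text_size(elem)
    if size < min_size:
        return []  # too small to stand alone as a fragment
    if size <= max_size:
        # Right size: accept only if it does not look data-oriented.
        return [elem] if looks_like_full_text(elem, min_avg_chars) else []
    # Too large: look for suitable fragments among the children.
    fragments = []
    for child in elem:
        fragments.extend(select_fragments(child, min_size, max_size, min_avg_chars))
    return fragments
```

For a document parsed with `ET.fromstring(...)`, calling `select_fragments(root, min_size, max_size)` returns the list of selected subtree roots. Because the traversal stops at the first element that fits the size range, the selected fragments do not overlap; text that sits directly inside an oversized element (outside any child) is dropped in this simplified version.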