XML and information retrieval: a SIGIR 2000 workshop

Authors:
David Carmel; Yoelle Maarek; Aya Soffer
Affiliations:
IBM Research Lab in Haifa; IBM Research Lab in Haifa; IBM Research Lab in Haifa
Venue:
ACM SIGMOD Record
Year:
2001

Abstract

XML, the eXtensible Markup Language, has recently emerged as a new standard for data representation and exchange on the Internet. It is believed that XML will become a universal format for data exchange on the Web and that in the near future vast amounts of documents in XML format will be found on the Web. As a result, it has become crucial to address the question of how large collections of XML documents can be stored and retrieved efficiently and effectively.

To date, most work on storing, indexing, querying, and searching XML documents has stemmed from the database community's work on semi-structured data. An alternative approach, which has so far received less attention, is to view XML documents as a collection of text documents with additional tags and relations between these tags. IR techniques have traditionally been applied to search large sets of textual data, and they should thus be extended to encode the structure and semantics inherent in XML documents. Integrating IR and XML search techniques will enable more sophisticated search on both the structure and the content of these documents, while leveraging the success of IR techniques in document similarity ranking and keyword search.

The SIGIR workshop on XML and information retrieval was held on July 28, 2000, in Athens, Greece. Its goal was to bring together researchers and practitioners interested in XML and IR to discuss and define the most relevant topics in the relation between these two technologies, present recent results, and propose future directions for research. The topics for discussion included:
• How to extend IR technologies to search XML documents
• How to integrate XML structure in IR indexing structures
• How to query XML documents on both content and structure
• How to introduce the semantics inherent in XML into the search process
• How to adopt database indexing techniques in an IR framework

The opening session of the workshop consisted of a survey of search engines for XML documents. This was followed by three technical sessions: query languages, retrieval algorithms, and IR systems for XML documents. The final talk of the day, "Searching Annotated Language Resources in XML" by Nancy Ide, was given from the perspective of potential users of XML search systems and opened many topics for discussion. The workshop concluded with a panel discussion in which the panelists outlined their vision of the future of XML search.
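One of the workshop topics, integrating XML structure into IR indexing structures, can be illustrated with a minimal sketch. It is not taken from any workshop paper: it assumes a toy inverted index (the functions `index_xml` and `search` and the sample documents are hypothetical) that records, for each term, the document and the root-to-element tag path where the term occurs. A plain keyword query then ignores the path, while a content-and-structure query also restricts matches to elements whose path ends with a given tag, such as `/title`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def _add_terms(index, text, doc_id, path):
    # Naive tokenization (assumption): lowercase and split on whitespace.
    for term in (text or "").lower().split():
        index[term].add((doc_id, path))


def index_xml(docs):
    """Map each term to the set of (doc_id, element path) pairs containing it."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        stack = [(root, "/" + root.tag)]
        while stack:
            elem, path = stack.pop()
            _add_terms(index, elem.text, doc_id, path)
            for child in elem:
                # Text after a child's closing tag belongs to the parent element.
                _add_terms(index, child.tail, doc_id, path)
                stack.append((child, path + "/" + child.tag))
    return index


def search(index, term, path_suffix=None):
    """Return sorted doc ids containing `term`; if `path_suffix` (e.g. "/title")
    is given, only occurrences inside elements whose path ends with it count."""
    hits = index.get(term.lower(), set())
    if path_suffix is not None:
        hits = {(d, p) for d, p in hits if p.endswith(path_suffix)}
    return sorted({d for d, _ in hits})


# Hypothetical sample collection.
docs = {
    "d1": "<article><title>XML retrieval</title><body>ranking of XML elements</body></article>",
    "d2": "<article><title>Query languages</title><body>XML query languages</body></article>",
}
idx = index_xml(docs)
print(search(idx, "xml"))            # ['d1', 'd2']
print(search(idx, "xml", "/title"))  # ['d1']
```

Storing the element path alongside each posting is only one simple way to make an inverted index structure-aware; real systems may instead use element identifiers or region encodings, and would add ranking on top of the boolean matching shown here.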