Searching XML documents via XML fragments

Authors:
David Carmel;Yoelle S. Maarek;Matan Mandelbrod;Yosi Mass;Aya Soffer
Affiliations:
IBM Research Lab in Haifa, Mount Carmel, Haifa;IBM Research Lab in Haifa, Mount Carmel, Haifa;IBM Research Lab in Haifa, Mount Carmel, Haifa;IBM Research Lab in Haifa, Mount Carmel, Haifa;IBM Research Lab in Haifa, Mount Carmel, Haifa
Venue:
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2003

Abstract

Most work on XML query and search has come from the publishing and database communities, largely to serve business applications. Recently, the Information Retrieval community began investigating XML search to address information discovery needs. Following this trend, we present an approach in which information needs are expressed approximately, as pieces of XML documents, or "XML fragments", of the same nature as the documents being searched. We present an extension of the vector space model for searching XML collections via XML fragments and ranking results by relevance, and we describe how we extended a full-text search engine to comply with this model. The value of the proposed method is demonstrated by the relatively high precision of our system, which was among the top performers in the recent INEX workshop. Our results indicate that some queries are better suited than others to the extended vector space model: queries with relatively specific contexts but vague information needs benefit most. Finally, our results show that one method may not fit all types of queries and that it could be worthwhile to use different solutions for different applications.
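The abstract does not spell out the scoring formula, so the following is only a minimal sketch of how a vector space model might be extended to XML fragments. It assumes that both the query fragment and each document are reduced to counts of (term, context) pairs, where a context is the path of enclosing tags, and that a pair contributes to the score in proportion to how closely the two contexts resemble each other. The function names, the raw term-frequency weights, and the particular resemblance measure are all illustrative assumptions, not the system described in the paper.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def term_context_pairs(xml_text):
    """Count (term, context) pairs; a context is the tuple of enclosing tags."""
    counts = Counter()

    def walk(elem, path):
        path = path + (elem.tag,)
        # Text directly inside this element, before its first child.
        for term in (elem.text or "").lower().split():
            counts[(term, path)] += 1
        for child in elem:
            walk(child, path)
            # Text after the child still belongs to the current element.
            for term in (child.tail or "").lower().split():
                counts[(term, path)] += 1

    walk(ET.fromstring(xml_text), ())
    return counts


def context_resemblance(query_ctx, doc_ctx):
    """Assumed measure: nonzero only if the query path is an ordered
    subsequence of the document path; closer path lengths score higher."""
    remaining = iter(doc_ctx)
    if all(tag in remaining for tag in query_ctx):
        return (1 + len(query_ctx)) / (1 + len(doc_ctx))
    return 0.0


def score(query_xml, doc_xml):
    """Cosine-style score over (term, context) pairs, using raw counts as
    weights (no idf), with matches scaled by context resemblance."""
    q = term_context_pairs(query_xml)
    d = term_context_pairs(doc_xml)
    dot = 0.0
    for (q_term, q_ctx), q_w in q.items():
        for (d_term, d_ctx), d_w in d.items():
            if q_term == d_term:
                dot += q_w * d_w * context_resemblance(q_ctx, d_ctx)
    norm = math.sqrt(sum(w * w for w in q.values())) * math.sqrt(
        sum(w * w for w in d.values())
    )
    return dot / norm if norm else 0.0


query = "<book><title>xml</title></book>"
doc_a = "<book><title>xml search</title></book>"
doc_b = "<article><title>xml search</title></article>"
ranked = sorted([doc_a, doc_b], key=lambda doc: score(query, doc), reverse=True)
```

In this sketch the query is itself a small XML fragment, so `doc_a`, whose structure matches the fragment, ranks above `doc_b`, which contains the same words in a different context. Replacing `context_resemblance` with a function that always returns 1.0 would reduce the score to ordinary flat-text cosine similarity.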