In natural language processing, large amounts of structured data are used to extract and present the grammatical structures of sentences. For example, the Chinese Treebank corpus developed at the Institute of Information Science, Academia Sinica, Taiwan, is a semantically annotated corpus that has been used to help parse and study Chinese sentences. In this setting, users usually query the corpus with structured tree patterns instead of keywords. In this paper, we present an online prototype system that provides exploratory search ability. The system implements two flexible and efficient structural query methods and employs a user-friendly web-based interface. Although the system adopts the XML format to present the corpora and search results, it does not use conventional XML query languages. Because searching the Chinese Treebank corpora is structural in nature and often deals with structural similarities, conventional XML query languages such as XPath and XQuery are inflexible and inefficient for this task. We propose and implement a query algorithm called Parent-Child Relationship Filter (PCRF), which provides flexible and efficient structural search. PCRF is sufficiently flexible to provide several similarity-matching options, such as wildcards, unordered sibling sub-trees, ancestor-descendant matching, and their combinations. In addition, PCRF supports stream-based matching to help users query their XML documents online. We also present three accelerating rules that achieve a 1.5- to 8-fold performance improvement in query time. Our experimental results show that our method achieves a 10- to 1000-fold performance improvement compared to the usual text-based XPath query method.
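As a rough illustration of the similarity-matching options named above (wildcards, unordered sibling sub-trees, ancestor-descendant matching), the following Python sketch matches a small tree pattern against an XML parse tree. It is not the PCRF algorithm, whose details are not given in this abstract. The `Pattern` class, the `matches` and `search` functions, and the sample sentence markup are all hypothetical names chosen for this example, and the matcher uses plain backtracking with no filtering, streaming, or accelerating rules.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Pattern:
    """Hypothetical query node (not from the paper). Label "*" is a wildcard."""
    label: str
    children: list = field(default_factory=list)
    descendant: bool = False  # True: may bind at any depth below its parent


def candidates(node, pat):
    """Document nodes that the child pattern `pat` is allowed to bind to."""
    if pat.descendant:
        # Every proper descendant of `node` (ancestor-descendant matching).
        return [d for child in node for d in child.iter()]
    # Direct children only (parent-child matching).
    return list(node)


def _assign(node, pats, used):
    """Bind each child pattern to a distinct candidate, in any sibling order."""
    if not pats:
        return True
    first, rest = pats[0], pats[1:]
    for cand in candidates(node, first):
        if id(cand) in used:
            continue  # each document node may satisfy only one child pattern
        if matches(cand, first) and _assign(node, rest, used | {id(cand)}):
            return True
    return False


def matches(node, pat):
    """True if the pattern tree `pat` matches with its root bound to `node`."""
    if pat.label != "*" and pat.label != node.tag:
        return False
    return _assign(node, pat.children, frozenset())


def search(root, pat):
    """All nodes in the document at which the pattern matches."""
    return [n for n in root.iter() if matches(n, pat)]


# Made-up sample parse tree; the VP deliberately precedes the subject NP.
doc = ET.fromstring(
    "<S><VP><V>likes</V><NP><N>tea</N></NP></VP><NP><N>Mary</N></NP></S>"
)

# "An S with an NP child and a VP child that contains an N at any depth."
query = Pattern("S", [Pattern("NP"),
                      Pattern("VP", [Pattern("N", descendant=True)])])

hits = search(doc, query)
wildcard_hits = search(doc, Pattern("*", [Pattern("N")]))
```

Here `query` lists `NP` before `VP` while the document has them in the opposite order, so the match on `S` succeeds only because siblings are treated as unordered. The wildcard query binds to any node that has a direct `N` child, which in this sample means the two `NP` nodes.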