Phil: A Lazy Implementation of a Language for Approximate Filtering of XML Documents

Authors:
M. Baggi; D. Ballis
Affiliations:
Dip. Matematica e Informatica, Via delle Scienze 206, 33100 Udine, Italy; Dip. Matematica e Informatica, Via delle Scienze 206, 33100 Udine, Italy
Venue:
Electronic Notes in Theoretical Computer Science (ENTCS)
Year:
2008

Abstract

In this paper, we introduce a system, written in Haskell, for filtering information from XML data. The system implements a simple declarative language that allows one to extract relevant data, and to exclude useless or misleading content, by matching patterns against XML documents. The matching mechanism employs a cost-based pattern transformation algorithm that searches for patterns approximately (i.e., modulo renaming, insertion, and deletion of XML items) and ranks the results by their cost. To improve efficiency, the implementation uses sophisticated indexing techniques and exploits laziness to automatically avoid constructing unnecessary data structures. We analyzed both the expressiveness of our filtering language and the performance of the system using the well-known XMark benchmark suite.
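
To make the idea of cost-based approximate matching concrete, the following is a minimal sketch in Python, not the Phil implementation (which is written in Haskell) and not the paper's pattern transformation algorithm. It assumes a much simpler setting: a pattern is a sequence of tag names, it is compared against every root-to-node tag path of a document using plain edit distance, and the results are ranked by cost. The unit costs and the names `path_cost`, `paths`, and `ranked_matches` are hypothetical and chosen only for illustration.

```python
import heapq
import xml.etree.ElementTree as ET

# Hypothetical unit costs; the abstract does not state the paper's cost model.
RENAME, INSERT, DELETE = 1, 1, 1


def path_cost(pattern, path):
    """Edit distance between two tag sequences (rename / insert / delete)."""
    # prev[j] = cost of turning the empty pattern prefix into path[:j]
    prev = [j * INSERT for j in range(len(path) + 1)]
    for i, p in enumerate(pattern, start=1):
        curr = [i * DELETE]
        for j, t in enumerate(path, start=1):
            curr.append(min(
                prev[j - 1] + (0 if p == t else RENAME),  # keep or rename
                prev[j] + DELETE,                         # drop a pattern item
                curr[j - 1] + INSERT,                     # skip an extra document item
            ))
        prev = curr
    return prev[-1]


def paths(elem, prefix=()):
    """Yield the root-to-node tag path of every element, one at a time."""
    here = prefix + (elem.tag,)
    yield here
    for child in elem:
        yield from paths(child, here)


def ranked_matches(pattern, root, k):
    """Return the k cheapest (cost, path) pairs, cheapest first."""
    # A generator expression: candidates are scored on demand and
    # nsmallest keeps at most k of them in memory at any time.
    scored = ((path_cost(pattern, p), p) for p in paths(root))
    return heapq.nsmallest(k, scored)


doc = ET.fromstring(
    "<site><people><person><name/></person></people>"
    "<regions><item><name/></item></regions></site>"
)
for cost, path in ranked_matches(("site", "people", "person", "name"), doc, 2):
    print(cost, "/".join(path))
```

The generator here is only a loose analogue of the laziness the abstract refers to: candidate paths are produced and scored one at a time instead of being collected into a full list first. The indexing techniques mentioned in the abstract are not modeled at all.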