Semi-structured data such as XML are widely used for data interchange and storage. However, many XML documents are improperly nested, with open- and close-tags that do not match. Some semi-structured data (e.g., LaTeX) have a flexible grammar, and many XML documents lack an accompanying DTD or XSD. We therefore focus on computing a purely syntactic repair that minimizes the edit distance. To solve this problem, we propose a dynamic programming algorithm that runs in cubic time. Although this algorithm does not scale, well-formed substrings of the data can be pruned to speed up the computation. Even so, the dynamic program can still be very expensive in some cases. We therefore give branch-and-bound algorithms, based on various combinations of two heuristics called MinCost and MaxBenefit, that trade off accuracy against efficiency. Finally, we experimentally evaluate the performance of these algorithms on real data.
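The abstract does not spell out the cost model or the recurrence, so what follows is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a document is reduced to its sequence of tags (text, attributes and self-closing tags are ignored) and that each tag insertion or deletion costs 1. The helper names `tokenize`, `prune_well_formed` and `repair_distance` are hypothetical. The sketch computes the repair cost with a cubic interval dynamic program and uses a single stack scan to cancel well-formed substrings before running it.

```python
import re
from typing import List, Tuple

# A tag token: (label, is_open). ("p", True) is <p>, ("p", False) is </p>.
Tag = Tuple[str, bool]


def tokenize(doc: str) -> List[Tag]:
    """Extract plain <name> / </name> tags (assumption: no attributes)."""
    return [(name, slash == "") for slash, name in re.findall(r"<(/?)([A-Za-z][\w.-]*)>", doc)]


def prune_well_formed(tags: List[Tag]) -> List[Tag]:
    """Cancel well-formed substrings in one stack scan.

    This is equivalent to repeatedly deleting an adjacent <a></a> pair until
    none remain, so only tags outside every well-formed substring survive.
    """
    stack: List[Tag] = []
    for label, is_open in tags:
        if not is_open and stack and stack[-1] == (label, True):
            stack.pop()  # this close tag matches the open tag on top
        else:
            stack.append((label, is_open))
    return stack


def repair_distance(tags: List[Tag]) -> int:
    """Minimum number of unit-cost tag insertions/deletions that make `tags`
    well-formed, via an O(n^3) interval dynamic program."""
    n = len(tags)
    # d[i][j] = repair cost of the half-open slice tags[i:j]; d[i][i] = 0.
    d = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Option 1: leave tags[i] unmatched. Deleting it, or inserting a
            # partner tag next to it, costs 1 either way.
            best = 1 + d[i + 1][j]
            label, is_open = tags[i]
            if is_open:
                # Option 2: match tags[i] with a same-label close tag at k.
                # The inside (i, k) and the remainder (k, j) are repaired independently.
                for k in range(i + 1, j):
                    if tags[k] == (label, False):
                        best = min(best, d[i + 1][k] + d[k + 1][j])
            d[i][j] = best
    return d[0][n]


if __name__ == "__main__":
    tags = tokenize("<a><b></b></a></c><d>")
    residue = prune_well_formed(tags)  # only </c> and <d> are left
    print(residue, repair_distance(residue))
```

The cubic bound comes from O(n^2) intervals, each trying O(n) split points. Pruning first shrinks n to the number of tags outside any well-formed substring, which is why it helps on mostly well-formed documents. Under this unit-cost assumption, deleting every unmatched tag is already an optimal repair, so reconstructing an edit script would only require a traceback over `d`.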