XML version detection

Authors:
Deise de Brum Saccol; Nina Edelweiss; Renata de Matos Galante; Carlo Zaniolo
Affiliations:
Universidade Federal do Rio Grande do Sul; Universidade Federal do Rio Grande do Sul; Universidade Federal do Rio Grande do Sul; University of California
Venue:
Proceedings of the 2007 ACM symposium on Document engineering
Year:
2007

Abstract

Version detection is critical in many application scenarios, including software clone identification, Web page ranking, plagiarism detection, and peer-to-peer searching. A natural and commonly used approach relies on analyzing the similarity between files. Most techniques proposed so far apply hard thresholds to similarity measures. However, defining a threshold value is problematic for several reasons, in particular: (i) the appropriate threshold differs from one similarity function to another, and (ii) the value is not semantically meaningful to the user. To overcome this problem, we propose a version detection mechanism for XML documents based on Naïve Bayesian classifiers, which turns the detection problem into a classification problem. We present the results of various experiments on synthetic data showing that the approach produces very good results in terms of both recall and precision.
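
To make the idea of replacing a hard similarity threshold with a classifier concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each pair of documents is summarized by two hypothetical similarity scores in [0, 1] (here called content and structure similarity) and that a Gaussian likelihood is used per feature. The abstract states only that Naïve Bayesian classifiers are used, so the feature set, the Gaussian model, the `var_floor` parameter, and the toy training pairs are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train_gaussian_nb(samples, labels, var_floor=1e-3):
    """Fit class priors and per-feature Gaussian parameters.

    var_floor is an assumed smoothing constant that keeps variances
    from collapsing to zero on tiny training sets.
    """
    by_class = defaultdict(list)
    for features, label in zip(samples, labels):
        by_class[label].append(features)

    total = len(samples)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / total, means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def classify(model, features):
    """Return the label with the highest posterior log-score."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, m, v)
            for x, m, v in zip(features, means, variances)
        )
    return max(scores, key=scores.get)


# Toy, hand-made training pairs (NOT data from the paper):
# each tuple is (content_similarity, structure_similarity).
train_x = [
    (0.95, 0.90), (0.88, 0.92), (0.91, 0.80), (0.85, 0.87),  # versions
    (0.20, 0.35), (0.10, 0.50), (0.30, 0.25), (0.15, 0.40),  # unrelated
]
train_y = ["version"] * 4 + ["not_version"] * 4

model = train_gaussian_nb(train_x, train_y)
print(classify(model, (0.90, 0.85)))
print(classify(model, (0.18, 0.30)))
```

In this sketch the decision boundary is learned from labelled examples, so no threshold has to be chosen by hand for each similarity function, which is the motivation the abstract gives for the classification view.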