Version detection is critical in many application scenarios, including software clone identification, Web page ranking, plagiarism detection, and peer-to-peer searching. A natural and commonly used approach is to analyze the similarity between files. Most techniques proposed so far apply hard thresholds to similarity measures. However, defining a threshold value is problematic for several reasons, in particular: (i) a suitable threshold value differs from one similarity function to another, and (ii) the value is not semantically meaningful to the user. To overcome this problem, we propose a version detection mechanism for XML documents based on Naïve Bayesian classifiers, which turns the detection problem into a classification problem. We present the results of experiments on synthetic data showing that the approach performs well in terms of both recall and precision.
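To make the idea of replacing a hard threshold with a classifier concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes that each pair of documents has already been reduced to a small vector of similarity scores in [0, 1], and it uses a Gaussian Naïve Bayes model written from scratch. The feature choice (one content score and one structure score), the training values, the labels, and all function names are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(rows, labels, min_var=1e-3):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        # One (mean, variance) pair per feature column; the variance is
        # floored at min_var so that tiny samples do not yield a zero variance.
        stats = [(mean(col), max(pvariance(col), min_var)) for col in zip(*members)]
        model[cls] = (len(members) / len(rows), stats)
    return model


def log_gaussian(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def posterior(model, row):
    """Return P(class | row) for every class, normalised to sum to 1."""
    log_scores = {
        cls: math.log(prior)
        + sum(log_gaussian(x, mu, var) for x, (mu, var) in zip(row, stats))
        for cls, (prior, stats) in model.items()
    }
    top = max(log_scores.values())  # subtract the maximum for numerical stability
    weights = {cls: math.exp(score - top) for cls, score in log_scores.items()}
    total = sum(weights.values())
    return {cls: weight / total for cls, weight in weights.items()}


def classify(model, row):
    """Pick the class with the highest posterior probability."""
    probs = posterior(model, row)
    return max(probs, key=probs.get)


# Hypothetical training data: each row is (content similarity, structure similarity)
# for one pair of documents. These numbers are invented for illustration.
TRAIN_ROWS = [
    (0.92, 0.88), (0.85, 0.95), (0.78, 0.81), (0.95, 0.90),  # pairs that are versions
    (0.20, 0.35), (0.10, 0.25), (0.30, 0.15), (0.25, 0.40),  # unrelated pairs
]
TRAIN_LABELS = ["version"] * 4 + ["not_version"] * 4

MODEL = fit_gaussian_nb(TRAIN_ROWS, TRAIN_LABELS)
```

Calling `classify(MODEL, (0.90, 0.85))` returns a label, and `posterior(MODEL, (0.90, 0.85))` returns a probability per class. The user therefore reads a class probability rather than choosing a separate cut-off for each similarity function, which is the motivation stated in the abstract.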