The processing and management of XML data are popular research topics. However, operations based on the structure of XML data have received comparatively little attention. Among these operations is the grouping of structurally similar XML documents. Such groups are obtained by applying clustering methods with distances that estimate the similarity between tree structures. This paper presents a framework for clustering XML documents by structure. Modeling XML documents as rooted, ordered, labeled trees, we study the use of structural distance metrics in hierarchical clustering algorithms to detect groups of structurally similar XML documents. We propose the use of structural summaries of trees to speed up the distance calculation while maintaining, or even improving, its quality. Our approach is evaluated using a prototype testbed.
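
To make the overall idea concrete, the following is a minimal sketch of clustering XML documents by structure. It is not the framework described above. The distance used here is a simple Jaccard distance over sets of root-to-node label paths, swapped in for brevity in place of a tree edit distance, and the clustering is a plain single-link agglomerative loop with a stopping threshold. The function names (`label_paths`, `structural_distance`, `single_link_clusters`), the sample documents, and the threshold value are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def label_paths(xml_text):
    """Return the set of root-to-node label paths of an XML document.

    Using a set discards repeated siblings. This is a crude simplification
    and NOT the structural summary construction proposed in the paper.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def structural_distance(doc_a, doc_b):
    """Jaccard distance between path sets (a stand-in for a tree distance)."""
    pa, pb = label_paths(doc_a), label_paths(doc_b)
    return 1.0 - len(pa & pb) / len(pa | pb)


def single_link_clusters(docs, threshold):
    """Merge the two closest clusters until the closest pair exceeds threshold."""
    indices = range(len(docs))
    dist = {
        (i, j): structural_distance(docs[i], docs[j])
        for i, j in combinations(indices, 2)
    }

    def link(c1, c2):
        # Single link: distance between the closest members of two clusters.
        return min(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)

    clusters = [[i] for i in indices]
    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: link(clusters[p[0]], clusters[p[1]]),
        )
        if link(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Hypothetical sample documents: two "book"-like and two "paper"-like.
    docs = [
        "<book><title/><author/></book>",
        "<book><title/><author/><author/></book>",
        "<paper><title/><abstract/></paper>",
        "<paper><title/><abstract/><section><para/></section></paper>",
    ]
    print(single_link_clusters(docs, threshold=0.5))
```

With the sample documents, the two `book` documents share all their label paths and the two `paper` documents share most of theirs, so a threshold of 0.5 keeps the two groups apart. A real implementation would replace `structural_distance` with a proper tree distance and would likely use a library routine for the hierarchical clustering step.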