XML allows users to define elements using arbitrary words and to organize them in a nested structure. These features of XML offer both challenges and opportunities in information retrieval, document management, and data mining. In this paper, we propose a new methodology for preparing XML documents for the quantitative determination of similarity between them, taking into account XML semantics (i.e., the meanings of the elements and the nested structures of XML documents). Accurate quantitative determination of similarity between XML documents provides an important basis for a variety of XML document mining and processing applications. Experiments with XML documents show that our methodology provides a 50-100% improvement in determining similarity over the traditional vector-space model, which considers only term frequency, and achieves 100% accuracy in identifying the category of each document from an online bookstore.
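To make the abstract's contrast concrete, the Python sketch below compares the traditional term-frequency vector-space model with one simple structure-aware feature set. It is not the authors' methodology, which the abstract does not specify. The function names (`term_vector`, `path_vector`, `cosine`), the use of root-to-element tag paths as structural features, and the two sample documents are illustrative assumptions. The sketch also does not model element meanings (for example, treating synonymous tag names as equivalent).

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def term_vector(xml_text: str) -> Counter:
    """Baseline vector-space model: term frequencies of the text content only.

    Tags and nesting are ignored, so documents with the same words but
    different structure get identical vectors.
    """
    root = ET.fromstring(xml_text)
    text = " ".join(root.itertext())
    return Counter(re.findall(r"\w+", text.lower()))


def path_vector(xml_text: str) -> Counter:
    """Hypothetical structure-aware features: one count per root-to-element tag path.

    This feature choice is an illustrative assumption, not the paper's method.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(elem, prefix):
        # Build a path such as "book/title" and recurse into child elements.
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(root, "")
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents: same words, different element structure.
doc_a = "<book><title>data mining</title><author>kim</author></book>"
doc_b = "<person><name>kim</name><note>data mining</note></person>"

print(round(cosine(term_vector(doc_a), term_vector(doc_b)), 3))  # 1.0
print(round(cosine(path_vector(doc_a), path_vector(doc_b)), 3))  # 0.0
```

The two sample documents share every word, so the term-frequency baseline rates them as identical, while the tag-path features find no overlap at all. A term-frequency model cannot see this kind of structural difference.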