A multidimensional scaling approach for representing XML documents

Authors:
Zhonghang Xia;Guangming Xing;Qi Li
Affiliations:
Western Kentucky University, Bowling Green, KY;Western Kentucky University, Bowling Green, KY;Western Kentucky University, Bowling Green, KY
Venue:
ACM-SE 45 Proceedings of the 45th annual southeast regional conference
Year:
2007

Abstract

It has been shown that storing documents with similar structures together can reduce fragmentation and improve query efficiency. Unlike a flat text document, a Web document has no standard vectorial representation, which most existing classification algorithms require. In this paper, we propose a vectorization method for XML documents based on multidimensional scaling (MDS), so that Web documents can be fed into an existing classification algorithm. Classical MDS embeds data points into a Euclidean space if the similarity matrix constructed from the data points is positive semidefinite. The semidefiniteness condition, however, may not hold because of the inference technique used in practice. We therefore find the semidefinite matrix that is closest to the distance matrix in the Euclidean space. Based on recent developments on strongly semismooth matrix-valued functions, we solve this nearest semidefinite matrix problem with a Newton-type method. Experimental studies show that the classification accuracy can be improved.
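
The embedding step described in the abstract can be illustrated with a minimal sketch of classical MDS, assuming NumPy is available. The function name `classical_mds` and the small dissimilarity matrix are hypothetical and chosen only for illustration. Note that the sketch repairs an indefinite matrix by clipping negative eigenvalues to zero, which gives the Frobenius-nearest positive semidefinite matrix; this simpler projection stands in for the Newton-type solver the abstract mentions and is not the paper's method.

```python
import numpy as np


def classical_mds(dist, dim):
    """Embed n items into R^dim from an n x n dissimilarity matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared dissimilarities to obtain a Gram-like matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Assumption: clip negative eigenvalues to zero (nearest PSD matrix in the
    # Frobenius norm) instead of running a Newton-type method.
    eigvals = np.clip(eigvals, 0.0, None)
    # Keep the `dim` largest eigenpairs and scale eigenvectors into coordinates.
    top = np.argsort(eigvals)[::-1][:dim]
    return eigvecs[:, top] * np.sqrt(eigvals[top])


# Hypothetical pairwise dissimilarities among four documents.
dist = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 4.0],
    [4.0, 3.0, 0.0, 1.0],
    [5.0, 4.0, 1.0, 0.0],
])
vectors = classical_mds(dist, 2)

# A matrix that violates the triangle inequality, so its Gram-like matrix
# is indefinite and the clipping step actually takes effect.
bad = np.array([
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
])
bad_vectors = classical_mds(bad, 2)
```

Each row of `vectors` is a fixed-length feature vector for one document, which is the form a standard classifier such as a support vector machine expects.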