A multidimensional scaling approach for representing XML documents

  • Authors:
  • Zhonghang Xia; Guangming Xing; Qi Li

  • Affiliations:
  • Western Kentucky University, Bowling Green, KY; Western Kentucky University, Bowling Green, KY; Western Kentucky University, Bowling Green, KY

  • Venue:
  • ACM-SE 45: Proceedings of the 45th Annual Southeast Regional Conference
  • Year:
  • 2007

Abstract

It has been shown that storing documents with similar structures together can reduce fragmentation and improve query efficiency. Unlike flat text documents, Web (XML) documents have no standard vector representation, which most existing classification algorithms require. In this paper, we propose a vectorization method for XML documents based on multidimensional scaling (MDS), so that they can be fed into existing classification algorithms. Classical MDS embeds data points into a Euclidean space if the similarity matrix constructed from them is positive semidefinite. In practice, however, this semidefiniteness condition may not hold because of the inference techniques used. We therefore find the positive semidefinite matrix closest to the given distance-derived matrix, so that the documents can be embedded in a Euclidean space. Building on recent results on strongly semismooth matrix-valued functions, we solve this nearest semidefinite matrix problem with a Newton-type method. Experimental studies show that this approach can improve classification accuracy.
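The pipeline in the abstract (distances, then a semidefinite matrix, then a Euclidean embedding) can be illustrated with a minimal NumPy sketch. It assumes the input is a symmetric n x n dissimilarity matrix with a zero diagonal (for example, pairwise distances between XML documents). Note that this is not the authors' Newton-type method: to keep the example short, it swaps in the closed-form projection onto the positive semidefinite cone in the Frobenius norm, obtained by clipping negative eigenvalues to zero. The function names `double_center`, `nearest_psd` and `classical_mds` are hypothetical.

```python
import numpy as np


def double_center(d):
    """Classical MDS Gram matrix B = -1/2 * J D^2 J, where J = I - (1/n) 11^T.

    B is positive semidefinite exactly when the distances in d are Euclidean.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * j @ (d ** 2) @ j


def nearest_psd(b):
    """Nearest PSD matrix to b in the Frobenius norm (eigenvalue clipping).

    Simplified stand-in, not the paper's Newton-type method.
    """
    b = (b + b.T) / 2.0  # symmetrize to guard against round-off
    w, v = np.linalg.eigh(b)
    # Scale each eigenvector column by its clipped eigenvalue: V diag(w+) V^T.
    return (v * np.clip(w, 0.0, None)) @ v.T


def classical_mds(d, k):
    """Embed n points in R^k from an n x n dissimilarity matrix d."""
    b = nearest_psd(double_center(d))
    w, v = np.linalg.eigh(b)
    order = np.argsort(w)[::-1][:k]  # indices of the k largest eigenvalues
    w_k = np.clip(w[order], 0.0, None)  # drop tiny negative round-off values
    return v[:, order] * np.sqrt(w_k)  # one k-dimensional row vector per point


if __name__ == "__main__":
    # A dissimilarity matrix that violates the triangle inequality (3 > 1 + 1),
    # so its double-centered matrix is indefinite.
    bad = np.array([[0.0, 1.0, 3.0], [1.0, 0.0, 1.0], [3.0, 1.0, 0.0]])
    print(np.linalg.eigvalsh(double_center(bad)))
    print(classical_mds(bad, 2))
```

The rows returned by `classical_mds` are ordinary feature vectors, so they can be passed to any vector-based classifier. Eigenvalue clipping is exact for the unconstrained problem. A Newton-type solver such as the one the abstract describes becomes relevant when the nearest semidefinite matrix must also satisfy extra constraints.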