Calculation of document similarity using cellular structured space template

Authors:
Pizzanu Kanongchaiyos
Affiliations:
Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Thailand
Venue:
ACST'07 Proceedings of the third conference on IASTED International Conference: Advances in Computer Science and Technology
Year:
2007

Abstract

Calculating the similarity between documents is a major task in information retrieval from a textual database (e.g., electronic books or electronic dictionaries). Documents can be compared by constructing associative feature vectors or sets of terms and computing the distance between the corresponding vectors or sets. While Boolean distance seems impractical and set similarity cannot handle the case in which some terms are more effective in retrieval than others, term statistics are recognized as a good basis for computing document relevance. However, the efficiency of such a calculation depends only on the size of the statistical data, while the document's discourse and the additional meaning carried by the structure of the text are not considered. In this research, cellular structured space templates are used for building input documents. The cellular structured space template, which specifies the basic layout and semantics of a document, is a reasonable compromise between a time-consuming manual document retyping process and a fully automated document recognition process, which is not yet available. Semantics-based similarity between documents is computed by calculation over cellular structured vectors, which are n-dimensional context vectors of the documents. The experimental results show an improvement in the similarity between relevant documents compared with conventional retrieval methods.
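
To make the baseline comparisons mentioned in the abstract concrete, the following is a minimal sketch of set similarity (Jaccard) next to a term-statistics measure (term-frequency cosine similarity). It is not the cellular structured space template method, whose details the abstract does not give. The function names, the whitespace tokenizer, and the choice of Jaccard and cosine as representative measures are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive, assumed tokenizer)."""
    return text.lower().split()


def jaccard_similarity(doc_a, doc_b):
    """Set similarity: every shared term counts equally, regardless of frequency."""
    a, b = set(tokenize(doc_a)), set(tokenize(doc_b))
    if not a and not b:
        return 1.0  # two empty documents are treated as identical
    return len(a & b) / len(a | b)


def cosine_similarity(doc_a, doc_b):
    """Term-frequency cosine similarity: frequent terms carry more weight."""
    tf_a, tf_b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    # Dot product over the terms the two documents share.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)
```

Both measures treat a document as a bag of terms, so two documents that use the same words in different structural positions (for example, in a title versus a footnote) score the same. That is the limitation the abstract attributes to purely statistical approaches.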