Single document semantic spaces

Authors:
Jorge Villalon;Rafael A. Calvo
Affiliations:
The University of Sydney;The University of Sydney
Venue:
AusDM '09 Proceedings of the Eighth Australasian Data Mining Conference - Volume 101
Year:
2009

Citing 6
Cited 0

Generic text summarization using relevance measure and latent semantic analysis
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval

Finding the WRITE Stuff: Automatic Identification of Discourse Structure in Student Essays
IEEE Intelligent Systems

Two uses of anaphora resolution in summarization
Information Processing and Management: an International Journal

Glosser: Enhanced Feedback for Student Writing Tasks
ICALT '08 Proceedings of the 2008 Eighth IEEE International Conference on Advanced Learning Technologies

Concept Map Mining: A Definition and a Framework for Its Evaluation
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 03

Improving quality of search results clustering with approximate matrix factorisations
ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval

Abstract

Latent Semantic Analysis (LSA) has been successfully used in a number of information retrieval, document visualization and summarization applications. LSA semantic spaces are normally created from large corpora that reflect an assumed background knowledge. However, the right size and coverage of the background knowledge for each application are still open research questions. Moreover, LSA's computational cost is directly related to the size of the corpus, making the technique impractical in many cases. This paper introduces a technique for creating semantic spaces using a single document and no background knowledge, which cuts computational cost and is domain independent. The reliability of single document semantic spaces was evaluated on a collection of student essays. Several semantic spaces generated from large corpora and single documents were used to compare how essays are represented. The distance between consecutive sentences in the essays changes between semantic spaces, but the rank of the distances is preserved. The results show that high correlations (0.7) of ranked distances between sentences can be achieved on the different spaces for the weight schemes evaluated. This has important implications for the applications discussed.
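
The procedure summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the tokenizer, the number of dimensions k, the use of cosine distance, and the log weighting used as a second scheme are all assumptions made for illustration, and the function names are hypothetical. The sketch builds a term-by-sentence matrix from one document, reduces it with a truncated SVD, measures the distance between consecutive sentences, and compares the ranked distances obtained under two weightings with a Spearman correlation.

```python
import re
import numpy as np


def sentence_term_matrix(sentences):
    """Raw term-frequency matrix (terms x sentences) built from one document."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for tok in toks:
            matrix[index[tok], j] += 1.0
    return matrix


def sentence_vectors(matrix, k):
    """Project sentences into a k-dimensional space via truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    return (vt[:k, :] * s[:k, None]).T  # one row per sentence


def consecutive_distances(vectors):
    """Cosine distance between each sentence and the one that follows it."""
    dists = []
    for a, b in zip(vectors[:-1], vectors[1:]):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        cos = float(a @ b / denom) if denom > 0 else 0.0
        dists.append(1.0 - cos)
    return np.array(dists)


def ranks(values):
    """Average ranks, where tied values share the mean of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    result = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        result[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])


# Toy "document": in the paper's setting this would be one student essay.
sentences = [
    "Latent semantic analysis builds a semantic space from text.",
    "The semantic space is built with a singular value decomposition.",
    "Student essays were used to evaluate the space.",
    "Each essay is split into sentences.",
    "Distances between consecutive sentences are then ranked.",
]
tf = sentence_term_matrix(sentences)
log_tf = np.log1p(tf)  # assumed second weighting scheme, for illustration only

d_tf = consecutive_distances(sentence_vectors(tf, k=3))
d_log = consecutive_distances(sentence_vectors(log_tf, k=3))
print("rank correlation between the two spaces:", spearman(d_tf, d_log))
```

A correlation near 1 would mean the two spaces order the sentence transitions in the same way even if the raw distances differ, which is the kind of agreement the abstract reports. The toy document here is far too short to say anything about the paper's results.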