The influence of semantics in IR using LSI and K-means clustering techniques

Authors:
D. Jiménez;E. Ferretti;V. Vidal;P. Rosso;C. F. Enguix
Affiliations:
Polytechnic University of Valencia, Spain;National University of San Luis, Argentina;Polytechnic University of Valencia, Spain;Polytechnic University of Valencia, Spain;Mediterranean University of Science and Technology, Spain
Venue:
ISICT '03 Proceedings of the 1st international symposium on Information and communication technologies
Year:
2003

Citing 9
Cited 0

Term-weighting approaches in automatic text retrieval. Information Processing and Management: an International Journal
Using latent semantic analysis to improve access to textual information. CHI '88 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
WordNet: a lexical database for English. Communications of the ACM
Understanding search engines: mathematical modeling and text retrieval
Modern Information Retrieval
Computer Methods for Mathematical Computations
A Hidden Markov Model Approach to Word Sense Disambiguation. IBERAMIA 2002 Proceedings of the 8th Ibero-American Conference on AI: Advances in Artificial Intelligence
TnT: a statistical part-of-speech tagger. ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Automatic noun sense disambiguation. CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing

Abstract

In this paper we study the influence of semantics on preprocessing for information retrieval. Specifically, we compare the performance achieved with stemming and with semantic lemmatization as preprocessing steps. Three techniques are used in the study: the direct use of a weighted matrix, the SVD technique in the LSI model, and the bisecting spherical k-means clustering technique. Although the results do not seem very promising, we believe that they can be improved in the future.