Combining Text Vector Representations for Information Retrieval

  • Authors:
  • Maya Carrillo; Chris Eliasmith; A. López-López

  • Affiliations:
  • Coordinación de Ciencias Computacionales, INAOE, Puebla, Mexico 72840 and Facultad de Ciencias de la Computación, BUAP, Puebla, Mexico 72570; Department of Philosophy, Department of Systems Design Engineering, Centre for Theoretical Neuroscience, University of Waterloo, Waterloo, Canada; Coordinación de Ciencias Computacionales, INAOE, Puebla, Mexico 72840

  • Venue:
  • TSD '09 Proceedings of the 12th International Conference on Text, Speech and Dialogue
  • Year:
  • 2009

Abstract

This paper suggests a novel document representation intended to improve retrieval precision. The representation is generated by combining two central techniques: Random Indexing and Holographic Reduced Representations (HRRs). Random Indexing uses co-occurrence information among words to generate semantic context vectors that are the sum of randomly generated term identity vectors. HRRs encode textual structure, directly capturing relations between words (e.g., compound terms, subject-verb, and verb-object). Random index vectors capture semantic information, HRRs capture structural relations extracted from the text, and document vectors are generated by summing all such representations over a document. In this paper, we show that these representations can be successfully used in information retrieval, can effectively incorporate relations, and can reduce the dimensionality of the traditional vector space model (VSM). Our experiments show that representations built from random index vectors, combined with different contexts such as document occurrence representation (DOR), term co-occurrence representation (TCOR), and HRRs, outperform the VSM representation in information retrieval tasks.
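
As a rough illustration of how such a document vector might be assembled, the Python sketch below combines sparse random index vectors, co-occurrence-based context vectors, and HRR binding via circular convolution. The dimensionality, sparsity, toy vocabulary, and the particular role pairings are illustrative assumptions, not the configuration reported in the paper.

    import numpy as np

    DIM = 1024  # vector dimensionality (illustrative; the paper's choice may differ)
    rng = np.random.default_rng(0)

    def index_vector(dim=DIM, nonzero=10):
        """Sparse ternary random index vector: a few +1/-1 entries, rest zero."""
        v = np.zeros(dim)
        pos = rng.choice(dim, size=nonzero, replace=False)
        v[pos] = rng.choice([-1.0, 1.0], size=nonzero)
        return v

    def circular_convolution(a, b):
        """HRR binding operator, computed in the frequency domain."""
        return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

    # Random term identity vectors for a toy vocabulary.
    vocab = ["neural", "network", "learns", "representation"]
    idx = {w: index_vector() for w in vocab}

    # Semantic context vectors: each word accumulates the index vectors of the
    # words it co-occurs with (here, everything in the same toy document).
    ctx = {w: np.zeros(DIM) for w in vocab}
    for w in vocab:
        for other in vocab:
            if other != w:
                ctx[w] += idx[other]

    # Structural relations encoded with HRRs, e.g. a subject-verb pair and a
    # compound term, each bound with circular convolution.
    subj_verb = circular_convolution(ctx["network"], ctx["learns"])
    compound = circular_convolution(ctx["neural"], ctx["network"])

    # Document vector: sum of context vectors and bound relation vectors.
    doc_vector = sum(ctx.values()) + subj_verb + compound

    # Retrieval would then compare query and document vectors, e.g. by cosine similarity.
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    query = ctx["network"] + circular_convolution(ctx["network"], ctx["learns"])
    print(round(cosine(query, doc_vector), 3))

The key design point is that circular convolution produces a bound vector of the same dimensionality as its inputs, so relational structure can be added into the document vector without growing the representation, in contrast to the term-by-document matrix of the traditional VSM.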