Extended information inference model for unsupervised categorization of web short texts

  • Authors: Tao Xu; Qinke Peng

  • Venue: Journal of Information Science
  • Year: 2012

Abstract

Traditional text-processing methods suffer significant performance degradation when applied to web short texts, whose inherent characteristics include feature sparseness, a lack of sufficient hand-labelled training examples, domain dependence, and asyntactic expression. In this paper we propose a modified information inference model that mimics human cognitive behaviour to categorize various web short texts in an unsupervised manner. The model is based on conceptual space theory and the hyperspace analogue to language (HAL) model; its novelty is that it combines domain-specific knowledge and universal knowledge via a fusion mechanism for multiple HAL spaces. Moreover, in the realization of the conceptual space, a concept is represented geometrically by a two-tuple of property sets, which can improve the accuracy with which the information contained in combined concepts is represented. Two measures of the relationship between concepts are used to implement information inference for web short texts. We evaluate the model on three different web short text categorization tasks, and the results indicate the applicability and usefulness of the proposed method.
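
For readers unfamiliar with HAL, the following is a minimal sketch of how a single HAL space is commonly built: a window slides over the token stream, and each word accumulates weights for the words that precede it, with closer words weighted more heavily. This is not the authors' extended model. The fusion of multiple HAL spaces and the two-tuple concept representation are not specified in the abstract and are not attempted here. The function names `build_hal` and `concept_vector`, the window size, and the linear weighting `window - d + 1` are assumptions chosen for illustration.

```python
from collections import defaultdict


def build_hal(tokens, window=5):
    """Build a HAL-style table: hal[word][preceding_word] -> accumulated weight.

    Assumption: a preceding word at distance d (1 = adjacent) contributes
    window - d + 1, so nearer words count more.
    """
    hal = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            j = i - d
            if j < 0:
                break  # ran off the start of the text
            hal[word][tokens[j]] += window - d + 1
    return hal


def concept_vector(hal, word):
    """Merge the row (preceding context) and column (following context)
    of `word` into one unordered property vector."""
    vec = defaultdict(int)
    # Row: words that appeared before `word`.
    for ctx, weight in hal.get(word, {}).items():
        vec[ctx] += weight
    # Column: words for which `word` appeared in the preceding window.
    for other, row in hal.items():
        if word in row:
            vec[other] += row[word]
    return dict(vec)


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    hal = build_hal(tokens, window=2)
    print(concept_vector(hal, "sat"))
```

In a sketch like this, each word's vector of weighted co-occurring terms plays the role of its set of properties; how such properties are thresholded, combined across domain-specific and universal corpora, or compared between concepts is left to the paper itself.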