Query expansion using an immune-inspired biclustering algorithm

  • Authors:
  • Pablo A. Castro; Fabrício O. França; Hamilton M. Ferreira; Guilherme Palermo Coelho; Fernando J. Zuben

  • Affiliations:
  • Laboratory of Bioinformatics and Bio-inspired Computing (LBiC), Department of Computer Engineering and Industrial Automation (DCA), School of Electrical and Computer Engineering (FEEC), University ...

  • Venue:
  • Natural Computing: an international journal
  • Year:
  • 2010

Abstract

Query expansion improves the performance of information retrieval systems by automatically adding related terms to the initial query. These additional terms can be obtained from documents stored in a database. Usually, this is done by clustering the documents and extracting representative terms from the clusters; a new search is then performed over the whole database using the expanded set of terms. The authors recently proposed an immune-inspired algorithm, BIC-aiNet, to perform biclustering of texts. Biclustering differs from standard clustering in that it can detect partial similarities, that is, groups of texts that are similar over only a subset of the attributes. Preliminary results indicated that the proposal groups similar texts effectively and that the generated biclusters consistently present relevant words to represent a category of texts. Motivated by these results, this paper better formalizes the proposal and investigates the usefulness of the whole methodology on larger datasets. BIC-aiNet was applied to a set of documents to identify the relevant terms associated with each bicluster, giving rise to a query expansion tool. The results were compared with those produced by two alternative proposals in the literature, and they indicate that these techniques tend to generate complementary results, as a consequence of the use of distinct similarity metrics.
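
The general query expansion loop described in the abstract (group documents, extract representative terms, search again with the expanded query) can be illustrated with a minimal sketch. This is not BIC-aiNet and not the authors' implementation: the grouping step is a deliberately simple stand-in (documents sharing a term with the query), and the corpus, function names, and the `n_terms` parameter are all hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus; not data from the paper.
DOCS = [
    "immune network algorithm for clustering data",
    "artificial immune system optimization algorithm",
    "clonal selection algorithm for optimization",
    "football league results and table",
]


def tokenize(text):
    """Lowercase whitespace tokenizer (no stemming or stop-word removal)."""
    return text.lower().split()


def expand_query(query, docs, n_terms=2):
    """Append the n_terms most common terms from the query's document group.

    Assumption: the group is every document sharing at least one term with
    the query. A real system would use clusters or biclusters here instead.
    """
    q_terms = tokenize(query)
    q_set = set(q_terms)
    group = [set(tokenize(d)) for d in docs if q_set & set(tokenize(d))]
    # Count in how many group documents each non-query term appears.
    counts = Counter(t for doc in group for t in doc if t not in q_set)
    # Descending count, then alphabetical, so the result is deterministic.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return q_terms + ranked[:n_terms]


def search(terms, docs):
    """Return indices of matching documents, best match first."""
    t_set = set(terms)
    scored = [(len(t_set & set(tokenize(d))), i) for i, d in enumerate(docs)]
    ordered = sorted(scored, key=lambda p: (-p[0], p[1]))
    return [i for score, i in ordered if score > 0]


if __name__ == "__main__":
    expanded = expand_query("immune", DOCS)
    print(expanded)
    print(search(["immune"], DOCS))
    print(search(expanded, DOCS))
```

In this toy setting, the expanded query also retrieves the third document, which never mentions the original query term but shares vocabulary with the documents that do.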