Compression: A Key for Next-Generation Text Retrieval Systems

  • Authors:
  • Nivio Ziviani;Edleno Silva de Moura;Gonzalo Navarro;Ricardo Baeza-Yates

  • Venue:
  • Computer
  • Year:
  • 2000

Quantified Score

Hi-index 4.10

Abstract

As online textual information explodes through the widespread use of digital libraries, office automation systems, document databases, and the Web, the need arises for an effective information retrieval (IR) system. The Web alone comprises approximately 800 million static pages, containing 6 trillion bytes of plain text--enough to store the text of a million books. Today's IR systems face the dynamic challenge of providing rapid and immediate access to this textual mass. Recent methods have demonstrated that directly searching compressed text is faster than searching original text and that flexible word searching improves the amount of compression obtained. Text compression focuses on finding ways to represent actual text in less space. This process involves replacing text symbols with equivalent symbols that use fewer bits or bytes. Text compression is attractive because it is cost efficient, requires less storage space, speeds up data transmittal, and reduces search time. The authors discuss the recent techniques that allow fast and direct searching of compressed text, and they explain how these techniques can improve the overall efficiency of IR systems.
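
The idea of replacing text symbols with shorter codes and then searching the compressed form directly can be illustrated with a minimal sketch. This is not the authors' method. It is a simplified, hypothetical word-based byte code in which frequent words get shorter codes and the first byte of every codeword carries a tag bit. The function names (encode_rank, build_codebook, compress, find_word) are assumptions made for illustration, and the sketch discards punctuation and spacing, so it is lossy.

```python
import re
from collections import Counter


def encode_rank(rank):
    """Map a frequency rank to a byte codeword.

    The rank is written in base 128; the first byte gets its high bit set
    (a tag marking the start of a codeword), continuation bytes do not.
    """
    digits = [rank % 128]
    rank //= 128
    while rank:
        digits.append(rank % 128)
        rank //= 128
    digits.reverse()
    digits[0] |= 0x80  # tag the first byte of the codeword
    return bytes(digits)


def tokenize(text):
    """Split text into lowercase words (separators are dropped: lossy)."""
    return re.findall(r"\w+", text.lower())


def build_codebook(text):
    """Assign shorter codewords to more frequent words."""
    ranked = [word for word, _ in Counter(tokenize(text)).most_common()]
    return {word: encode_rank(i) for i, word in enumerate(ranked)}


def compress(text, codebook):
    """Replace every word by its codeword."""
    return b"".join(codebook[word] for word in tokenize(text))


def find_word(compressed, codebook, word):
    """Return byte offsets of `word` by scanning the compressed bytes.

    The pattern is compressed once and then matched as a byte string;
    the text itself is never decompressed.
    """
    code = codebook.get(word.lower())
    if code is None:
        return []  # word does not occur in the collection
    hits = []
    pos = compressed.find(code)
    while pos != -1:
        end = pos + len(code)
        # Accept only whole codewords: the next byte must start a new
        # codeword (tag bit set) or be the end of the data.
        if end == len(compressed) or compressed[end] & 0x80:
            hits.append(pos)
        pos = compressed.find(code, pos + 1)
    return hits
```

For example, with the text "the cat sat on the mat the end", each of the six distinct words gets a one-byte codeword, the compressed form is 8 bytes long, and find_word for "the" returns the byte offsets [0, 4, 6]. The tag bit is the design choice that makes direct search safe in this sketch: a match can only begin at a codeword boundary, so an ordinary byte-string search cannot produce a hit that straddles two words.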