Phrase-based Text Representation for Managing the Web Documents

  • Authors:
  • Rupali Sharma; S. Raman

  • Venue:
  • ITCC '03 Proceedings of the International Conference on Information Technology: Computers and Communications
  • Year:
  • 2003

Abstract

The World Wide Web has provided the facility of bringing information to the fingertips of its users. Since most of the documents available on the web are machine-readable but not machine-understandable, ensuring the retrieval of relevant information continues to be a difficult task. In the traditional text representation approach, high-frequency keywords are used as terms representative of the text. However, the main drawbacks of this approach are the lack of a direct relationship between a word's frequency and its importance, and the effect of word ambiguities. Considering these shortcomings of the keyword-based method, this paper presents a phrase-based text representation approach that uses rule-based Natural Language Processing (NLP) techniques. Extraction of key-phrases from text documents is based on a process of partial parsing. By making the indexing terms more meaningful through reduction of the ambiguity in words considered in isolation, improvement in retrieval effectiveness is sought to be achieved.
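
The abstract does not give the paper's actual rules, so the following is only a minimal sketch of what rule-based key-phrase extraction by partial parsing might look like. The toy lexicon `LEXICON`, the single chunk rule (zero or more adjectives followed by one or more nouns), and the function names `tag` and `extract_key_phrases` are all assumptions made for illustration; they are not taken from the paper.

```python
import re

# Hypothetical toy lexicon mapping words to coarse part-of-speech tags.
# A real system would use a full tagger; this is an assumption for the sketch.
LEXICON = {
    "the": "DET", "of": "PREP", "on": "PREP",
    "text": "NOUN", "representation": "NOUN", "retrieval": "NOUN",
    "information": "NOUN", "web": "NOUN",
    "phrase-based": "ADJ", "relevant": "ADJ",
    "improves": "VERB",
}


def tag(text):
    """Lowercase and tokenize the text, then look each token up in the lexicon."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [(token, LEXICON.get(token, "OTHER")) for token in tokens]


def extract_key_phrases(text):
    """Partial parse with one assumed chunk rule: ADJ* NOUN+.

    Only noun chunks are recognized; everything else in the sentence
    is left unparsed, which is what makes the parse "partial".
    """
    phrases, chunk, has_noun = [], [], False
    for token, pos in tag(text):
        if pos == "NOUN":
            chunk.append(token)
            has_noun = True
            continue
        # A non-noun token closes any chunk that already holds a noun.
        if has_noun:
            phrases.append(" ".join(chunk))
            chunk = []
        if pos == "ADJ":
            chunk.append(token)  # leading adjective of a possible new chunk
        else:
            chunk = []           # discard adjectives not followed by a noun
        has_noun = False
    if has_noun:
        phrases.append(" ".join(chunk))
    return phrases


sentence = ("Phrase-based text representation improves the retrieval "
            "of relevant information on the web")
print(extract_key_phrases(sentence))
```

Under these assumptions, the indexing terms become chunks such as "relevant information" rather than the isolated words "relevant" and "information", which is the kind of ambiguity reduction the abstract describes.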