Phrase-based Text Representation for Managing the Web Documents

Authors:
Rupali Sharma; S. Raman
Affiliations:
-;-
Venue:
ITCC '03 Proceedings of the International Conference on Information Technology: Computers and Communications
Year:
2003

Citing 0
Cited 2

Statistical Identification of Key Phrases for Text Classification
MLDM '07 Proceedings of the 5th International Conference on Machine Learning and Data Mining in Pattern Recognition

An online document clustering technique for short web contents
Pattern Recognition Letters

Abstract

The World Wide Web has brought information to the fingertips of its users. Since most of the documents available on the web are machine-readable but not machine-understandable, ensuring the retrieval of relevant information continues to be a difficult task. In the traditional text representation approach, high-frequency keywords are used as the terms representative of a text. However, the main drawbacks of this approach are the lack of a direct relationship between a word's frequency and its importance, and the effect of word ambiguities. Considering these shortcomings of the keyword-based method, this paper presents a phrase-based text representation approach that uses rule-based Natural Language Processing (NLP) techniques. Extraction of key-phrases from text documents is based on a process of partial parsing. By making the indexing terms more meaningful through reduction of the ambiguity of words considered in isolation, the approach seeks to improve retrieval effectiveness.
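
The idea of extracting key-phrases by partial parsing can be illustrated with a minimal sketch. The abstract does not specify the paper's rules, so everything below is an assumption made for illustration: the toy part-of-speech lexicon `TOY_LEXICON`, the single chunk rule (a run of adjectives and nouns that ends in a noun), and the helper names `tag`, `extract_phrases`, and `index_terms` are all hypothetical and are not the authors' method.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption for this sketch); a real system
# would use a proper part-of-speech tagger instead of a lookup table.
TOY_LEXICON = {
    "the": "DET", "a": "DET", "of": "PREP",
    "is": "VERB", "uses": "VERB", "improves": "VERB",
    "phrase-based": "ADJ", "rule-based": "ADJ",
    "partial": "ADJ", "relevant": "ADJ",
    "text": "NOUN", "representation": "NOUN", "parsing": "NOUN",
    "retrieval": "NOUN", "information": "NOUN", "web": "NOUN",
}


def tag(tokens):
    """Attach a coarse tag to each token; unknown words get 'OTHER'."""
    return [(t, TOY_LEXICON.get(t, "OTHER")) for t in tokens]


def extract_phrases(text):
    """Partial parse: chunk runs of ADJ/NOUN tokens that end in a NOUN.

    Only multi-word chunks are kept, since the goal is to index phrases
    rather than isolated (and potentially ambiguous) keywords.
    """
    phrases = []
    # Split on punctuation first so a chunk never spans two clauses.
    for segment in re.split(r"[.!?;:,]+", text.lower()):
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", segment)
        current = []
        # A sentinel token flushes the final chunk of the segment.
        for word, pos in tag(tokens) + [("", "END")]:
            if pos in ("ADJ", "NOUN"):
                current.append((word, pos))
                continue
            # Chunk boundary: drop trailing adjectives so the head is a noun.
            while current and current[-1][1] != "NOUN":
                current.pop()
            if len(current) >= 2:
                phrases.append(" ".join(w for w, _ in current))
            current = []
    return phrases


def index_terms(text):
    """Count extracted phrases so they can serve as indexing terms."""
    return Counter(extract_phrases(text))
```

Calling `index_terms("Partial parsing uses text representation. Partial parsing improves the retrieval of relevant information.")` counts "partial parsing" twice and "text representation" and "relevant information" once each, while the isolated noun "retrieval" is not indexed on its own. The parse is partial in the sense that only these noun-phrase chunks are recognized and the rest of each sentence is left unanalyzed.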