Statistical Identification of Key Phrases for Text Classification
MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
An online document clustering technique for short web contents
Pattern Recognition Letters
The World Wide Web has provided the facility of bringing information to the fingertips of its users. Since most of the documents available on the web are machine-readable but not machine-understandable, ensuring the retrieval of relevant information continues to be a difficult task. In the traditional text representation approach, high-frequency keywords are used as the representative terms of a text. However, the main drawbacks of this approach are the lack of a direct relationship between a word's frequency and its importance, and the effect of word ambiguities. Considering these shortcomings of the keyword-based method, this paper presents a phrase-based text representation approach that uses rule-based Natural Language Processing (NLP) techniques. Extraction of key-phrases from text documents is based on a process of partial parsing. By making the indexing terms more meaningful through reduction of the ambiguity in words considered in isolation, an improvement in retrieval effectiveness is sought.
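
The abstract does not state the actual parsing rules, so the following is only a minimal sketch of how key-phrase extraction by partial parsing might look. It assumes a hypothetical toy part-of-speech lexicon (`LEXICON`) and a single illustrative chunk rule (optional adjectives followed by one or more nouns); neither is taken from the paper, and the function names are invented for this example.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption for this sketch, not from the paper).
# A real system would use a proper part-of-speech tagger.
LEXICON = {
    "the": "DET", "of": "PREP",
    "uses": "VERB", "improves": "VERB",
    "natural": "ADJ", "relevant": "ADJ",
    "web": "NOUN", "language": "NOUN", "processing": "NOUN",
    "information": "NOUN", "retrieval": "NOUN", "documents": "NOUN",
}


def tag(tokens):
    """Attach a coarse POS tag to each token; unknown words get OTHER."""
    return [(tok, LEXICON.get(tok, "OTHER")) for tok in tokens]


def chunk_noun_phrases(tagged):
    """Partial parsing: keep only ADJ* NOUN+ runs of at least two words.

    Everything that does not fit the pattern is skipped rather than
    parsed, which is what makes the parse 'partial'.
    """
    phrases, current, has_noun = [], [], False
    for word, pos in tagged:
        if pos == "NOUN":
            current.append(word)
            has_noun = True
        elif pos == "ADJ" and not has_noun:
            current.append(word)
        else:
            # The run is broken: emit it if it is a multi-word noun phrase.
            if has_noun and len(current) >= 2:
                phrases.append(" ".join(current))
            # An adjective that follows a noun starts a new candidate chunk.
            current = [word] if pos == "ADJ" else []
            has_noun = False
    if has_noun and len(current) >= 2:
        phrases.append(" ".join(current))
    return phrases


def key_phrases(text):
    """Count candidate key-phrases, chunking each sentence separately."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        counts.update(chunk_noun_phrases(tag(tokens)))
    return counts


if __name__ == "__main__":
    sample = ("The web uses natural language processing. "
              "Natural language processing improves information retrieval "
              "of relevant web documents.")
    for phrase, freq in key_phrases(sample).most_common():
        print(freq, phrase)
```

In this sketch the single word "web" is never indexed on its own, while "relevant web documents" is kept as one term. This illustrates the abstract's point that a phrase carries less ambiguity than its words considered in isolation.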