Automatic Keyword Extraction Using Linguistic Features

Authors:
Xinghua Hu;Bin Wu
Affiliations:
University of California, Santa Cruz;University of California, Santa Cruz
Venue:
ICDMW '06 Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops
Year:
2006

Citing 0
Cited 3

Classification and Summarization of Pros and Cons for Customer Reviews
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03

Word AdHoc Network: Using Google Core Distance to extract the most relevant information
Knowledge-Based Systems

Finding keywords in blogs: Efficient keyword extraction in blog mining via user behaviors
Expert Systems with Applications: An International Journal

Abstract

This paper describes a novel keyword extraction algorithm, Position Weight (PW), that uses linguistic features to represent the importance of a word's position in a document. Topical terms and their previous-term and next-term co-occurrence collections are extracted. To measure the degree of correlation between a topical term and its co-occurrence terms, three methods are employed: Term Frequency Inverse Term Frequency (TFITF), Position Weight Inverse Position Weight (PWIPW), and chi-square (χ²). The co-occurrence terms that have the highest degree of correlation and exceed a co-occurrence frequency threshold are combined with the original topical term to form a final keyword. Because the algorithm has linear computational complexity, the documents in a large corpus or on the open web can be quickly represented in the vector space by sets of keywords, which makes fast and effective large-scale information retrieval possible.
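
The abstract does not give the formulas for PW, TFITF, or PWIPW, so the following is only a minimal sketch of the chi-square step, under stated assumptions: co-occurrence is taken to mean adjacency, only next-term co-occurrences are considered, and correlation is scored with the standard Pearson chi-square statistic on a 2x2 contingency table. The function names `chi_square_2x2` and `extend_keyword` and the parameter `min_cooccurrence` are hypothetical and do not come from the paper.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def extend_keyword(tokens, topical_term, min_cooccurrence=2):
    """Join topical_term with its most correlated next term, if any.

    Assumption: "co-occurrence" means the candidate term immediately
    follows the topical term. The paper may define it differently.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(bigrams.values())
    first_counts = Counter(tokens[:-1])   # term appears as first element
    second_counts = Counter(tokens[1:])   # term appears as second element

    best_term, best_score = None, 0.0
    for (first, nxt), a in bigrams.items():
        # Keep only pairs that start with the topical term and meet
        # the co-occurrence frequency threshold.
        if first != topical_term or a < min_cooccurrence:
            continue
        b = first_counts[first] - a    # topical term followed by another word
        c = second_counts[nxt] - a     # candidate preceded by another word
        d = total - a - b - c          # all remaining bigrams
        score = chi_square_2x2(a, b, c, d)
        if score > best_score:
            best_term, best_score = nxt, score

    return f"{topical_term} {best_term}" if best_term else topical_term
```

With a hypothetical token list in which "keyword" is followed by "extraction" twice and by "search" once, `extend_keyword(tokens, "keyword")` would combine the topical term with "extraction", since "search" falls below the default threshold of 2. Each scored pair needs only counts gathered in a single pass over the tokens, which is in keeping with the linear complexity the abstract claims.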