This paper presents an algorithm for extracting phrases from text documents. The algorithm builds phrases by iteratively merging bigrams according to an association measure. Two association measures are presented: mutual information and the t-test. The extracted phrases are tested in a document classification task using a tf/idf model and a k-nearest neighbor classifier.
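The iterative merging idea can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`bigram_scores`, `merge_pair`, `extract_phrases`), the `threshold`, `min_count`, and `max_iters` parameters, and the choice to merge only the single best-scoring bigram per pass are all assumptions made for illustration. The two scores used are pointwise mutual information and the usual t-score approximation for collocations.

```python
import math
from collections import Counter


def bigram_scores(tokens, measure="pmi"):
    """Score every adjacent token pair with PMI or a t-score (illustrative)."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (a, b), c_ab in bigrams.items():
        p_ab = c_ab / n
        p_a, p_b = unigrams[a] / n, unigrams[b] / n
        if measure == "pmi":
            # Pointwise mutual information: log2 P(a,b) / (P(a) P(b))
            scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
        else:
            # t-score: (observed - expected) / sqrt(variance / n),
            # with the variance approximated by P(a,b)
            scores[(a, b)] = (p_ab - p_a * p_b) / math.sqrt(p_ab / n)
    return scores, bigrams


def merge_pair(tokens, pair, sep="_"):
    """Replace each non-overlapping occurrence of `pair` with one joined token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + sep + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def extract_phrases(tokens, measure="pmi", threshold=1.0, min_count=2, max_iters=10):
    """Repeatedly merge the best-scoring bigram until none passes the cutoffs.

    `threshold`, `min_count`, and `max_iters` are hypothetical parameters.
    """
    phrases = []
    for _ in range(max_iters):
        scores, counts = bigram_scores(tokens, measure)
        candidates = [
            (score, pair)
            for pair, score in scores.items()
            if counts[pair] >= min_count and score >= threshold
        ]
        if not candidates:
            break
        _, best = max(candidates)
        tokens = merge_pair(tokens, best)
        phrases.append("_".join(best))
    return phrases, tokens


if __name__ == "__main__":
    text = "new york a new york b new york c".split()
    print(extract_phrases(text, measure="pmi"))
```

Because merged tokens are fed back into the next pass, a merged bigram can itself pair with a neighbor, which is how phrases longer than two words would emerge. The resulting phrase tokens could then be used as features alongside single words in a tf/idf representation.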