Automatic document annotation using a controlled conceptual thesaurus is useful for establishing precise links between similar documents. This study presents a language-independent document annotation system based on features derived from a novel collocation segmentation method. Using the multilingual conceptual thesaurus EuroVoc, we evaluate filtered and unfiltered versions of the method and compare them against other language-independent methods based on single words and bigrams. Testing the new method on the manually tagged multilingual corpus Acquis Communautaire 3.0 (AC) with all descriptors found there, we improve keyword assignment precision from 18 to 29 percent and F-measure from 17.2 to 27.6 when 5 keywords are assigned to a document. Additionally filtering out the 10 most frequent items improves precision by 4 percent, and collocation segmentation improves precision by 9 percent on average across the 21 languages tested.
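The results are reported as precision and F-measure when a fixed number of keywords (here 5) is assigned to each document. The following is a minimal sketch of how such a score could be computed. It assumes that each document has a ranked list of predicted descriptors and a set of gold (manually assigned) descriptors. It also assumes that scores are macro-averaged over documents. The abstract does not state the averaging scheme, and the descriptor labels in the example are hypothetical placeholders, not real EuroVoc entries.

```python
from typing import Iterable, List, Sequence, Tuple


def prf_at_k(ranked: Sequence[str], gold: Iterable[str], k: int = 5) -> Tuple[float, float, float]:
    """Precision, recall and F-measure for the top-k descriptors assigned to one document."""
    # Drop duplicate predictions (keeping rank order), then keep only the top k.
    predicted = list(dict.fromkeys(ranked))[:k]
    gold_set = set(gold)
    hits = sum(1 for d in predicted if d in gold_set)
    # Divide by the number actually assigned, so lists shorter than k are not penalised twice.
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    # Harmonic mean of precision and recall; defined as 0 when nothing is correct.
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure


def macro_average(runs: List[Tuple[Sequence[str], Iterable[str]]], k: int = 5) -> Tuple[float, float, float]:
    """Average per-document precision, recall and F-measure (assumed macro-averaging)."""
    scores = [prf_at_k(ranked, gold, k) for ranked, gold in runs]
    n = len(scores)
    if n == 0:
        return 0.0, 0.0, 0.0
    return tuple(sum(s[i] for s in scores) / n for i in range(3))  # type: ignore[return-value]


if __name__ == "__main__":
    # Hypothetical descriptors, for illustration only.
    ranked = ["agriculture", "fisheries", "customs", "tax", "energy", "trade"]
    gold = ["fisheries", "tax", "environment"]
    print(prf_at_k(ranked, gold, k=5))
```

With these inputs, 2 of the 5 assigned descriptors are correct, which gives P = 0.4, R = 2/3 and F = 0.5. Micro-averaging, which pools hits across all documents before dividing, is an equally plausible reading of the reported figures and would generally give different numbers.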