In the last few years the size and coverage of Wikipedia, a community-edited, freely available online encyclopedia, have reached the point where it can be used effectively to identify the topics discussed in a document, much as an ontology or taxonomy would be. In this paper we show that even a fairly simple algorithm that exploits only the titles and categories of Wikipedia articles can characterize documents by Wikipedia categories surprisingly well. We test the reliability of our method by predicting the categories of Wikipedia articles themselves from their bodies, and by performing classification and clustering on 20 Newsgroups and RCV1, representing documents by their Wikipedia categories instead of (or in addition to) their texts. This work was supported by the NKFP projects MOLINGV and Language Miner.
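To make the idea of characterizing a document by Wikipedia categories concrete, the following is a minimal sketch and not the algorithm evaluated in the paper. It assumes a lookup table from article titles to categories has already been extracted from Wikipedia; the tiny `TITLE_CATEGORIES` table, the function names, and the plain occurrence-count scoring are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy table: Wikipedia article title -> its categories.
# A real table would be built from a Wikipedia dump (assumption).
TITLE_CATEGORIES = {
    "graphics card": ["Computer hardware"],
    "device driver": ["Computer hardware", "Software"],
    "ice hockey": ["Team sports"],
    "goaltender": ["Team sports", "Ice hockey"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_categories(text, title_categories, max_title_len=3):
    """Score categories by how often titles belonging to them occur in the text.

    Every word n-gram of up to max_title_len tokens is looked up as an
    article title; each match adds one vote to each of that article's
    categories. Returns (category, votes) pairs, highest votes first.
    """
    tokens = tokenize(text)
    scores = Counter()
    for n in range(1, max_title_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            for category in title_categories.get(phrase, ()):
                scores[category] += 1
    return scores.most_common()


if __name__ == "__main__":
    doc = "My new graphics card needs a device driver update."
    print(rank_categories(doc, TITLE_CATEGORIES))
```

The resulting category scores could be used as document features in place of, or alongside, the usual bag-of-words representation. A practical system would presumably also need to weight titles and categories and to filter overly general categories, which this sketch does not attempt.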