Topic modeling: beyond bag-of-words
ICML '06 Proceedings of the 23rd international conference on Machine learning
Evaluating topic models for digital libraries
Proceedings of the 10th annual joint conference on Digital libraries
A new semantics: merging propositional and distributional information
IWCS '11 Proceedings of the Ninth International Conference on Computational Semantics
Topic models are emerging tools for improving browsing and search within digital libraries. These techniques collapse the words of a document into an unordered "bag of words," discarding word order entirely. In this paper, we present a method that examines syntactic dependency parse trees of Wikipedia article titles to learn the patterns in which words take one another as lexical arguments. Unlike the bag-of-words assumption, this process depends on the word order of each title: by modeling how every word interacts with the words around it, we build an aggregate picture of word interaction across all 3.2 million titles. Using this information, we measure the coherence of a topic by comparing the relative-usage vectors of its top five words. Results suggest that the technique can identify poor topics by how well the relative usages within a topic align with one another, potentially aiding digital library indexing.
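The coherence measure described above can be sketched as a mean pairwise similarity over the top words of a topic. The snippet below is a minimal illustration, not the paper's implementation: the usage vectors, the example words, and the use of cosine similarity as the comparison function are all assumptions made for the sketch. In the paper's setting, each vector would summarize the dependency-relation patterns a word participates in across parsed Wikipedia titles.

```python
import math

# Hypothetical relative-usage vectors for a topic's top words.
# Each dimension would correspond to some dependency-relation pattern
# learned from parsed titles; the numbers here are purely illustrative.
usage_vectors = {
    "library": [4.0, 1.0, 0.0, 2.0],
    "digital": [3.0, 1.0, 0.5, 2.0],
    "archive": [4.0, 0.5, 0.0, 1.5],
    "metadata": [2.0, 1.0, 0.5, 1.0],
    "banana": [0.0, 3.0, 4.0, 0.0],  # an off-topic intruder word
}

def cosine(u, v):
    """Cosine similarity between two non-negative usage vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def topic_coherence(vectors):
    """Mean pairwise similarity over a topic's top words:
    high when the words' usage patterns align, low otherwise."""
    words = list(vectors)
    pairs = [(a, b) for i, a in enumerate(words) for b in words[i + 1:]]
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

full_score = topic_coherence(usage_vectors)
clean_score = topic_coherence(
    {w: v for w, v in usage_vectors.items() if w != "banana"}
)
```

Under this sketch, a topic polluted by an intruder word scores lower than the same topic without it, which is the signal used to flag poor topics.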