Wikify!: linking documents to encyclopedic knowledge
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Introduction to Information Retrieval
Introduction to Information Retrieval
Web-scale named entity recognition
Proceedings of the 17th ACM conference on Information and knowledge management
Learning to link with wikipedia
Proceedings of the 17th ACM conference on Information and knowledge management
Proceedings of the Second ACM International Conference on Web Search and Data Mining
A survey of Web clustering engines
ACM Computing Surveys (CSUR)
Collective annotation of Wikipedia entities in web text
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Wikipedia-based semantic interpretation for natural language processing
Journal of Artificial Intelligence Research
Exploiting internal and external semantics for the clustering of short texts using world knowledge
Proceedings of the 18th ACM conference on Information and knowledge management
TAGME: on-the-fly annotation of short text fragments (by wikipedia entities)
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Improving quality of search results clustering with approximate matrix factorisations
ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
Hi-index | 0.00 |
The typical IR-approach to indexing, clustering, classification and retrieval, just to name a few, is the one based on the bag-of-words paradigm. It eventually transforms a text into an array of terms, possibly weighted (with tf-idf scores or derivatives), and then represents that array via points in highly-dimensional space. It is therefore syntactical and unstructured, in the sense that different terms lead to different dimensions. Co-occurrence detection and other processing steps have been thus proposed (see e.g. LSI, Spectral analysis [7]) to identify the existence of those relations, but yet everyone is aware of the limitations of this approach especially in the expanding context of short (and thus poorly composed) texts, such as the snippets of search-engine results, the tweets of a Twitter channel, the items of a news feed, the posts of a blog, or the advertisement messages, etc..