Annotating Documents by Wikipedia Concepts

Authors:
Peter Schönhofen
Venue:
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Year:
2008

Abstract

We present a technique that reliably labels words or phrases of an arbitrary document with the Wikipedia articles (concepts) that best describe their meaning. First, it scans the document content, and whenever it finds a word sequence matching the title of a Wikipedia article, it attaches the article to the constituent word(s). The collected articles are then scored based on three factors: (1) how many other detected articles they semantically relate to, according to the Wikipedia link structure; (2) how specific the concept they represent is; and (3) how similar the title by which they were detected is to their "official" title. If a text location refers to multiple Wikipedia articles, only the one with the highest score is retained. Experiments on 24,000 randomly selected Wikipedia article bodies showed that 81% of the phrases annotated by article authors were correctly identified. Moreover, of the 5 concepts that our algorithm deemed most important in a final ranking, on average 72% were indeed marked in the original text.
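The pipeline described in the abstract (title matching, three-factor scoring, keeping the best article per location) can be illustrated with a minimal Python sketch. This is not the author's implementation. The abstract does not say how the three factors are measured or combined, so the plain sum, the token-overlap title similarity, the `max_len` window, and the names `Candidate`, `find_candidates`, `score`, `annotate`, `title_index`, `links`, and `specificity` are all assumptions made for illustration. The sketch also treats a "text location" as an exact token span and does not resolve overlapping spans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    start: int    # index of the first token covered
    end: int      # index one past the last token covered
    surface: str  # lowercased word sequence found in the document
    article: str  # "official" title of the matched article


def find_candidates(tokens, title_index, max_len=3):
    """Attach every article whose known title matches a token sequence.

    title_index (hypothetical) maps a lowercased title to the official
    titles of the articles it may refer to.
    """
    found = []
    for start in range(len(tokens)):
        last = min(start + max_len, len(tokens))
        for end in range(start + 1, last + 1):
            surface = " ".join(tokens[start:end]).lower()
            for article in title_index.get(surface, ()):
                found.append(Candidate(start, end, surface, article))
    return found


def score(cand, detected, links, specificity):
    """Combine the three factors; the unweighted sum is an assumption."""
    own_links = links.get(cand.article, set())
    # (1) number of other detected articles linked to or from this one
    related = sum(
        1
        for other in detected
        if other != cand.article
        and (other in own_links or cand.article in links.get(other, set()))
    )
    # (2) specificity of the concept, supplied externally in this sketch
    spec = specificity.get(cand.article, 0.0)
    # (3) similarity of the matched title to the official title,
    #     approximated here by token overlap (Jaccard)
    a = set(cand.surface.split())
    b = set(cand.article.lower().split())
    similarity = len(a & b) / len(a | b)
    return related + spec + similarity


def annotate(tokens, title_index, links, specificity):
    """Keep only the highest-scoring article for each token span."""
    candidates = find_candidates(tokens, title_index)
    detected = {c.article for c in candidates}
    best = {}
    for c in candidates:
        s = score(c, detected, links, specificity)
        key = (c.start, c.end)
        if key not in best or s > best[key][0]:
            best[key] = (s, c.article)
    return {span: article for span, (_, article) in best.items()}
```

Called with a token list, a title index, and a link map, `annotate` returns a dictionary from token spans to the retained article title. In a toy input where "jaguar" may refer to either an animal article or a car-maker article, the link-based factor favors whichever candidate is connected to the other articles detected in the same document.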