The popularity of Wikipedia and other online knowledge bases has recently sparked interest within the machine learning community in the problem of automatic linking. Automatic hyperlinking can be viewed as two subproblems: link detection, which determines the source of a link, and link disambiguation, which determines its destination. Wikipedia is a rich corpus whose hyperlinks are provided by its authors, and this data can be used to train classifiers that mimic those authors to some degree. In this paper, we formulate automatic link detection as a sequence labeling problem. Conditional random fields (CRFs) are a probabilistic framework for labeling sequential data. We show that a CRF trained with different types of features from the Wikipedia dataset can detect links automatically with almost perfect precision and high recall.
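One way to make the sequence-labeling framing concrete is the Python sketch below. It is not the authors' implementation: the BIO label names (`O`, `B-LINK`, `I-LINK`), the simplified `[[Target|anchor]]` regex, the `NO_ORPHAN_I` transition table, and all scores are assumptions chosen for illustration. `wikitext_to_bio` turns the links that Wikipedia authors wrote into per-token training labels, and `viterbi` shows how a linear-chain CRF decodes a whole label sequence jointly instead of classifying each token in isolation. Feature extraction and weight learning are omitted; the emission scores in the usage example are made up.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical BIO label set: a link's first token is B-LINK, later tokens I-LINK.
LABELS = ["O", "B-LINK", "I-LINK"]

# Simplified wikitext link pattern: [[Target]] or [[Target|anchor text]].
# Assumption: templates, files, and nested markup are not handled.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def wikitext_to_bio(text: str) -> List[Tuple[str, str]]:
    """Turn author-provided links into (token, label) training pairs."""
    pairs: List[Tuple[str, str]] = []
    pos = 0
    for m in LINK_RE.finditer(text):
        # Plain text before the link is labeled O (outside any link).
        pairs += [(tok, "O") for tok in text[pos:m.start()].split()]
        # The visible anchor is the piped text if present, else the target title.
        anchor = m.group(2) or m.group(1)
        for i, tok in enumerate(anchor.split()):
            pairs.append((tok, "B-LINK" if i == 0 else "I-LINK"))
        pos = m.end()
    pairs += [(tok, "O") for tok in text[pos:].split()]
    return pairs


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain model.

    emissions[t][y] is the score of label y at token t (in a CRF, a weighted
    sum of that token's features); transitions[(prev, y)] scores adjacent
    label pairs, with "START" as the label before the first token. Missing
    pairs score 0.0.
    """
    if not emissions:
        return []
    # best[y]: score of the best path ending in label y at the current token.
    best = {y: transitions.get(("START", y), 0.0) + emissions[0][y] for y in LABELS}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        ptrs: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[p] + transitions.get((p, y), 0.0))
            new_best[y] = best[prev] + transitions.get((prev, y), 0.0) + scores[y]
            ptrs[y] = prev
        best = new_best
        backpointers.append(ptrs)
    # Follow back-pointers from the best final label.
    label = max(LABELS, key=lambda y: best[y])
    path = [label]
    for ptrs in reversed(backpointers):
        label = ptrs[label]
        path.append(label)
    return path[::-1]


# Hypothetical constraint: I-LINK may not follow START or O.
NO_ORPHAN_I = {("START", "I-LINK"): float("-inf"), ("O", "I-LINK"): float("-inf")}

if __name__ == "__main__":
    print(wikitext_to_bio("The [[Eiffel Tower]] is in [[Paris|the French capital]]."))
    # Made-up emission scores for the tokens "visit", "Eiffel", "Tower".
    toy = [
        {"O": 2.0, "B-LINK": 0.0, "I-LINK": 0.0},
        {"O": 0.0, "B-LINK": 1.0, "I-LINK": 1.5},
        {"O": 0.0, "B-LINK": 0.5, "I-LINK": 2.0},
    ]
    print(viterbi(toy, NO_ORPHAN_I))  # ['O', 'B-LINK', 'I-LINK']
```

The transition scores are what separate this from per-token classification: with the toy scores, picking each token's best label independently yields the invalid sequence `O, I-LINK, I-LINK`, while joint decoding under the hypothetical `NO_ORPHAN_I` constraint returns `O, B-LINK, I-LINK`.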