New Techniques for Disambiguation in Natural Language and Their Application to Biological Text

Authors:
Filip Ginter; Jorma Boberg; Jouni Järvinen; Tapio Salakoski
Venue:
The Journal of Machine Learning Research
Year:
2004

Abstract

We study disambiguation problems in natural language, focusing on gene versus protein name disambiguation in biological text and also considering context-sensitive spelling error correction. We introduce a new family of classifiers based on ordering and weighting the feature vectors obtained from word counts and word co-occurrences in the text, and examine several concrete classifiers from this family. The most accurate predictions are obtained when the weights are determined by the positions of the words in the context. On the gene/protein name disambiguation problem, this classifier outperforms both the Naive Bayes and SNoW baseline classifiers. We also study how smoothing techniques for the Naive Bayes classifier, collocation features, and context length affect classification accuracy, and show that setting the context length correctly is important and that the appropriate length is problem-dependent.
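
To make the idea of weighting context words by position concrete, the following is a minimal Python sketch. It is not the classifier from the paper: the abstract does not specify the weighting scheme, so the function names (`position_weights`, `weighted_context`, `train`, `classify`), the harmonic-style decay, the `window` and `decay` parameters, and the toy training sentences are all assumptions made for illustration only.

```python
from collections import defaultdict


def position_weights(window, decay=0.5):
    """Hypothetical weighting: words closer to the target get larger weights."""
    return [1.0 / (1.0 + decay * (d - 1)) for d in range(1, window + 1)]


def weighted_context(tokens, target_index, window=3, decay=0.5):
    """Build a position-weighted bag of words around the ambiguous token."""
    weights = position_weights(window, decay)
    features = defaultdict(float)
    for d in range(1, window + 1):
        for i in (target_index - d, target_index + d):
            if 0 <= i < len(tokens):
                features[tokens[i].lower()] += weights[d - 1]
    return dict(features)


def train(examples, window=3, decay=0.5):
    """Sum the weighted context features per class label.

    examples: iterable of (tokens, target_index, label) triples.
    """
    model = defaultdict(lambda: defaultdict(float))
    for tokens, idx, label in examples:
        feats = weighted_context(tokens, idx, window, decay)
        for word, weight in feats.items():
            model[label][word] += weight
    return model


def classify(model, tokens, target_index, window=3, decay=0.5):
    """Pick the label whose normalized class profile best matches the context."""
    feats = weighted_context(tokens, target_index, window, decay)
    scores = {}
    for label, profile in model.items():
        total = sum(profile.values()) or 1.0
        scores[label] = sum(
            weight * profile.get(word, 0.0) / total
            for word, weight in feats.items()
        )
    return max(scores, key=scores.get)


# Toy, made-up training data (not from the paper's corpus).
examples = [
    (["the", "ABC1", "gene", "is", "expressed"], 1, "gene"),
    (["ABC1", "protein", "binds", "DNA"], 0, "protein"),
]
model = train(examples)
label_a = classify(model, ["mutations", "in", "the", "ABC1", "gene"], 3)
label_b = classify(model, ["the", "ABC1", "protein", "binds"], 1)
```

In this sketch the `window` parameter plays the role of the context length discussed in the abstract; varying it on held-out data is one simple way to observe that the best setting depends on the problem.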