Semantic annotation of biosystematics literature without training examples

Authors:
Hong Cui; David Boufford; Paul Selden
Affiliations:
School of Information Resources and Library Science, University of Arizona, 1515 E. First Street, Tucson, AZ 85719; Harvard University Herbaria, Harvard University, 22 Divinity Avenue, Cambridge, MA 02138; Paleontological Institute, University of Kansas, 1475 Jayhawk Boulevard, Lawrence, KS 66045
Venue:
Journal of the American Society for Information Science and Technology
Year:
2010

Abstract

This article presents an unsupervised algorithm for semantic annotation of morphological descriptions of whole organisms. The algorithm annotates plain-text descriptions with high accuracy at the clause level by exploiting the corpus itself; it does not need lexicons, syntactic parsers, training examples, or annotation templates. An evaluation on two real-life description collections in botany and paleontology shows that the algorithm has the following desirable features: (a) it reduces or eliminates the manual labor required to compile dictionaries and prepare source documents; (b) it improves annotation coverage, because it annotates what appears in the documents and is not limited by predefined and often incomplete templates; (c) it learns clean and reusable concepts, namely organ names and character states that can be used to construct reusable domain lexicons, as opposed to collection-dependent patterns whose applicability is often limited to a particular collection; (d) it is insensitive to collection size; and (e) it runs in linear time with respect to the number of clauses to be annotated. © 2010 Wiley Periodicals, Inc.