A self-learning universal concept spotter

Authors:
Tomek Strzalkowski; Jin Wang
Affiliations:
GE Corporate Research and Development, Schenectady, NY; GE Corporate Research and Development, Schenectady, NY
Venue:
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Year:
1996

Abstract

We describe the Universal Spotter, a system for identifying in-text references to entities of an arbitrary, user-specified type, such as people, organizations, equipment, products, or materials. Starting with some initial seed examples and a training text corpus, the system generates rules that will find further concepts of the same type. The initial seed information is provided by the user in the form of a typical lexical context in which the entities to be spotted occur, e.g., "the name ends with Co.", or "to the right of produced or made", and so forth, or by simply supplying examples of the concept itself, e.g., Ford Taurus, gas turbine, Big Mac. In addition, negative examples can be supplied, if known. Given a sufficiently large training corpus, an unsupervised learning process is initiated in which the system will: (1) find instances of the sought-after concept using the seed-context information while maximizing recall and precision; (2) find additional contexts in which these entities occur; and (3) expand the initial seed-context with selected new contexts to find even more entities. Preliminary results of creating spotters for organizations and products are discussed.
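
The three-step loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`contexts_of`, `spot`, `bootstrap`), the use of single tokens as candidate entities, the one-word left/right context features, and the `min_support` acceptance threshold are all assumptions made for illustration. The abstract does not say how the system scores or selects new contexts.

```python
def contexts_of(tokens, i):
    """Immediate left/right neighbours of token i, used as context features (an assumption)."""
    feats = set()
    if i > 0:
        feats.add(("left", tokens[i - 1].lower()))
    if i + 1 < len(tokens):
        feats.add(("right", tokens[i + 1].lower()))
    return feats


def spot(corpus, contexts, negatives):
    """Step 1: collect tokens that occur in at least one accepted context."""
    found = set()
    for tokens in corpus:
        for i, tok in enumerate(tokens):
            if tok not in negatives and contexts_of(tokens, i) & contexts:
                found.add(tok)
    return found


def bootstrap(corpus, seed_contexts, seed_entities=(), negatives=(),
              rounds=3, min_support=2):
    """Hypothetical bootstrapping loop: spot entities, harvest contexts, expand."""
    contexts = set(seed_contexts)
    negatives = set(negatives)
    entities = set(seed_entities) - negatives
    for _ in range(rounds):
        entities |= spot(corpus, contexts, negatives)
        # Step 2: record which known entities each context co-occurs with.
        support = {}
        for tokens in corpus:
            for i, tok in enumerate(tokens):
                if tok in entities:
                    for ctx in contexts_of(tokens, i):
                        support.setdefault(ctx, set()).add(tok)
        # Step 3: accept new contexts backed by enough distinct entities.
        new = {ctx for ctx, ents in support.items()
               if ctx not in contexts and len(ents) >= min_support}
        if not new:
            break
        contexts |= new
    # Final pass so contexts accepted in the last round are also applied.
    entities |= spot(corpus, contexts, negatives)
    return entities, contexts


# Toy corpus (invented for this example).
corpus = [
    "Acme produced Widget yesterday".split(),
    "Globex produced Gadget yesterday".split(),
    "they sold Widget cheaply".split(),
    "they sold Gadget cheaply".split(),
    "they sold Gizmo cheaply".split(),
]
entities, contexts = bootstrap(corpus, {("left", "produced")})
print(sorted(entities))
```

On this toy corpus, the seed context "to the right of produced" first yields Widget and Gadget. Their shared contexts, such as "to the right of sold", are then accepted and pick up Gizmo in the next round. Requiring support from several distinct entities is one simple guard against drift; a real system would need a more careful selection criterion to balance recall and precision.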