Concept discovery from text

Authors:
Dekang Lin; Patrick Pantel
Affiliations:
University of Alberta, Edmonton, Alberta, Canada (both authors)
Venue:
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Year:
2002

References:
11
Cited by:
60

Abstract

Broad-coverage lexical resources such as WordNet are extremely useful. However, they often include many rare senses while missing domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers concepts from text. It initially discovers a set of tight clusters called committees that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning elements to their most similar cluster. Evaluating cluster quality has always been a difficult task. We present a new evaluation methodology that is based on the editing distance between output clusters and classes extracted from WordNet (the answer key). Our experiments show that CBC outperforms several well-known clustering algorithms in cluster quality.
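The abstract's second phase works like this: each committee's centroid becomes the cluster's feature vector, and every element is then assigned to its most similar cluster. The sketch below illustrates that phase only. It is not the authors' implementation. It assumes the committees have already been found, and committee discovery itself is not shown. It also assumes that words are represented as sparse feature dictionaries and that similarity is the cosine coefficient. The function names (centroid, cosine, assign_to_committees) and the toy data are hypothetical.

```python
import math
from typing import Dict, List

# Assumption: a word is a sparse feature vector mapping feature -> weight.
Vector = Dict[str, float]


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of sparse vectors (used as a cluster's feature vector)."""
    total: Vector = {}
    for vec in vectors:
        for feat, weight in vec.items():
            total[feat] = total.get(feat, 0.0) + weight
    n = len(vectors)
    return {feat: weight / n for feat, weight in total.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * b.get(feat, 0.0) for feat, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_to_committees(elements: Dict[str, Vector],
                         committees: List[List[str]]) -> Dict[str, int]:
    """Assign each element to the index of its most similar committee centroid.

    Hypothetical helper: committees are given as lists of element names,
    and ties go to the lowest committee index.
    """
    centroids = [centroid([elements[m] for m in members]) for members in committees]
    assignment: Dict[str, int] = {}
    for name, vec in elements.items():
        sims = [cosine(vec, c) for c in centroids]
        assignment[name] = max(range(len(centroids)), key=lambda i: sims[i])
    return assignment


if __name__ == "__main__":
    # Toy data (hypothetical): two committees, fruit-like and vehicle-like.
    words = {
        "apple": {"eat": 1.0, "juice": 1.0},
        "pear": {"eat": 1.0, "juice": 0.5},
        "banana": {"eat": 1.0, "peel": 1.0},
        "car": {"drive": 1.0, "park": 1.0},
        "truck": {"drive": 1.0, "haul": 1.0},
        "bus": {"drive": 1.0, "ride": 1.0},
    }
    print(assign_to_committees(words, [["apple", "pear"], ["car", "truck"]]))
    # → {'apple': 0, 'pear': 0, 'banana': 0, 'car': 1, 'truck': 1, 'bus': 1}
```

In this toy run, "banana" and "bus" belong to no committee. They are still attached to the fruit-like and vehicle-like clusters because they share a feature with a committee centroid. This is the intended role of the well-scattered committees: each one anchors a cluster that non-committee elements can join.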