Discovering word senses from text

Authors:
Patrick Pantel; Dekang Lin
Affiliations:
University of Alberta, Edmonton, Alberta, Canada; University of Alberta, Edmonton, Alberta, Canada
Venue:
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2002

Abstract

Sense inventories from manually compiled dictionaries usually serve as the source of word senses. However, they often include many rare senses while missing corpus- or domain-specific ones. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers word senses from text. It first discovers a set of tight clusters, called committees, that are well scattered in the similarity space. The centroid of a committee's members is used as the feature vector of the cluster. Each word is then assigned to its most similar clusters. After a word is assigned to a cluster, the features it shares with that cluster are removed from the word. This allows CBC to discover the word's less frequent senses and to avoid discovering duplicate senses. Each cluster that a word belongs to represents one of its senses. We also present an evaluation methodology for automatically measuring the precision and recall of the discovered senses.
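
The assignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes sparse feature vectors stored as dictionaries, cosine similarity as the similarity measure, and a hypothetical similarity threshold and sense limit (`threshold`, `max_senses`). Committee discovery is skipped entirely: the committees, cluster names, and feature weights below are toy values invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse feature vectors (dicts)."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Average the members' vectors; used as the cluster's feature vector."""
    total = {}
    for vec in vectors:
        for f, w in vec.items():
            total[f] = total.get(f, 0.0) + w
    return {f: w / len(vectors) for f, w in total.items()}

def assign_senses(word_vec, clusters, threshold=0.3, max_senses=5):
    """Assign a word to its most similar clusters, one at a time.

    After each assignment, features shared with the chosen cluster are
    removed from the word, so later assignments reflect other senses.
    `threshold` and `max_senses` are assumed parameters, not source values.
    """
    remaining = dict(word_vec)
    candidates = dict(clusters)
    senses = []
    while remaining and candidates and len(senses) < max_senses:
        name, sim = max(
            ((n, cosine(remaining, c)) for n, c in candidates.items()),
            key=lambda pair: pair[1],
        )
        if sim < threshold:
            break
        senses.append(name)
        # Remove the features that overlap with the chosen cluster.
        remaining = {f: w for f, w in remaining.items()
                     if f not in candidates[name]}
        del candidates[name]
    return senses

# Toy committees (hypothetical data).
clusters = {
    "FLORA": centroid([
        {"grow": 1, "water": 1, "leaf": 1, "bark": 1},
        {"grow": 1, "water": 1, "leaf": 1, "petal": 1},
    ]),
    "FACTORY": centroid([
        {"build": 1, "worker": 1, "operate": 1, "close": 1},
        {"build": 1, "worker": 1, "operate": 1, "oil": 1},
    ]),
    "ANIMAL": centroid([{"eat": 1, "run": 1, "fur": 1}]),
}

plant = {"grow": 2, "water": 1, "leaf": 1,
         "build": 1, "worker": 1, "operate": 1}
senses = assign_senses(plant, clusters)
print(senses)  # ['FLORA', 'FACTORY']
```

In this toy run, the botanical features dominate the vector for "plant", so the word is first assigned to FLORA. Once the shared features are removed, only the industrial features remain, and the second assignment picks up the FACTORY sense rather than a near-duplicate of the first.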