Graph-based word clustering using a web search engine

Authors:
Yutaka Matsuo;Takeshi Sakaki;Kôki Uchiyama;Mitsuru Ishizuka
Affiliations:
National Institute of Advanced Industrial Science and Technology, Sotokanda, Tokyo;University of Tokyo, Hongo, Tokyo;Hottolink Inc., Nishi-gotanda, Tokyo;University of Tokyo, Hongo, Tokyo
Venue:
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Year:
2006

Abstract

Word clustering is important for automatic thesaurus construction, text classification, and word sense disambiguation. Several recent studies have used the web as a corpus. This paper proposes an unsupervised word clustering algorithm based on a word similarity measure computed from web counts. Each pair of words is submitted as a query to a search engine, and the resulting hit counts form a co-occurrence matrix. Calculating the similarity between words from this matrix yields a word co-occurrence graph. A recent graph clustering algorithm, Newman clustering, is then applied to identify word clusters efficiently. The method is evaluated on two sets of word groups, one derived from a web directory and one from WordNet.
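The pipeline in the abstract (pairwise web counts, a similarity score, a word graph, Newman clustering) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the hit counts are invented toy numbers standing in for search engine results, the Jaccard coefficient and the 0.1 edge threshold are assumptions (the abstract does not name the similarity measure), and the function names are hypothetical. The clustering step is a plain, unoptimized version of Newman's greedy modularity merging.

```python
from itertools import combinations


def jaccard_similarity(hits_a, hits_b, hits_ab):
    """Jaccard coefficient from hit counts (assumed measure, not from the paper)."""
    union = hits_a + hits_b - hits_ab
    return hits_ab / union if union > 0 else 0.0


def build_graph(words, single_hits, pair_hits, threshold):
    """Link two words when their similarity reaches the (assumed) threshold."""
    edges = set()
    for a, b in combinations(words, 2):
        co = pair_hits.get(frozenset((a, b)), 0)
        if jaccard_similarity(single_hits[a], single_hits[b], co) >= threshold:
            edges.add(frozenset((a, b)))
    return edges


def newman_clusters(words, edges):
    """Greedy modularity merging: repeatedly join the two communities whose
    merge raises modularity Q the most, and stop when no merge raises Q."""
    m = len(edges)
    if m == 0:
        return [{w} for w in words]
    degree = {w: 0 for w in words}
    for edge in edges:
        for w in edge:
            degree[w] += 1
    communities = {i: {w} for i, w in enumerate(words)}
    while True:
        best_gain, best_pair = 0.0, None
        for i, j in combinations(communities, 2):
            between = sum(
                1
                for a in communities[i]
                for b in communities[j]
                if frozenset((a, b)) in edges
            )
            if between == 0:
                continue  # merging unconnected communities never raises Q
            a_i = sum(degree[w] for w in communities[i]) / (2 * m)
            a_j = sum(degree[w] for w in communities[j]) / (2 * m)
            gain = 2 * (between / (2 * m) - a_i * a_j)  # change in Q
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            break
        i, j = best_pair
        communities[i] |= communities.pop(j)
    return list(communities.values())


# Toy data: invented hit counts standing in for search engine results.
words = ["cat", "dog", "horse", "car", "bus", "train"]
single_hits = {w: 1000 for w in words}
animals, vehicles = {"cat", "dog", "horse"}, {"car", "bus", "train"}
pair_hits = {}
for a, b in combinations(words, 2):
    same_group = {a, b} <= animals or {a, b} <= vehicles
    pair_hits[frozenset((a, b))] = 400 if same_group else 20
pair_hits[frozenset(("horse", "car"))] = 250  # one cross-group link

edges = build_graph(words, single_hits, pair_hits, threshold=0.1)
clusters = newman_clusters(words, edges)
for cluster in clusters:
    print(sorted(cluster))
```

The pairwise scan over communities keeps the sketch short but recomputes every candidate merge on each pass; practical implementations of Newman's method maintain the merge gains incrementally, which is where its efficiency comes from.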