Exploiting noun phrases and semantic relationships for text document clustering

Authors:
Hai-Tao Zheng;Bo-Yeong Kang;Hong-Gee Kim
Affiliations:
Biomedical Knowledge Engineering Laboratory, BK21 College of Dentistry, Seoul National University, 28 Yeongeon-dong, Jongro-gu, Seoul 110-810, Republic of Korea;Biomedical Knowledge Engineering Laboratory, BK21 College of Dentistry, Seoul National University, 28 Yeongeon-dong, Jongro-gu, Seoul 110-810, Republic of Korea;Biomedical Knowledge Engineering Laboratory, BK21 College of Dentistry, Seoul National University, 28 Yeongeon-dong, Jongro-gu, Seoul 110-810, Republic of Korea
Venue:
Information Sciences: an International Journal
Year:
2009

Abstract

Text document clustering plays an important role in document retrieval, document browsing, and text mining. Traditional clustering techniques do not consider the semantic relationships between words, such as synonymy and hypernymy. To exploit semantic relationships, ontologies such as WordNet have been used to improve clustering results. However, WordNet-based clustering methods mostly rely on single-term analysis of text; they do not perform any phrase-based analysis. In addition, these methods use synonymy to identify concepts and explore only hypernymy to calculate concept frequencies, without considering other semantic relationships such as hyponymy. To address these issues, we combine noun phrase detection with the use of WordNet as background knowledge to explore better ways of representing documents semantically for clustering. First, using noun phrases as well as single-term analysis, we apply different document representation methods to analyze the effectiveness of hypernymy, hyponymy, holonymy, and meronymy. Second, we choose the most effective method and compare it with the WordNet-based clustering method proposed by others. The experimental results rank the semantic relationships by their effectiveness for clustering, from highest to lowest: hypernymy, hyponymy, meronymy, and holonymy. Moreover, we found that noun phrase analysis improves the WordNet-based clustering method.
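To make the idea of calculating concept frequencies through hypernymy concrete, the following is a minimal sketch and not the authors' implementation. It assumes that noun phrases and single terms have already been extracted from a document, and it uses a small hand-written taxonomy (`TOY_HYPERNYMS`, hypothetical) in place of WordNet. The function names and the `max_depth` parameter are likewise illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy taxonomy standing in for WordNet hypernym links
# (child concept -> parent concept). Not taken from the paper.
TOY_HYPERNYMS = {
    "beef": "meat",
    "pork": "meat",
    "meat": "food",
    "hot dog": "sandwich",   # a noun phrase treated as one concept
    "sandwich": "food",
}


def hypernym_chain(concept, max_depth):
    """Return up to max_depth ancestors of concept, nearest first."""
    chain = []
    current = concept
    for _ in range(max_depth):
        parent = TOY_HYPERNYMS.get(current)
        if parent is None:
            break
        chain.append(parent)
        current = parent
    return chain


def concept_frequencies(terms, max_depth=2):
    """Count each term, then add its count to each of its hypernyms."""
    counts = Counter(terms)
    expanded = Counter(counts)
    for term, freq in counts.items():
        for ancestor in hypernym_chain(term, max_depth):
            expanded[ancestor] += freq
    return expanded


# Pre-extracted terms and noun phrases of one hypothetical document.
doc = ["beef", "pork", "hot dog", "beef"]
freqs = concept_frequencies(doc)
for concept, freq in sorted(freqs.items()):
    print(concept, freq)
```

In this sketch, documents that mention "beef" and documents that mention "pork" both gain weight on the shared concept "meat", which is the kind of overlap a purely term-based representation would miss. Other relationships named in the abstract (hyponymy, holonymy, meronymy) could be explored by swapping in a different relation map.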