Taxonomy generation for text segments: A practical web-based approach

Authors:
Shui-Lung Chuang; Lee-Feng Chien
Affiliations:
Institute of Information Science, Academia Sinica, Taipei, Taiwan; Institute of Information Science, Academia Sinica and Department of Information Management, National Taiwan University
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2005

Abstract

In many information systems, it is crucial to organize short text segments, such as keywords in documents and queries from users, into a well-formed taxonomy. In this article, we address the problem of taxonomy generation for diverse text segments with a general and practical approach that uses the Web as an additional knowledge source. Unlike long documents, short text segments typically do not contain enough information to extract reliable features. This work investigates the use of highly ranked search-result snippets to enrich the representation of text segments. A hierarchical clustering algorithm is then designed to create the hierarchical topic structure of the text segments. Text segments with similar concepts are grouped together in a cluster, and related clusters are linked at the same or nearby levels. Unlike traditional clustering algorithms, which tend to produce cluster hierarchies with a very unnatural shape, the algorithm tries to produce a more natural and comprehensive tree hierarchy. Extensive experiments were conducted on text segments from different domains, including subject terms, people's names, paper titles, and natural language questions. The experimental results show the potential of the proposed approach, which provides a basis for the in-depth analysis of text segments on a larger scale and is expected to benefit many information systems.
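
The two steps described in the abstract (enriching each text segment with search-result snippets, then clustering the segments hierarchically) can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives no formulas, so the sketch assumes TF-IDF weighting, cosine similarity, and plain average-link agglomerative clustering, which yields a binary tree rather than the more natural hierarchy the article aims for. The `SNIPPETS` data and the function names `tfidf_vectors`, `cosine`, `leaves`, and `cluster` are hypothetical, and the snippets are invented stand-ins for real search results.

```python
import math
from collections import Counter

# Hypothetical stand-ins for highly ranked search-result snippets.
# A real system would retrieve these from a search engine per segment.
SNIPPETS = {
    "neural network": ["neural network learning model training",
                       "deep learning neural model"],
    "decision tree": ["decision tree learning model classification",
                      "tree classification training model"],
    "jazz": ["jazz music band saxophone", "music jazz concert"],
    "blues": ["blues music guitar band", "music blues concert guitar"],
}


def tfidf_vectors(snippets):
    """Represent each segment by a TF-IDF vector over its pooled snippets."""
    counts = {seg: Counter(" ".join(texts).lower().split())
              for seg, texts in snippets.items()}
    n = len(counts)
    # Document frequency: in how many segments' snippets a term occurs.
    df = Counter(term for c in counts.values() for term in c)
    return {seg: {t: tf * math.log(n / df[t]) for t, tf in c.items()}
            for seg, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def leaves(tree):
    """Return the segments under a tree node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return [tree]
    return [seg for child in tree for seg in leaves(child)]


def cluster(vectors):
    """Average-link agglomerative clustering; returns a nested-tuple tree."""
    trees = list(vectors)
    while len(trees) > 1:
        best = None
        for i in range(len(trees)):
            for j in range(i + 1, len(trees)):
                a, b = leaves(trees[i]), leaves(trees[j])
                # Average pairwise similarity between the two clusters.
                score = sum(cosine(vectors[x], vectors[y])
                            for x in a for y in b) / (len(a) * len(b))
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        merged = (trees[i], trees[j])
        trees = [t for k, t in enumerate(trees) if k not in (i, j)] + [merged]
    return trees[0]


tree = cluster(tfidf_vectors(SNIPPETS))
print(tree)
```

On this toy data the two machine-learning segments share snippet vocabulary with each other but not with the two music segments, so they end up in separate top-level branches. Without the snippets, the bare segments share no words at all and could not be compared this way.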