Inferring hierarchical descriptions

Authors:
Eric Glover;David M. Pennock;Steve Lawrence;Robert Krovetz
Affiliations:
NEC Research Institute, Princeton, NJ;NEC Research Institute, Princeton, NJ;NEC Research Institute, Princeton, NJ;NEC Research Institute, Princeton, NJ
Venue:
Proceedings of the eleventh international conference on Information and knowledge management
Year:
2002

Abstract

We create a statistical model for inferring hierarchical term relationships about a topic, given only a small set of example web pages on the topic, without prior knowledge of any hierarchical information. The model can utilize either the full text of the pages in the cluster or the context of links to the pages. To support the model, we use "ground truth" data taken from the category labels in the Open Directory. We show that the model accurately separates terms into the following classes: self terms describing the cluster, parent terms describing more general concepts, and child terms describing specializations of the cluster. For example, for a set of biology pages, sample parent, self, and child terms are science, biology, and genetics, respectively. We create an algorithm to predict parent, self, and child terms using the new model, and compare the predictions to the ground truth data. The algorithm accurately ranks a majority of the ground truth terms highly, and identifies additional complementary terms missing in the Open Directory.
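
The abstract does not say which features or thresholds the model uses, so the following is only a rough illustration of the parent/self/child distinction, not the authors' model. It assumes that a term can be characterized by two document frequencies: the fraction of cluster pages containing it and the fraction of pages in a general background collection containing it. The function names, the threshold values (`common`, `self_min`, `child_min`), and the toy pages are all hypothetical.

```python
from collections import Counter


def doc_fraction(docs):
    """Return, for each term, the fraction of documents that contain it."""
    counts = Counter()
    for doc in docs:
        # Count each term at most once per document.
        counts.update(set(doc.lower().split()))
    total = len(docs)
    return {term: count / total for term, count in counts.items()}


def classify_terms(cluster_docs, background_docs,
                   common=0.3, self_min=0.6, child_min=0.2):
    """Label cluster terms as 'parent', 'self', or 'child'.

    Assumed heuristic (thresholds are illustrative, not from the paper):
      parent: common in the background collection and present in the cluster
      self:   rare in the background, very frequent in the cluster
      child:  rare in the background, moderately frequent in the cluster
    """
    in_cluster = doc_fraction(cluster_docs)
    in_background = doc_fraction(background_docs)
    labels = {}
    for term, f_cluster in in_cluster.items():
        f_background = in_background.get(term, 0.0)
        if f_background >= common and f_cluster >= child_min:
            labels[term] = "parent"
        elif f_background < common and f_cluster >= self_min:
            labels[term] = "self"
        elif f_background < common and f_cluster >= child_min:
            labels[term] = "child"
    return labels


# Toy "biology" cluster and a toy background collection (hypothetical data).
cluster_pages = [
    "science biology genetics",
    "science biology genetics",
    "biology science",
    "biology evolution",
]
background_pages = [
    "science physics",
    "science news",
    "sports news",
    "science biology",
    "music",
]

labels = classify_terms(cluster_pages, background_pages)
for term, label in sorted(labels.items()):
    print(term, label)
```

On this toy data the heuristic labels science as a parent term, biology as a self term, and genetics as a child term, mirroring the example in the abstract. Real pages would need tokenization, stop-word handling, and thresholds tuned against labeled data such as the Open Directory categories.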