Although considerable research has been conducted on hierarchical text categorization, little has been done on automatically collecting labeled corpora for building hierarchical taxonomies. In this paper, we propose an automatic method for collecting training samples to build hierarchical taxonomies. In our method, each category node is initially defined by a few keywords; a web search engine is then used to construct a small set of labeled documents, and a topic tracking algorithm with keyword-based content normalization is applied to enlarge the training corpus on the basis of these seed documents. We also design a method to check the consistency of the collected corpus. These steps produce a flat category structure that contains all the categories needed to build the hierarchical taxonomy. Next, a linear discriminant projection approach is used to construct more meaningful intermediate levels of the hierarchy from the generated flat set of categories. Experimental results show that the collected training corpus is good enough for statistical classification methods.
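The corpus-enlargement step can be illustrated with a minimal sketch. It is not the paper's topic tracking algorithm, whose details (including the keyword-based content normalization) are not given here. Instead it assumes a simple centroid-based stand-in: documents are turned into term-frequency vectors, the seed documents are averaged into a category profile, and candidate documents whose cosine similarity to the profile reaches a threshold are added to the corpus. The function names, the whitespace tokenizer, the threshold of 0.3, and the round limit are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the source names none)."""
    return text.lower().split()


def tf_vector(text):
    """Term-frequency vector scaled to unit length, stored as a dict."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    n = len(vectors)
    return {t: w / n for t, w in total.items()}


def expand_corpus(seed_docs, candidate_docs, threshold=0.3, max_rounds=3):
    """Grow a category corpus from seed documents.

    Simplified stand-in for topic tracking: in each round, rebuild the
    category profile from all accepted documents and accept every remaining
    candidate whose similarity to the profile is at least `threshold`.
    """
    accepted = list(seed_docs)
    remaining = list(candidate_docs)
    for _ in range(max_rounds):
        profile = centroid([tf_vector(d) for d in accepted])
        newly = [d for d in remaining
                 if cosine(tf_vector(d), profile) >= threshold]
        if not newly:
            break  # no candidate is close enough; stop early
        accepted.extend(newly)
        remaining = [d for d in remaining if d not in newly]
    return accepted
```

In this sketch the seed documents would come from the search-engine step and the candidates from a larger unlabeled pool. Rebuilding the profile each round lets newly accepted documents pull in related ones, but it also lets the profile drift away from the original topic, which is one reason a separate consistency check on the collected corpus is useful.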