Traditionally, text classifiers are built from labeled training examples. Labeling is usually done manually by human experts (or the users), which is a labor-intensive and time-consuming process. In the past few years, researchers have investigated various forms of semi-supervised learning to reduce the burden of manual labeling. In this paper, we propose a different approach. Instead of labeling a set of documents, the proposed method labels a set of representative words for each class. It then uses these words to extract a set of documents for each class from a set of unlabeled documents to form the initial training set. The EM algorithm is then applied to build the classifier. The key issue of the approach is how to obtain a set of representative words for each class. One way is to ask the user to provide them, which is difficult because the user can usually give only a few words (which are insufficient for accurate learning). We propose a method to solve this problem that combines clustering and feature selection. The technique can effectively rank the words in the unlabeled set according to their importance. The user then selects/labels some words from the ranked list for each class. This process requires less effort than providing words with no help or manually labeling documents. Our results show that the new method is highly effective and promising.
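The pipeline described above (labeled words, then extracted documents, then EM) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a multinomial naive Bayes base classifier with Laplace smoothing, assumes a document is extracted for a class when that class alone has the most representative-word hits, and assumes the extracted documents keep their labels during EM. The function names (`seed_label`, `train_nb`, `posterior`, `fit_em`, `classify`) and the toy documents are hypothetical, and the clustering and feature-selection step that ranks candidate words is not shown.

```python
import math
from collections import Counter


def seed_label(docs, seed_words):
    """Label a document with a class only if that class alone has the most seed-word hits."""
    labels = []
    for doc in docs:
        hits = {c: sum(tok in words for tok in doc) for c, words in seed_words.items()}
        best = max(hits.values())
        winners = [c for c, h in hits.items() if h == best]
        labels.append(winners[0] if best > 0 and len(winners) == 1 else None)
    return labels


def train_nb(docs, weights, classes, vocab):
    """M-step: fit multinomial naive Bayes from fractional class memberships."""
    prior = {c: 1.0 for c in classes}  # Laplace smoothing (assumption)
    counts = {c: Counter() for c in classes}
    for doc, w in zip(docs, weights):
        for c in classes:
            p = w.get(c, 0.0)
            if p > 0.0:
                prior[c] += p
                for tok in doc:
                    counts[c][tok] += p
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_like = {}
    for c in classes:
        denom = sum(counts[c].values()) + len(vocab)
        log_like[c] = {t: math.log((counts[c][t] + 1.0) / denom) for t in vocab}
    return classes, log_prior, log_like


def posterior(doc, model):
    """E-step for one document: P(class | doc); tokens outside the vocabulary are skipped."""
    classes, log_prior, log_like = model
    scores = {c: log_prior[c] + sum(log_like[c][t] for t in doc if t in log_like[c])
              for c in classes}
    top = max(scores.values())
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def fit_em(docs, seed_words, iterations=10):
    """Build the initial training set from seed words, then refine the classifier with EM."""
    classes = sorted(seed_words)
    vocab = sorted({t for d in docs for t in d})
    seeds = seed_label(docs, seed_words)
    # Initial training set: only the documents extracted by the seed words.
    weights = [{s: 1.0} if s is not None else {} for s in seeds]
    model = train_nb(docs, weights, classes, vocab)
    for _ in range(iterations):
        # Seeded documents keep their labels (assumption); the rest get soft labels.
        weights = [{s: 1.0} if s is not None else posterior(d, model)
                   for s, d in zip(seeds, docs)]
        model = train_nb(docs, weights, classes, vocab)
    return model


def classify(doc, model):
    """Return the most probable class for a tokenized document."""
    probs = posterior(doc, model)
    return max(probs, key=probs.get)


# Toy unlabeled collection and user-labeled words (both invented for illustration).
docs = [s.split() for s in [
    "the team won the football match",
    "the striker scored a goal in the match",
    "the goal keeper saved the team",
    "the new processor speeds up the laptop",
    "software update for the laptop keyboard",
    "the processor runs the software faster",
]]
seed_words = {"sports": {"football", "goal"}, "tech": {"laptop"}}
model = fit_em(docs, seed_words)
```

In this toy run the last document contains no labeled word, so it enters training only through the soft labels that EM assigns. Calling `classify("laptop software".split(), model)` then returns the predicted class for a new tokenized document.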