Athena: Mining-Based Interactive Management of Text Database
EDBT '00 Proceedings of the 7th International Conference on Extending Database Technology: Advances in Database Technology
We describe Athena, a system for creating, exploiting, and maintaining a hierarchy of textual documents through interactive mining-based operations. Any such system must be fast and must demand minimal effort from the end user. Athena satisfies these requirements through linear-time classification and clustering engines that are applied interactively to speed the development of accurate models. Naive Bayes classifiers are recognized to be among the best for classifying text. We show that our specialization of the Naive Bayes classifier is considerably more accurate (7 to 29% absolute increase in accuracy) than a standard implementation. Our enhancements include using Lidstone's law of succession instead of Laplace's law, under-weighting long documents, and over-weighting author and subject. We also present a new interactive clustering algorithm, C-Evolve, for topic discovery. C-Evolve first finds highly accurate cluster digests (partial clusters), gets user feedback to merge and correct these digests, and then uses the classification algorithm to complete the partitioning of the data. By allowing this interactivity in the clustering process, C-Evolve achieves considerably higher clustering accuracy (10 to 20% absolute increase in our experiments) than the popular K-Means and agglomerative clustering methods.
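To illustrate the smoothing choice mentioned above, here is a minimal sketch of a multinomial Naive Bayes classifier with Lidstone smoothing. It is not Athena's implementation: the function names `train_nb` and `classify`, the toy data, and the default `lam=0.1` are assumptions made for illustration, and the abstract does not state the smoothing value the authors used. Setting `lam=1.0` recovers Laplace's law. The document-length and author/subject weighting enhancements are not modeled here.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, lam=0.1):
    """Train a multinomial Naive Bayes model with Lidstone smoothing.

    docs: list of (tokens, label) pairs.
    lam:  Lidstone parameter (hypothetical default); lam=1.0 is Laplace's law.
    Returns (log_prior, log_like), both keyed by class label.
    """
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    vocab_size = len(vocab)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}

    log_like = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # Lidstone estimate: (count + lam) / (total + lam * |V|)
        denom = total + lam * vocab_size
        log_like[c] = {
            w: math.log((word_counts[c][w] + lam) / denom) for w in vocab
        }
    return log_prior, log_like


def classify(tokens, log_prior, log_like):
    """Return the class with the highest posterior log score.

    Tokens never seen in training are ignored.
    """
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_like[c][w] for w in tokens if w in log_like[c]
        )
    return max(scores, key=scores.get)


# Toy usage with made-up documents.
docs = [
    (["ball", "goal", "team"], "sports"),
    (["vote", "election", "party"], "politics"),
    (["team", "win", "goal"], "sports"),
    (["party", "vote", "law"], "politics"),
]
log_prior, log_like = train_nb(docs, lam=0.1)
predicted = classify(["goal", "team"], log_prior, log_like)
```

A smaller `lam` reserves less probability mass for words unseen in a class than Laplace's add-one rule does, which matters when the vocabulary is large relative to the training text per class.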