We present a multifaceted system for exploratory analysis of highly heterogeneous document collections. The system tags individual documents fully automatically and exposes these tags through a faceted browsing framework. The tagging strategies include both unsupervised and supervised approaches based on machine learning and natural language processing. As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). KERA extracts topic-representative terms from individual documents in a purely unsupervised fashion and is shown to be significantly more effective than state-of-the-art methods. Finally, we evaluate the system on its ability to help users locate documents pertaining to military critical technologies buried deep in a large, heterogeneous sea of information.
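To make the idea of faceted browsing over automatically assigned tags concrete, here is a minimal sketch in Python. It is an illustration only, not the system described in the abstract: the document records, the facet names (`topic`, `type`), and the helpers `filter_docs` and `facet_counts` are all hypothetical, and the tags are written by hand rather than produced by KERA or any other tagger.

```python
from collections import Counter

# Hypothetical documents whose tags are grouped by facet.
# In a real system these tags would come from automatic taggers.
DOCS = [
    {"id": "d1", "tags": {"topic": {"radar", "antenna"}, "type": {"report"}}},
    {"id": "d2", "tags": {"topic": {"radar"}, "type": {"article"}}},
    {"id": "d3", "tags": {"topic": {"optics"}, "type": {"report"}}},
]


def filter_docs(docs, selections):
    """Keep documents that carry every selected value in every selected facet."""
    return [
        d for d in docs
        if all(
            values <= d["tags"].get(facet, set())
            for facet, values in selections.items()
        )
    ]


def facet_counts(docs, facet):
    """Count how many documents carry each value of one facet."""
    return Counter(v for d in docs for v in d["tags"].get(facet, set()))


# Narrow the collection to documents tagged with the topic "radar",
# then summarize the remaining documents by their "type" facet.
hits = filter_docs(DOCS, {"topic": {"radar"}})
type_summary = facet_counts(hits, "type")
```

Each selection narrows the result set, and the per-facet counts over the remaining documents tell the user which further refinements are available, which is the usual interaction loop in a faceted interface.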