TopCat (Topic Categories) is a technique for identifying topics that recur across the articles of a text corpus. Natural language processing techniques identify the key entities in each article, so that each article can be represented as a set of items. This lets us view the problem in a database/data mining context: identifying related groups of items. This paper presents a novel method, based on traditional data mining techniques, for identifying such related items. Frequent itemsets are first generated from the articles' item sets and are then grouped into clusters using a hypergraph partitioning scheme. We present an evaluation against a manually categorized ground-truth news corpus, which shows that this technique is effective in identifying topics in collections of news articles.
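As a rough illustration of the pipeline described above, the following Python sketch treats each article as a set of extracted entities. The entity extraction step is not shown, and the toy article sets are hypothetical. The sketch mines frequent itemsets with an Apriori-style level-wise search. It then builds a hypergraph whose vertices are items and whose hyperedges are the frequent itemsets of size two or more, weighted by support; this hyperedge representation is an assumption. The final step is a deliberate simplification: connected components stand in for real hypergraph partitioning, which would instead cut weakly supported hyperedges (for example, with a multilevel partitioner). The function names `frequent_itemsets`, `build_hypergraph`, and `cluster_components`, and the parameters `min_support` and `max_size`, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Apriori-style level-wise search; returns {frozenset: support_count}."""
    transactions = [frozenset(t) for t in transactions]
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 2
    while current and k <= max_size:
        # Join frequent (k-1)-itemsets into k-item candidates, keeping only
        # candidates whose (k-1)-subsets are all frequent (Apriori property).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in current for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count support with a full scan over the transactions.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


def build_hypergraph(itemsets):
    """Hypothetical representation: hyperedges are frequent itemsets of
    size >= 2, each weighted by its support count."""
    return {s: w for s, w in itemsets.items() if len(s) >= 2}


def cluster_components(hyperedges):
    """Crude stand-in for hypergraph partitioning: merge every item that
    shares a hyperedge (union-find), ignoring hyperedge weights."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in hyperedges:
        items = list(edge)
        for other in items[1:]:
            parent[find(other)] = find(items[0])
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


# Hypothetical toy input: each article reduced to its extracted entities.
articles = [
    {"hurricane", "Florida", "evacuation"},
    {"hurricane", "Florida", "evacuation", "FEMA"},
    {"hurricane", "Florida"},
    {"election", "Senate", "ballot"},
    {"election", "Senate", "ballot"},
    {"election", "Florida"},
]

itemsets = frequent_itemsets(articles, min_support=2)
clusters = cluster_components(build_hypergraph(itemsets))
for cluster in clusters:
    print(sorted(cluster))
# prints:
# ['Florida', 'evacuation', 'hurricane']
# ['Senate', 'ballot', 'election']
```

The `election`/`Florida` pair appears in only one article, so it falls below `min_support=2` and the two groups stay separate. Connected components would merge them as soon as any bridging itemset became frequent. A weighted partitioning step is meant to avoid exactly this failure mode.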