In this paper, we consider long documents and aim to find differences between document collections. In collections such as project status reports or annual reports, both the documents and their individual sentences tend to be relatively long. As a result, it can be difficult to derive insights by looking only for concepts that a divergence metric marks as representative of the selected document collection. We therefore propose an analysis approach based on contextual information: we extract pairs consisting of a topic word and a keyword and assess the representativeness of each pair in the selected document collection. Applying the proposed method to a comparison of the annual reports of bankrupt companies with those of sound companies, we were able to derive insights that could not be extracted with conventional methods.
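The pair-based idea can be illustrated with a minimal sketch. The abstract does not specify how pairs are extracted or which divergence metric is used, so everything below is an assumption made for illustration: pairs are formed from sentence-level co-occurrence, the topic-word and keyword vocabularies are supplied by hand, and representativeness is a smoothed log-ratio of document frequencies between the selected and the reference collection. All function names and the toy data are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import product


def sentence_pairs(document, topic_words, keywords):
    """Yield (topic word, keyword) pairs that co-occur within one sentence.

    Assumption: sentence-level co-occurrence stands in for the paper's
    contextual information, which the abstract does not define.
    """
    for sentence in re.split(r"[.!?]+", document.lower()):
        tokens = set(re.findall(r"[a-z]+", sentence))
        for topic, keyword in product(sorted(tokens & topic_words),
                                      sorted(tokens & keywords)):
            if topic != keyword:
                yield (topic, keyword)


def pair_document_frequency(collection, topic_words, keywords):
    """Count, for each pair, the number of documents that contain it."""
    df = Counter()
    for document in collection:
        df.update(set(sentence_pairs(document, topic_words, keywords)))
    return df


def representativeness(selected, reference, topic_words, keywords, alpha=1.0):
    """Rank pairs by a smoothed log-ratio of document frequencies.

    Assumption: this log-ratio is a stand-in divergence metric, not the
    measure used in the paper.
    """
    df_sel = pair_document_frequency(selected, topic_words, keywords)
    df_ref = pair_document_frequency(reference, topic_words, keywords)
    scores = {}
    for pair in df_sel:
        p_sel = (df_sel[pair] + alpha) / (len(selected) + 2 * alpha)
        p_ref = (df_ref[pair] + alpha) / (len(reference) + 2 * alpha)
        scores[pair] = math.log(p_sel / p_ref)
    return sorted(scores.items(), key=lambda item: -item[1])


# Toy data (invented for illustration only).
selected = [
    "Sales declined because demand fell. Debt increased sharply.",
    "Debt increased again. Sales were stable.",
]
reference = [
    "Sales increased steadily. Debt was stable.",
    "Sales increased and demand grew.",
]
topic_words = {"sales", "debt"}
keywords = {"declined", "increased", "stable"}

ranked = representativeness(selected, reference, topic_words, keywords)
for pair, score in ranked:
    print(pair, round(score, 3))
```

In this toy setting the keyword "increased" appears in both collections, so it would not stand out on its own; only in combination with a topic word does a difference between the collections become visible, which is the kind of effect that motivates scoring pairs rather than single concepts.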