This paper proposes a novel text mining method for document sets based on graph-based analysis. The analysis first identifies similarity links among the documents in the set and then determines the core documents, those with the highest centrality. Each core document represents a different topic. Next, the centrality scores are used together with the graph structure to identify the documents associated with each core document. This process yields a predetermined number of topics. For each topic, the user is presented with a set of documents in a three-layer structure: the core document, supplemental documents (those strongly associated with the core document), and subtopic documents (those only slightly associated with the core document and the supplemental documents). The user can select any of the topics and browse the documents related to it. The user can also select documents by layer; for example, subtopic documents are assumed to contain information that differs from the indicated topic and so may be of particular interest. In analyses of a set of newspaper articles, we evaluate the “accuracy of topic identification” and the “accuracy of document collecting related to the topics”. Furthermore, we show an example of document set visualization based on graph structure and centrality scores; the results indicate the method's usefulness for browsing and analyzing document sets.
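The pipeline outlined above (similarity links, centrality, core documents, three layers) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the centrality measure, or any thresholds, so everything concrete here is an assumption made for illustration: bag-of-words cosine similarity, weighted degree centrality, a greedy rule that picks cores that are not linked to one another, and the hypothetical names `find_topics`, `link_threshold`, and `strong`. It is not the authors' algorithm.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_graph(docs, link_threshold):
    """Similarity matrix; pairs below link_threshold are treated as unlinked."""
    vecs = [Counter(d.lower().split()) for d in docs]
    n = len(docs)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(vecs[i], vecs[j])
            if s >= link_threshold:
                sim[i][j] = sim[j][i] = s
    return sim


def pick_cores(sim, k):
    """Greedy choice of up to k high-centrality documents with no links
    between them (weighted degree centrality is an assumption)."""
    scores = [sum(row) for row in sim]
    cores = []
    for i in sorted(range(len(sim)), key=lambda i: -scores[i]):
        if all(sim[i][c] == 0.0 for c in cores):
            cores.append(i)
        if len(cores) == k:
            break
    return cores


def find_topics(docs, k, link_threshold=0.2, strong=0.6):
    """Return {core_index: {"supplemental": [...], "subtopic": [...]}}."""
    sim = build_graph(docs, link_threshold)
    cores = pick_cores(sim, k)
    topics = {c: {"supplemental": [], "subtopic": []} for c in cores}
    rest = []
    # Layer 2: documents strongly linked to a core document.
    for i in range(len(docs)):
        if i in topics:
            continue
        best = max(cores, key=lambda c: sim[i][c])
        if sim[i][best] >= strong:
            topics[best]["supplemental"].append(i)
        else:
            rest.append(i)
    # Layer 3: documents weakly linked to a core or its supplemental documents.
    for i in rest:
        def link(c):
            return max(sim[i][m] for m in [c] + topics[c]["supplemental"])
        best = max(cores, key=link)
        if link(best) > 0.0:
            topics[best]["subtopic"].append(i)
    return topics


# Toy documents, invented for this sketch.
DOCS = [
    "election vote president campaign",
    "election vote president debate",
    "election vote senate poll",
    "soccer goal match league",
    "soccer goal match cup",
    "soccer transfer fee club",
    "weather rain",
]
TOPICS = find_topics(DOCS, k=2)
```

In this toy run, `TOPICS` maps each core document index to its supplemental and subtopic indices; a document with no link to any topic (the weather item) is left unassigned. The two thresholds control how wide each layer is and would need tuning on real data.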