Topic structure mining using PageRank without hyperlinks

Authors:
Hiroyuki Toda;Ko Fujimura;Ryoji Kataoka;Hiroyuki Kitagawa
Affiliations:
NTT Cyber Solutions Laboratories, NTT Corporation, Kanagawa, Japan;NTT Cyber Solutions Laboratories, NTT Corporation, Kanagawa, Japan;NTT Cyber Solutions Laboratories, NTT Corporation, Kanagawa, Japan;Graduate School of Systems and Information Engineering
Venue:
ICADL'06 Proceedings of the 9th international conference on Asian Digital Libraries: Achievements, Challenges and Opportunities
Year:
2006

Abstract

This paper proposes a novel text mining method applicable to any given document set. The method relies on PageRank-based centrality scores computed over a graph built from the similarity of all document pairs. Evaluations on a newspaper collection show that the proposed approach performs much better than the baseline method at main topic identification and topical clustering. We also present an example of document set visualization that supports a novel style of document browsing through the topic structure. Experiments show that our topic structure mining method is useful for user-oriented document selection.
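
The core idea in the abstract, PageRank-style centrality over a graph induced by document similarity rather than by hyperlinks, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the graph construction, or any parameters, so the whitespace tokenization, the cosine similarity over raw term counts, the damping factor of 0.85, and the names `cosine` and `similarity_pagerank` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_pagerank(docs, damping=0.85, iterations=100, tol=1e-9):
    """Score documents by PageRank over a pairwise-similarity graph.

    Assumption: edge weights are cosine similarities of raw term counts;
    the paper may use a different similarity and graph construction.
    """
    n = len(docs)
    vectors = [Counter(d.lower().split()) for d in docs]
    # Weighted edge for every distinct document pair (no self-loops).
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Documents similar to nothing spread their score uniformly.
        dangling = sum(scores[i] for i in range(n) if out_sum[i] == 0.0)
        new_scores = []
        for j in range(n):
            inflow = sum(
                scores[i] * weights[i][j] / out_sum[i]
                for i in range(n)
                if out_sum[i] > 0.0
            )
            new_scores.append(
                (1.0 - damping) / n + damping * (inflow + dangling / n)
            )
        delta = sum(abs(x - y) for x, y in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Hypothetical toy collection: three related articles and one outlier.
docs = [
    "election vote results announced",
    "election vote count continues",
    "vote results for election confirmed",
    "rain expected this weekend",
]
scores = similarity_pagerank(docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

In this toy collection the three election articles reinforce one another through shared terms, while the weather article shares no terms with the rest and receives only the uniform baseline share. Documents with high centrality are the natural candidates for representing a main topic, which is the intuition the abstract appeals to.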