One of the most frequent Web surfing tasks is searching for persons and organizations by name. Such names are often common and non-unique, so a single name may map to several distinct entities. This paper describes a new methodology for clustering the web pages returned by a search engine so that pages belonging to different entities fall into different groups. The algorithm uses a combination of named entities, link-based information, and structure-based information as features to partition the document set into direct and indirect pages by means of a decision-tree model. It then selects distinctive direct pages as seeds and clusters the document set around them. The algorithm has been found to be effective for web-based information retrieval applications.
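The seed-then-cluster step described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that the direct/indirect decision has already been made (here a hand-written list, `direct_ids`, stands in for the decision-tree output), that each page is reduced to a set of named-entity strings with the query name itself removed, and that Jaccard overlap is used as the similarity measure. The function names, the `max_sim` threshold, and the sample pages are all hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_seeds(pages: Dict[str, Set[str]], direct_ids: List[str],
               max_sim: float = 0.2) -> List[str]:
    """Greedily keep direct pages that differ enough from every seed chosen so far."""
    seeds: List[str] = []
    for pid in direct_ids:
        if all(jaccard(pages[pid], pages[s]) <= max_sim for s in seeds):
            seeds.append(pid)
    return seeds


def cluster_by_seeds(pages: Dict[str, Set[str]],
                     seeds: List[str]) -> Dict[str, List[str]]:
    """Assign every page to the seed it overlaps with most (ties go to the earlier seed)."""
    if not seeds:
        return {}
    clusters: Dict[str, List[str]] = {s: [] for s in seeds}
    for pid, feats in pages.items():
        best = max(seeds, key=lambda s: jaccard(feats, pages[s]))
        clusters[best].append(pid)
    return clusters


# Hypothetical search results for one ambiguous name; features are invented.
pages = {
    "p1": {"MIT", "robotics", "Boston"},
    "p2": {"Acme Bank", "London", "finance"},
    "p3": {"MIT", "robotics", "conference"},
    "p4": {"Acme Bank", "London", "mortgage"},
    "p5": {"MIT", "robotics", "homepage"},
}
# Assumed output of the direct/indirect classifier (not computed here).
direct_ids = ["p1", "p2", "p5"]

seeds = pick_seeds(pages, direct_ids)
clusters = cluster_by_seeds(pages, seeds)
print(seeds)
print(clusters)
```

In this toy run, `p5` is a direct page but overlaps heavily with `p1`, so it is not kept as a separate seed and is grouped with `p1` instead. A real system would replace the hand-written `direct_ids` with a trained classifier and would likely use richer features than entity sets.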