Information extraction on the Web lets users retrieve specific information about a person or an organization. Because names are not unique, the same name may refer to multiple entities. This paper describes an algorithm that clusters the Web pages returned by search engines so that pages about different entities fall into different groups. The algorithm uses named entities as features to divide the document set into direct and indirect pages. It then uses the distinct direct pages as cluster seeds and groups the indirect pages around them. The algorithm has been found to be effective for Web-based applications.
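The seed-based grouping described above can be illustrated with a minimal sketch. The abstract does not define how direct and indirect pages are told apart or how similarity is measured, so everything specific here is an assumption made for illustration: a page counts as direct when it has at least `min_entities` named entities, similarity is Jaccard overlap between named-entity sets, and `cluster_pages`, `min_entities`, and `seed_sim` are hypothetical names. Named entities are assumed to have been extracted beforehand.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two named-entity sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_pages(
    pages: Dict[str, Set[str]],
    min_entities: int = 3,   # hypothetical threshold for a "direct" page
    seed_sim: float = 0.3,   # hypothetical threshold for "same entity as a seed"
) -> List[List[str]]:
    """Group page ids into clusters; the first id in each cluster is its seed."""
    # Step 1 (assumed rule): entity-rich pages are direct, the rest indirect.
    direct = [p for p, ents in pages.items() if len(ents) >= min_entities]
    indirect = [p for p in pages if p not in direct]

    # Step 2: a direct page that is not similar to any existing seed
    # becomes a new seed; otherwise it joins the closest seed's cluster.
    clusters: List[List[str]] = []
    for p in direct:
        best = max(
            clusters,
            key=lambda c: jaccard(pages[p], pages[c[0]]),
            default=None,
        )
        if best is not None and jaccard(pages[p], pages[best[0]]) >= seed_sim:
            best.append(p)
        else:
            clusters.append([p])

    # Step 3: attach each indirect page to the most similar seed.
    for p in indirect:
        if not clusters:
            clusters.append([p])  # no seeds at all: the page starts a cluster
            continue
        best = max(clusters, key=lambda c: jaccard(pages[p], pages[c[0]]))
        best.append(p)
    return clusters


# Made-up pages returned for the query "John Smith", keyed by page id.
pages = {
    "a": {"John Smith", "MIT", "Boston", "Robotics Lab"},
    "b": {"John Smith", "Sydney", "Cricket Australia", "Melbourne"},
    "c": {"John Smith", "MIT"},
    "d": {"John Smith", "Sydney"},
    "e": {"John Smith", "MIT", "Boston", "CSAIL"},
}
clusters = cluster_pages(pages)
```

With this toy input, pages `a` and `b` share only the queried name and so become separate seeds, while the sparse pages `c` and `d` are pulled toward whichever seed shares more of their entities. The thresholds would need tuning on real search results.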