Grouping Web Pages about Persons and Organizations for Information Extraction

Authors:
Shiren Ye; Tat-Seng Chua; Jimin Liu; Jeremy R. Kei
Venue:
ICADL '02 Proceedings of the 5th International Conference on Asian Digital Libraries: Digital Libraries: People, Knowledge, and Technology
Year:
2002

Cited by:
0

References (11):
- Automatic text processing.
- Information foraging in information access environments. CHI '95 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
- Web document clustering: a feasibility demonstration. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval.
- Learning to extract symbolic knowledge from the World Wide Web. AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence.
- An Algorithm that Learns What's in a Name. Machine Learning - Special issue on natural language learning.
- Grouper: a dynamic clustering interface to Web search results. WWW '99 Proceedings of the eighth international conference on World Wide Web.
- Document clustering for electronic meetings: an experimental comparison of two techniques. Decision Support Systems - From information retrieval to knowledge management: enabling technologies and best practices.
- Partitioning-based clustering for Web document categorization. Decision Support Systems - Special issue on WITS '97.
- Introduction to Modern Information Retrieval.
- Automatic discovery of similarity relationships through Web mining. Decision Support Systems - Web retrieval and mining.
- A maximum entropy approach to named entity recognition.

Abstract

Information extraction on the Web lets users retrieve specific information about a person or an organization. Because names are not unique, the same name may refer to several different entities. This paper describes an algorithm that clusters the Web pages returned by a search engine so that pages about different entities fall into different groups. The algorithm uses named entities as features to divide the document set into direct and indirect pages. It then uses distinct direct pages as cluster seeds and assigns the indirect pages to these clusters. The authors report that the algorithm is effective for Web-based applications.
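The abstract outlines a two-stage procedure (split pages into direct and indirect, then grow clusters from direct-page seeds) without giving its details. The Python sketch below illustrates that shape under stated assumptions and is not the authors' implementation. Everything specific here is hypothetical: each page is assumed to be already reduced to a list of named-entity mentions (entity extraction is out of scope); a page counts as "direct" when the queried name is among its `top_k` most frequent entities; similarity is Jaccard overlap of entity sets; direct pages are merged into one seed when overlap reaches `seed_sim`; and an indirect page joins its closest seed only when overlap reaches `assign_sim`. The function names and thresholds are illustrative choices.

```python
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two named-entity sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def split_direct_indirect(pages: dict, name: str, top_k: int = 3):
    """HYPOTHETICAL rule: a page is 'direct' if the queried name is among its
    top_k most frequent named-entity mentions, otherwise 'indirect'."""
    direct, indirect = [], []
    for pid, mentions in pages.items():
        top = [e for e, _ in Counter(mentions).most_common(top_k)]
        (direct if name in top else indirect).append(pid)
    return direct, indirect


def _most_similar(entities: set, clusters: list):
    """Return (cluster, similarity) for the closest cluster seed, or (None, 0.0)."""
    best, best_sim = None, 0.0
    for cluster in clusters:
        sim = jaccard(entities, cluster["seed"])
        if best is None or sim > best_sim:
            best, best_sim = cluster, sim
    return best, best_sim


def cluster_pages(pages: dict, name: str, top_k: int = 3,
                  seed_sim: float = 0.5, assign_sim: float = 0.2):
    """Group pages for an ambiguous name (illustrative sketch): direct pages
    seed clusters, indirect pages join the closest seed or stay unassigned."""
    # Drop the queried name itself: every page contains it, so it says
    # nothing about *which* entity the page describes.
    ents = {pid: {e for e in m if e != name} for pid, m in pages.items()}
    direct, indirect = split_direct_indirect(pages, name, top_k)

    # Stage 1: merge near-duplicate direct pages so each seed is distinct.
    clusters = []
    for pid in direct:
        best, sim = _most_similar(ents[pid], clusters)
        if best is not None and sim >= seed_sim:
            best["pages"].append(pid)
            best["seed"] |= ents[pid]  # widen the seed's entity profile
        else:
            clusters.append({"seed": set(ents[pid]), "pages": [pid]})

    # Stage 2: attach each indirect page to its most similar seed.
    unassigned = []
    for pid in indirect:
        best, sim = _most_similar(ents[pid], clusters)
        if best is not None and sim >= assign_sim:
            best["pages"].append(pid)
        else:
            unassigned.append(pid)
    return [c["pages"] for c in clusters], unassigned


if __name__ == "__main__":
    MJ = "Michael Jordan"  # an ambiguous name: basketball player vs. professor
    pages = {
        "p1": [MJ, MJ, "Chicago Bulls", "NBA"],
        "p2": [MJ, MJ, "NBA", "Chicago Bulls", "Scottie Pippen"],
        "p3": [MJ, MJ, "UC Berkeley", "NeurIPS"],
        "p4": ["NBA", "NBA", "Scottie Pippen", "Scottie Pippen",
               "Dennis Rodman", "Dennis Rodman", MJ],
        "p5": ["Stanford", "Stanford", "UC Berkeley", "UC Berkeley",
               "NeurIPS", "NeurIPS", MJ],
        "p6": ["Paris", "Paris", "France", "France", "Louvre", "Louvre", MJ],
    }
    print(cluster_pages(pages, MJ))
    # → ([['p1', 'p2', 'p4'], ['p3', 'p5']], ['p6'])
```

In this toy run, pages p1 to p3 are direct, the two basketball pages merge into one seed, and the professor page forms a second seed; p4 and p5 join the matching clusters, while p6 shares no entities with either seed and is left unassigned. Leaving low-overlap indirect pages out, rather than forcing them into the nearest cluster, is one way to handle pages that mention a name only in passing; the abstract does not say whether the paper does this.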