A company typically goes by several names, such as its full Chinese name, abbreviated Chinese names, and abbreviated English names. This makes it difficult to collect and extract information related to the company, because (1) a company's abbreviated Chinese names are hard to identify, and (2) the relationships between the names are hard to discover. This paper presents a solution: automatically building a large-scale knowledge base of company names from web pages. First, name candidates are extracted from the company's homepage. Second, the relationships between the candidates are discovered, and the candidates are ranked accordingly. Third, the name knowledge base is built from these results. The knowledge base can be used to identify abbreviated company names and to collect information related to the company. Experimental results indicate that the method is effective and can be applied to company name normalization and keyword expansion; it has also been deployed in a practical company information extraction system.
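The three steps can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not say how relationships are discovered or how candidates are ranked, so the sketch assumes a simple heuristic (an abbreviation is an in-order subsequence of the full name) and ranks candidates by how often they appear on the homepage. The function names `is_subsequence`, `rank_candidates`, `build_knowledge_base`, and `normalize`, as well as the sample counts, are hypothetical.

```python
def is_subsequence(short: str, full: str) -> bool:
    """Return True if the characters of `short` occur in `full` in the same order."""
    remaining = iter(full)
    # `ch in remaining` consumes the iterator up to the first match,
    # so later characters must appear after earlier ones.
    return all(ch in remaining for ch in short)


def rank_candidates(full_name: str, counts: dict[str, int]) -> list[str]:
    """Keep candidates that look like abbreviations of `full_name`.

    Assumed heuristic: a candidate qualifies if it has at least two
    characters, differs from the full name, and is a subsequence of it.
    Survivors are ordered by homepage frequency (highest first).
    """
    kept = [
        c for c in counts
        if c != full_name and len(c) >= 2 and is_subsequence(c, full_name)
    ]
    return sorted(kept, key=lambda c: (-counts[c], c))


def build_knowledge_base(companies: dict[str, dict[str, int]]) -> dict[str, str]:
    """Map every accepted name (full or abbreviated) to its full name."""
    kb: dict[str, str] = {}
    for full_name, counts in companies.items():
        kb[full_name] = full_name
        for alias in rank_candidates(full_name, counts):
            # The first company to claim an alias keeps it; a real system
            # would need a policy for ambiguous abbreviations.
            kb.setdefault(alias, full_name)
    return kb


def normalize(name: str, kb: dict[str, str]) -> str:
    """Return the full name for a known alias, or the input unchanged."""
    return kb.get(name, name)


# Hypothetical candidate counts, as if taken from one company's homepage.
companies = {
    "中国移动通信集团公司": {"中国移动": 12, "中移动": 5, "联通": 3},
}
kb = build_knowledge_base(companies)
print(normalize("中移动", kb))
```

A subsequence test only relates names written in the same script, so English abbreviations of a Chinese full name would need a different signal, for example co-occurrence on the same homepage.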