A co-training based method for Chinese patent semantic annotation

Authors:
Xu Chen;Zhiyong Peng;Cheng Zeng
Affiliations:
Wuhan University, Wuhan, Hubei, China;Wuhan University, Wuhan, Hubei, China;Wuhan University, Wuhan, China
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Abstract

Patents are public scientific documents protected by law, and their abstracts contain highly valuable information. Semantic annotation of patents can effectively protect intellectual property rights and promote corporate scientific research and innovation. Currently, automatic patent annotation mainly relies on supervised machine learning algorithms, which require large amounts of expensive labeled patent data. Because labeled Chinese patent data is scarce, this paper adopts co-training, a semi-supervised machine learning method that starts from a small amount of labeled data. The method combines keyword extraction with list extraction and incrementally annotates functional clauses in patent abstracts. Experimental results indicate that the method gradually improves recall without sacrificing precision.
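
The general co-training loop named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the token-count scorer, the `co_train` function, the `threshold` parameter, and the English toy clauses are all assumptions made for illustration. The two views only loosely stand in for keyword extraction (view A) and list extraction (view B).

```python
from collections import Counter


def train(examples):
    """Count how often each token occurs in positive vs. negative examples."""
    pos, neg = Counter(), Counter()
    for tokens, label in examples:
        (pos if label else neg).update(tokens)
    return pos, neg


def score(model, tokens):
    """Signed score: above 0 leans positive, below 0 leans negative."""
    pos, neg = model
    return sum(pos[t] - neg[t] for t in tokens)


def co_train(labeled, unlabeled, rounds=5, threshold=1):
    """Grow the labeled set using two views of each clause.

    labeled:   list of ((view_a, view_b), label)
    unlabeled: list of (view_a, view_b)
    Returns the enlarged labeled list and the clauses left unlabeled.
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        # One simple model per view, retrained on everything labeled so far.
        model_a = train([(x[0], y) for x, y in labeled])
        model_b = train([(x[1], y) for x, y in labeled])
        newly, rest = [], []
        for x in pool:
            sa, sb = score(model_a, x[0]), score(model_b, x[1])
            best = sa if abs(sa) >= abs(sb) else sb  # more confident view
            if abs(best) >= threshold:
                newly.append((x, best > 0))
            else:
                rest.append(x)
        if not newly:  # neither view is confident about anything: stop
            break
        labeled.extend(newly)
        pool = rest
    return labeled, pool


# Hypothetical toy data: view A holds clause keywords, view B a structural cue.
seed = [
    ((("improves", "efficiency"), ("item",)), True),   # functional clause
    ((("comprises", "housing"), ("none",)), False),    # non-functional clause
]
candidates = [
    (("improves", "accuracy"), ("item",)),
    (("reduces", "cost"), ("item",)),
    (("comprises", "motor"), ("none",)),
    (("reduces", "noise"), ("plain",)),
    (("connected", "shaft"), ("fixed",)),
]
annotated, remaining = co_train(seed, candidates)
```

In this toy run, the clause with keywords `("reduces", "noise")` cannot be labeled in the first round, because neither view has seen its tokens. It becomes labelable in the second round, after view B has labeled `("reduces", "cost")` and view A has picked up `reduces` as a positive keyword. This mirrors the incremental annotation the abstract describes, where each view supplies the other with new training examples.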