Graph based semi-supervised approach for information extraction

Authors:
Hany Hassan;Ahmed Hassan;Sara Noeman
Affiliations:
IBM Cairo Technology Development Center, Giza, Egypt;IBM Cairo Technology Development Center, Giza, Egypt;IBM Cairo Technology Development Center, Giza, Egypt
Venue:
TextGraphs-1 Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing
Year:
2006

Abstract

Classification techniques use labeled instances to train classifiers for a variety of classification problems. However, labeled instances are limited, expensive, and time-consuming to obtain because they require experienced human annotators. Meanwhile, large amounts of unlabeled data are usually easy to obtain. Semi-supervised learning addresses the problem of using unlabeled data together with labeled data to build better classifiers. In this paper we introduce a semi-supervised approach based on mutual reinforcement in graphs that obtains more labeled data in order to improve classifier accuracy. The approach has been used to supplement a maximum entropy model for semi-supervised training on the ACE Relation Detection and Characterization (RDC) task. ACE RDC is considered a hard task in information extraction because of the lack of large amounts of training data and inconsistencies in the available data. The proposed approach provides a 10% relative improvement over the state-of-the-art supervised baseline system.
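The abstract does not spell out how the graph is built or scored, so the following is only a minimal sketch of one common form of mutual reinforcement: a HITS-style iteration on a bipartite graph. It assumes, purely for illustration, that one side of the graph holds extraction patterns and the other holds candidate instances, so that a pattern scores highly when it matches highly scored instances and vice versa. The function name `mutual_reinforcement`, the edge format, and the toy patterns and instance ids are all hypothetical and are not taken from the paper.

```python
import math


def mutual_reinforcement(edges, n_iter=50):
    """HITS-style scoring on a bipartite pattern/instance graph.

    edges: iterable of (pattern, instance, weight) triples
           (a hypothetical input format chosen for this sketch).
    Returns (pattern_scores, instance_scores), each L2-normalized.
    """
    edges = list(edges)
    patterns = {p for p, _, _ in edges}
    instances = {i for _, i, _ in edges}

    # Start every node with the same score.
    p_score = {p: 1.0 for p in patterns}
    i_score = {i: 1.0 for i in instances}

    def normalize(scores):
        # Scale to unit L2 norm so scores stay bounded across iterations.
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {k: v / norm for k, v in scores.items()}

    for _ in range(n_iter):
        # An instance is reinforced by the patterns that match it.
        new_i = dict.fromkeys(instances, 0.0)
        for p, i, w in edges:
            new_i[i] += w * p_score[p]
        i_score = normalize(new_i)

        # A pattern is reinforced by the instances it matches.
        new_p = dict.fromkeys(patterns, 0.0)
        for p, i, w in edges:
            new_p[p] += w * i_score[i]
        p_score = normalize(new_p)

    return p_score, i_score


# Toy graph: two hypothetical patterns and four hypothetical instance ids.
edges = [
    ("PERSON works for ORG", "t1", 1.0),
    ("PERSON works for ORG", "t2", 1.0),
    ("PERSON works for ORG", "t3", 1.0),
    ("PERSON , ORG", "t3", 1.0),
    ("PERSON , ORG", "t4", 1.0),
]
pattern_scores, instance_scores = mutual_reinforcement(edges)
```

In this toy graph the pattern that matches more instances ends up with the higher score, and the instance matched by both patterns ranks first. In a semi-supervised setting of the kind the abstract describes, highly ranked unlabeled instances could then be added to the training data for the classifier; how the paper selects and labels them is not stated here.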