Improving discriminative sequential learning with rare-but-important associations

  • Authors:
  • Xuan-Hieu Phan; Le-Minh Nguyen; Tu-Bao Ho; Susumu Horiguchi

  • Affiliations:
  • Japan Advanced Inst. of Science & Technology, Nomi, Ishikawa, Japan; Japan Advanced Inst. of Science & Technology, Nomi, Ishikawa, Japan; Japan Advanced Inst. of Science & Technology, Nomi, Ishikawa, Japan; Tohoku University, Sendai, Japan

  • Venue:
  • Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
  • Year:
  • 2005

Abstract

Discriminative sequential learning models like Conditional Random Fields (CRFs) have achieved significant success in several areas such as natural language processing and information extraction. Their key advantage is the ability to capture various non-independent and overlapping features of inputs. However, several unexpected pitfalls have a negative influence on the model's performance; these mainly come from an imbalance among classes/labels, irregular phenomena, and potential ambiguity in the training data. This paper presents a data-driven approach that can deal with such hard-to-predict data instances by discovering and emphasizing rare-but-important associations of statistics hidden in the training data. Mined associations are then incorporated into these models to deal with difficult examples. Experimental results of English phrase chunking and named entity recognition using CRFs show a significant improvement in accuracy. In addition to the technical perspective, our approach also highlights a potential connection between association mining and statistical learning by offering an alternative strategy to enhance learning performance with interesting and useful patterns discovered from large datasets.
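
The abstract does not spell out how associations are mined, so the following is only a minimal sketch of one plausible reading: treat a "rare-but-important" association as a rule from an observed feature to a label that has low support (few occurrences) but high confidence (the feature almost always co-occurs with that label). The function name, the thresholds, and the toy chunking data are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def mine_rare_associations(examples, min_support=2, max_support=5,
                           min_confidence=0.9):
    """Return (feature, label, support, confidence) rules that are rare
    (support within [min_support, max_support]) but reliable
    (confidence >= min_confidence). All thresholds are illustrative."""
    feature_counts = Counter()
    pair_counts = Counter()
    for features, label in examples:
        for feature in set(features):
            feature_counts[feature] += 1
            pair_counts[(feature, label)] += 1

    rules = []
    for (feature, label), support in pair_counts.items():
        confidence = support / feature_counts[feature]
        if min_support <= support <= max_support and confidence >= min_confidence:
            rules.append((feature, label, support, confidence))
    return sorted(rules)


# Hypothetical token-level training data: (observed features, chunk label).
examples = (
    [({"w=the"}, "B-NP")] * 8          # frequent: too common to count as rare
    + [({"w=that"}, "B-NP")] * 3       # ambiguous: confidence is only 0.5
    + [({"w=that"}, "B-SBAR")] * 3
    + [({"w=albeit"}, "B-SBAR")] * 2   # rare but fully consistent
)

rules = mine_rare_associations(examples)
print(rules)
```

Rules selected this way could then be added to a CRF as extra feature functions that fire only when the feature and label occur together; how the paper actually weights or emphasizes them is not stated in the abstract.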