An efficient minimum vocabulary construction algorithm for language modeling

  • Authors:
  • Sina Lin; Zengchang Qin; Zehua Huang; Tao Wan

  • Affiliations:
  • Intelligent Computing and Machine Learning Lab, School of ASEE, Beihang University, Beijing, China; Intelligent Computing and Machine Learning Lab, School of ASEE, Beihang University, Beijing, China, and Robotics Institute, Carnegie Mellon University, Pittsburgh; Intelligent Computing and Machine Learning Lab, School of ASEE, Beihang University, Beijing, China, and School of Advanced Engineering, Beihang University, China; School of Medicine, Boston University, Boston

  • Venue:
  • IEA/AIE'12 Proceedings of the 25th international conference on Industrial Engineering and Other Applications of Applied Intelligent Systems: advanced research in applied artificial intelligence
  • Year:
  • 2012

Abstract

To learn a new word from a dictionary, we first need to know a set of "basic words" that frequently appear in word definitions. It often happens that you cannot understand the word you looked up because its definition or explanation in the dictionary still contains words you do not understand. You can keep looking up these new words recursively until they can all be explained by basic words you already know. In this paper, we discuss how to automatically find a minimum set of such basic words that defines (directly or recursively) the entire vocabulary of a given dictionary. We propose an efficient algorithm that constructs this Minimum Vocabulary (MV) using word-frequency information. The minimum vocabulary can be used for language modeling, and experimental results demonstrate the effectiveness of using it as features in text classification.
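The abstract does not spell out the algorithm's steps. As a rough illustration only, the Python sketch below shows one way a frequency-driven construction could work, assuming a dictionary is given as a mapping from each headword to the list of words in its definition. The names `closure` and `greedy_basic_words`, the forced inclusion of words that have no entry, and the "most frequent word first" greedy rule are hypothetical choices made for this example, not the authors' published method, and the result is not guaranteed to be minimum.

```python
from collections import Counter

# Hypothetical sketch of a frequency-driven defining vocabulary;
# NOT the authors' algorithm, and minimality is not guaranteed.


def closure(dictionary, basic):
    """Return every word definable, directly or recursively, from `basic`.

    A word becomes known when it is basic, or when every word in its
    definition is already known; iterate until nothing changes.
    """
    known = set(basic)
    changed = True
    while changed:
        changed = False
        for word, definition in dictionary.items():
            if word not in known and all(w in known for w in definition):
                known.add(word)
                changed = True
    return known


def greedy_basic_words(dictionary):
    """Greedy approximation of a minimum set of basic words.

    `dictionary` maps each headword to the list of words in its definition.
    The returned set defines every word, but may be larger than the minimum.
    """
    # How often each word is used inside definitions.
    freq = Counter(w for definition in dictionary.values() for w in definition)
    # Words used in definitions but never defined themselves must be basic.
    basic = {w for w in freq if w not in dictionary}
    known = closure(dictionary, basic)
    vocabulary = set(dictionary) | set(freq)
    while not vocabulary <= known:
        # Add the most frequent still-undefined word (ties broken alphabetically).
        best = min(vocabulary - known, key=lambda w: (-freq[w], w))
        basic.add(best)
        known = closure(dictionary, basic)
    return basic


# Toy dictionary (hypothetical data) with a definitional cycle: young <-> old.
toy = {
    "animal": ["living", "thing"],
    "dog": ["animal", "that", "barks"],
    "puppy": ["young", "dog"],
    "young": ["not", "old"],
    "old": ["not", "young"],
}

print(sorted(greedy_basic_words(toy)))
# → ['barks', 'living', 'not', 'that', 'thing', 'young']
```

Words without an entry of their own can never be defined, so the sketch seeds the basic set with them; the greedy step then breaks definitional cycles such as young/old by adding the most frequently used word still undefined. Recomputing the closure after every addition keeps the code simple; a version meant for a full dictionary would likely update it incrementally.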