Syntactic and semantic disambiguation of numeral strings using an n-gram method

Authors:
Kyongho Min; William H. Wilson; Yoo-Jin Moon
Affiliations:
School of Computer and Information Sciences, AUT, Auckland, New Zealand; School of Computer Science and Engineering, UNSW, Sydney, Australia; Department of Management Information Systems, HUFS, YongIn, Kyonggi, Korea
Venue:
AI'05 Proceedings of the 18th Australian Joint conference on Advances in Artificial Intelligence
Year:
2005

Abstract

This paper describes the interpretation of numerals and of strings that include numerals, that is, strings composed of a number together with words or symbols that indicate the category of the string, such as SPEED or LENGTH. The interpretation is done at three levels: lexical, syntactic, and semantic. The system employs three interpretation processes: a word trigram constructor with tokeniser, a rule-based processor of number strings, and n-gram based disambiguation of meanings. We extracted numeral strings from 378 online newspaper articles, finding that, on average, they comprised about 2.2% of the words in the articles. We chose 287 of these articles to provide unseen test data (3251 numeral strings), and used the remaining 91 articles to provide 886 numeral strings for use in manually extracting n-gram constraints to disambiguate the meanings of the numeral strings. We implemented six different disambiguation methods based on category frequency statistics collected from the sample data and on the number of word trigram constraints of each category. Precision ratios for the six methods when applied to the test data ranged from 85.6% to 87.9%.
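
The pipeline outlined in the abstract (tokenise, build a word trigram around each numeral string, then disambiguate its category) might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. Everything in it is an assumption made for the example: the tokeniser pattern, the reading of "word trigram" as (previous word, numeral string, next word), the `CONSTRAINTS` table, the made-up `CATEGORY_FREQ` counts, the QUANTITY and MONEY category names, and the single tie-breaking rule. It does not reproduce any of the six methods evaluated in the paper.

```python
import re
from collections import Counter

# Hypothetical tokeniser: numbers (optionally with a unit suffix), words,
# or any other single non-space symbol.
TOKEN_RE = re.compile(r"\d[\d,.]*\d[A-Za-z%]*|\d[A-Za-z%]*|[A-Za-z]+|\S")

# Hypothetical hand-written constraints: (left word, right word) per category.
# None means "any word is allowed in this position".
CONSTRAINTS = {
    "SPEED": [(None, "mph"), (None, "knots")],
    "LENGTH": [(None, "km"), (None, "metres")],
    "MONEY": [("$", None)],
}

# Made-up category frequencies standing in for statistics from sample data.
CATEGORY_FREQ = Counter({"QUANTITY": 5, "MONEY": 3, "LENGTH": 2, "SPEED": 1})


def tokenise(text):
    """Split text into number, word, and symbol tokens."""
    return TOKEN_RE.findall(text)


def numeral_trigrams(tokens):
    """Return (previous, numeral, next) for every token containing a digit."""
    padded = ["<s>"] + tokens + ["</s>"]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
        if any(ch.isdigit() for ch in padded[i])
    ]


def matches(constraint, trigram):
    """Check whether a (left, right) constraint is satisfied by a trigram."""
    left, right = constraint
    prev_word, _, next_word = trigram
    return (left is None or left == prev_word.lower()) and (
        right is None or right == next_word.lower()
    )


def disambiguate(trigram):
    """Pick a category: the most frequent matching one, else the most frequent overall."""
    candidates = [
        cat
        for cat, constraints in CONSTRAINTS.items()
        if any(matches(c, trigram) for c in constraints)
    ]
    if not candidates:
        return CATEGORY_FREQ.most_common(1)[0][0]
    return max(candidates, key=lambda cat: CATEGORY_FREQ[cat])


if __name__ == "__main__":
    sentence = "It cost $40 and hit 120 mph over 5 km with 3 drivers."
    for trigram in numeral_trigrams(tokenise(sentence)):
        print(trigram, disambiguate(trigram))
```

In this sketch the context words decide the category when a constraint matches, and the frequency table only breaks ties or supplies a default. Other ways of combining the two sources of evidence are equally plausible.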