An efficient context-free parsing algorithm
Communications of the ACM
Named entity recognition: a maximum entropy approach using global information
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Named entity recognition using an HMM-based chunk tagger
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Japanese Named Entity extraction with redundant morphological analysis
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Efficient deep processing of Japanese
COLING '02 Proceedings of the 3rd workshop on Asian language resources and international standardization - Volume 12
An investigation of various information sources for classifying biological names
BioMed '03 Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine - Volume 13
Learning the meaning and usage of time phrases from a parallel text-data corpus
HLT-NAACL-LWM '04 Proceedings of the HLT-NAACL 2003 workshop on Learning word meaning from non-linguistic data - Volume 6
The semantic knowledge-base of contemporary Chinese and its applications in WSD
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Syntactic and semantic disambiguation of numeral strings using an n-gram method
AI'05 Proceedings of the 18th Australian Joint conference on Advances in Artificial Intelligence
AI'07 Proceedings of the 20th Australian joint conference on Advances in artificial intelligence
Hi-index | 0.00 |
This paper describes a performance comparison for two approaches to numeral string interpretation: manually generated rule-based interpretation of numerals and strings including numerals [8] vs automatically generated feature-based interpretation. The system employs three interpretation processes: word trigram construction with a tokeniser, rule-based processing of number strings, and n-gram based classification. We extracted numeral strings from 378 online newspaper articles, finding that, on average, they comprised about 2.2% of the words in the articles. For feature-based interpretation, we tested on 11 datasets, with random selection of sample data to extract tabular feature-based constraints. The rule-based approach resulted in 86.8% precision and 77.1% recall ratio. The feature-based interpretation resulted in 83.1% precision and 74.5% recall ratio.