An efficient context-free parsing algorithm
Communications of the ACM
Proceedings of the 11th international conference on World Wide Web
Adaptive Web Document Classification with MCRDR
ITCC '04 Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC'04) - Volume 2
Named entity recognition: a maximum entropy approach using global information
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Named entity recognition using an HMM-based chunk tagger
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Japanese Named Entity extraction with redundant morphological analysis
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
A hybrid approach to protein name identification in biomedical texts
Information Processing and Management: an International Journal
Efficient deep processing of Japanese
COLING '02 Proceedings of the 3rd workshop on Asian language resources and international standardization - Volume 12
An investigation of various information sources for classifying biological names
BioMed '03 Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine - Volume 13
Learning the meaning and usage of time phrases from a parallel text-data corpus
HLT-NAACL-LWM '04 Proceedings of the HLT-NAACL 2003 workshop on Learning word meaning from non-linguistic data - Volume 6
The semantic knowledge-base of contemporary Chinese and its applications in WSD
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
s-grams: Defining generalized n-grams for information retrieval
Information Processing and Management: an International Journal
Contextual feature selection for text classification
Information Processing and Management: an International Journal - Special issue: AIRS2005: Information retrieval research in Asia
Syntactic and semantic disambiguation of numeral strings using an n-gram method
AI'05 Proceedings of the 18th Australian Joint conference on Advances in Artificial Intelligence
Comparison of numeral strings interpretation: rule-based and feature-based n-gram methods
AI'06 Proceedings of the 19th Australian joint conference on Artificial Intelligence: advances in Artificial Intelligence
This paper describes and compares methods based on word N-grams (specifically trigrams and pentagrams), together with five features, for recognising the syntactic and semantic categories of numeral strings in text, such as money, number, and date. The system employs three interpretation processes: construction of word N-grams with a tokeniser; rule-based processing of numeral strings; and N-gram-based classification. We extracted numeral strings from 1,111 online newspaper articles. To evaluate numeral string interpretation, we held out 112 (10%) of the 1,111 articles as unseen test data (1,278 numeral strings) and used the remaining 999 articles, containing 11,525 numeral strings, to extract the N-gram-based constraints that disambiguate the meanings of the numeral strings. The word trigram method achieved 83.8% precision, 81.2% recall, and an F-measure of 82.5%. The word pentagram method achieved 86.6% precision, 82.9% recall, and an F-measure of 84.7%.
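As a rough illustration of the figures above, the following Python sketch recomputes the F-measure from the reported precision and recall, and shows one way a word trigram or pentagram around a numeral string could be built. It rests on assumptions that the abstract does not state: that the F-measure is the balanced harmonic mean of precision and recall, and that the numeral string sits at the centre of its N-gram. The function names, the padding tokens `<s>` and `</s>`, and the example sentence are all hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall (assumed)."""
    return 2 * precision * recall / (precision + recall)


def context_window(tokens, index, n):
    """Return the n-token window centred on tokens[index].

    Centring the numeral string is an assumption; the padding tokens
    "<s>" and "</s>" are hypothetical sentence-boundary markers.
    """
    half = n // 2
    padded = ["<s>"] * half + list(tokens) + ["</s>"] * half
    # tokens[index] sits at padded[index + half], so the window starts at index.
    return padded[index:index + n]


# Reported precision and recall (in percent) for the two methods.
print(round(f_measure(83.8, 81.2), 1))  # trigrams   -> 82.5
print(round(f_measure(86.6, 82.9), 1))  # pentagrams -> 84.7

# Hypothetical sentence; "25" at position 3 is the numeral string.
tokens = "The firm paid 25 dollars per share".split()
print(context_window(tokens, 3, 3))  # ['paid', '25', 'dollars']
print(context_window(tokens, 3, 5))  # ['firm', 'paid', '25', 'dollars', 'per']
```

Under the balanced F-measure assumption, both recomputed values match the reported 82.5% and 84.7% after rounding to one decimal place.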