Automatic news categorization systems have produced high accuracy, consistency, and flexibility using some natural language processing techniques. These knowledge-based categorization methods are more powerful and accurate than statistical techniques. However, the phrasal pre-processing and pattern matching methods that seem to work for categorization have the disadvantage of requiring a fair amount of knowledge encoding by human beings. In addition, they work much better at certain tasks, such as identifying major events in texts, than at others, such as determining what sort of business or product is involved in a news event.

Statistical methods for categorization, on the other hand, are easy to implement and require little or no human customization. But they offer none of the benefits of natural language processing, such as the ability to identify relationships and enforce linguistic constraints.

Our approach has been to use statistics in the knowledge acquisition component of a linguistic pattern-based categorization system, using statistical methods, for example, to associate words with industries and to identify phrases that carry information about businesses or products. Instead of replacing knowledge-based methods with statistics, statistical training replaces knowledge engineering. This has resulted in high accuracy, shorter customization time, and good prospects for the application of the statistical methods to problems in lexical acquisition.
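
To make the idea of statistically associating words with industries concrete, here is a minimal sketch in Python. The abstract does not say which statistic the system uses, so the choice of pointwise mutual information (PMI) over document frequencies is an assumption made purely for illustration. The function name `word_industry_pmi`, the `min_count` threshold, and the toy training texts are likewise hypothetical and do not come from the paper.

```python
import math
from collections import Counter, defaultdict


def word_industry_pmi(labeled_docs, min_count=2):
    """Score (word, industry) pairs by pointwise mutual information.

    labeled_docs: iterable of (industry_label, text) pairs (hypothetical data).
    Returns {industry: [(word, pmi), ...]}, strongest associations first.
    """
    word_counts = Counter()      # number of documents containing each word
    industry_counts = Counter()  # number of documents per industry
    pair_counts = Counter()      # documents containing the word, per industry
    n_docs = 0

    for industry, text in labeled_docs:
        n_docs += 1
        industry_counts[industry] += 1
        # Count each word once per document (document frequency).
        for word in set(text.lower().split()):
            word_counts[word] += 1
            pair_counts[(word, industry)] += 1

    scores = defaultdict(list)
    for (word, industry), joint in pair_counts.items():
        if word_counts[word] < min_count:
            continue  # skip rare words, whose PMI estimates are unreliable
        # PMI = log2( P(word, industry) / (P(word) * P(industry)) )
        pmi = math.log2(
            joint * n_docs / (word_counts[word] * industry_counts[industry])
        )
        scores[industry].append((word, pmi))

    for industry in scores:
        # Sort by descending PMI, breaking ties alphabetically.
        scores[industry].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(scores)


# Toy, made-up training data: (industry label, news text).
docs = [
    ("airlines", "carrier adds new routes and jets"),
    ("airlines", "carrier orders jets from manufacturer"),
    ("banking", "bank raises lending rates"),
    ("banking", "bank reports new loans and rates"),
]
associations = word_industry_pmi(docs)
```

In this toy data, words that occur only in one industry's stories ("carrier", "jets", "bank", "rates") receive a positive score for that industry, while words spread evenly across industries ("and", "new") score zero. In a pattern-based system of the kind described above, such scores might be used to propose lexicon entries for the linguistic patterns instead of categorizing documents directly, which is one way to read "statistical training replaces knowledge engineering".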