Computational lexicography for natural language processing
Computational lexicography for natural language processing
Text representation for intelligent text retrieval: a classification-oriented view
Text-based intelligent systems
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Classifying news stories using memory based reasoning
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Representation and learning in information retrieval
Representation and learning in information retrieval
Overview of the second text retrieval conference (TREC-2)
TREC-2 Proceedings of the second conference on Text retrieval conference
Lexical Ambiguity Resolution: Perspectives from Psycholinguistics, Neuropsychology, and Artificial Intelligence
CONSTRUE/TIS: A System for Content-Based Indexing of a Database of News Stories
IAAI '90 Proceedings of the The Second Conference on Innovative Applications of Artificial Intelligence
An overview of DR-LINK and its approach to document filtering
HLT '93 Proceedings of the workshop on Human Language Technology
The role of intermediary services in emerging digital libraries
Proceedings of the first ACM international conference on Digital libraries
A multilevel approach to intelligent information filtering: model, system, and evaluation
ACM Transactions on Information Systems (TOIS)
Evolving intelligent text-based agents
AGENTS '00 Proceedings of the fourth international conference on Autonomous agents
Concept-based knowledge discovery in texts extracted from the Web
ACM SIGKDD Explorations Newsletter
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Information Retrieval
Design and implementation of an ontology algorithm for web documents classification
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part IV
Exploit semantic information for category annotation recommendation in wikipedia
NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems
The text categorization module described here provides a front-end filtering function for the larger DR-LINK text retrieval system [Liddy and Myaeng 1993]. The module evaluates a large incoming stream of documents to determine which documents are sufficiently similar to a profile at the broad subject level to warrant more refined representation and matching. To accomplish this task, each substantive word in a text is first categorized using a feature set based on the semantic Subject Field Codes (SFCs) assigned to individual word senses in a machine-readable dictionary. When tested on 50 user profiles and 550 megabytes of documents, both the feature set underlying the text categorization module and the algorithm that establishes the boundary of the category of potentially relevant documents performed well. Specifically, for most profiles the category of potentially relevant documents would contain at least 80% of all documents later determined to be relevant to the profile. The number of documents in this set would be uniquely determined by the system's category-boundary predictor, and the set is likely to contain less than 5% of the incoming stream of documents.
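The filtering idea can be illustrated with a minimal sketch. It is not the DR-LINK implementation: the toy lexicon, the code labels (`EC`, `LW`, `MD`), the function names, the use of cosine similarity, and the fixed threshold are all assumptions made for illustration. In particular, the sketch assigns one code per word and skips word-sense disambiguation, and the fixed threshold merely stands in for the category-boundary predictor, whose details are not given here.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon mapping words to Subject Field Codes (SFCs).
# A real system would look up codes per word sense in a machine-readable
# dictionary; these entries and code labels are invented for illustration.
TOY_SFC_LEXICON = {
    "bank": "EC", "loan": "EC", "interest": "EC",
    "court": "LW", "judge": "LW",
    "virus": "MD", "vaccine": "MD",
}


def sfc_vector(text, lexicon=TOY_SFC_LEXICON):
    """Return the relative frequency of each SFC among the known words of a text."""
    codes = Counter(lexicon[w] for w in text.lower().split() if w in lexicon)
    total = sum(codes.values())
    return {code: n / total for code, n in codes.items()} if total else {}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def filter_stream(profile_text, documents, threshold=0.5):
    """Keep documents whose SFC vector is close enough to the profile's.

    The fixed threshold is an assumption standing in for a learned
    category-boundary predictor.
    """
    profile = sfc_vector(profile_text)
    return [d for d in documents if cosine(profile, sfc_vector(d)) >= threshold]


if __name__ == "__main__":
    docs = [
        "the bank raised the loan interest",
        "the judge left the court",
        "vaccine virus bank",
    ]
    print(filter_stream("bank loan interest", docs))
```

Only documents that pass this coarse subject-level test would go on to the more refined representation and matching stages, which is what keeps the expensive processing to a small fraction of the incoming stream.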