Text categorization for multiple users based on semantic features from a machine-readable dictionary

Authors:
Elizabeth D. Liddy;Woojin Paik;Edmund S. Yu
Affiliations:
Syracuse Univ., Syracuse, NY;Syracuse Univ., Syracuse, NY;Syracuse Univ., Syracuse, NY
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
1994

References (9):
Computational lexicography for natural language processing
Text representation for intelligent text retrieval: a classification-oriented view. In: Text-based intelligent systems
Automatic document classification: natural language processing, statistical analysis, and expert system techniques used together. In: SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Classifying news stories using memory based reasoning. In: SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Representation and learning in information retrieval
Overview of the second text retrieval conference (TREC-2). In: TREC-2 Proceedings of the Second Text Retrieval Conference
Lexical Ambiguity Resolution: Perspectives from Psycholinguistics, Neuropsychology, and Artificial Intelligence
CONSTRUE/TIS: A System for Content-Based Indexing of a Database of News Stories. In: IAAI '90 Proceedings of the Second Conference on Innovative Applications of Artificial Intelligence
An overview of DR-LINK and its approach to document filtering. In: HLT '93 Proceedings of the workshop on Human Language Technology

Cited by (8):
The role of intermediary services in emerging digital libraries. In: Proceedings of the first ACM international conference on Digital libraries
A multilevel approach to intelligent information filtering: model, system, and evaluation. In: ACM Transactions on Information Systems (TOIS)
Evolving intelligent text-based agents. In: AGENTS '00 Proceedings of the fourth international conference on Autonomous agents
Concept-based knowledge discovery in texts extracted from the Web. In: ACM SIGKDD Explorations Newsletter
Machine learning in automated text categorization. In: ACM Computing Surveys (CSUR)
DR-LINK in TIPSTER III. In: Information Retrieval
Design and implementation of an ontology algorithm for web documents classification. In: ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part IV
Exploit semantic information for category annotation recommendation in Wikipedia. In: NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems

Abstract

The text categorization module described here provides a front-end filtering function for the larger DR-LINK text retrieval system [Liddy and Myaeng 1993]. The module evaluates a large incoming stream of documents to determine which documents are sufficiently similar to a profile at the broad subject level to warrant more refined representation and matching. To accomplish this task, each substantive word in a text is first categorized using a feature set based on the semantic Subject Field Codes (SFCs) assigned to individual word senses in a machine-readable dictionary. When tested on 50 user profiles and 550 megabytes of documents, results indicate that the feature set that is the basis of the text categorization module and the algorithm that establishes the boundary of categories of potentially relevant documents accomplish their tasks with a high level of performance. This means that the category of potentially relevant documents for most profiles would contain at least 80% of all documents later determined to be relevant to the profile. The number of documents in this set would be uniquely determined by the system's category-boundary predictor, and this set is likely to contain less than 5% of the incoming stream of documents.
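
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the two-letter codes, the use of cosine similarity, and the fixed `keep_fraction` cutoff are all assumptions made for illustration. In particular, the abstract says the size of the retained set is determined per profile by a category-boundary predictor, which this sketch replaces with a constant fraction.

```python
import math
from collections import Counter

# Hypothetical toy lexicon mapping words to Subject Field Codes (SFCs).
# A real system would look up SFCs for word senses in a machine-readable
# dictionary; these entries and codes are invented for illustration.
TOY_SFC_LEXICON = {
    "stock": "EC", "market": "EC", "bank": "EC", "trade": "EC",
    "court": "LW", "judge": "LW", "law": "LW",
    "virus": "MD", "vaccine": "MD", "doctor": "MD",
}


def sfc_vector(text):
    """Map a text to a normalized frequency vector over SFCs."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    codes = Counter(TOY_SFC_LEXICON[w] for w in words if w in TOY_SFC_LEXICON)
    total = sum(codes.values())
    return {code: n / total for code, n in codes.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (assumed measure)."""
    dot = sum(value * b.get(code, 0.0) for code, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_stream(profile_text, documents, keep_fraction=0.05):
    """Rank documents by SFC similarity to the profile and keep the top slice.

    keep_fraction is a stand-in for the category-boundary predictor
    mentioned in the abstract, which sets the cutoff per profile.
    """
    profile = sfc_vector(profile_text)
    ranked = sorted(
        documents,
        key=lambda doc: cosine(profile, sfc_vector(doc)),
        reverse=True,
    )
    keep = max(1, math.ceil(keep_fraction * len(ranked)))
    return ranked[:keep]
```

Only the documents returned by `filter_stream` would be passed on to the more refined (and more expensive) representation and matching stages.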