We describe an approach to text classification that represents a compromise between traditional word-based techniques and in-depth natural language processing. Our approach uses a natural language processing task called “information extraction” as a basis for high-precision text classification. We present three algorithms that use varying amounts of extracted information to classify texts. The relevancy signatures algorithm uses linguistic phrases; the augmented relevancy signatures algorithm uses phrases and local context; and the case-based text classification algorithm uses larger pieces of context. Relevant phrases and contexts are acquired automatically from a training corpus. We evaluate the algorithms on two test sets from the MUC-4 corpus. All three algorithms achieved high precision on both test sets, with the augmented relevancy signatures algorithm and the case-based algorithm reaching 100% precision with over 60% recall on one set. Additionally, we compare the algorithms on a larger collection of 1700 texts and describe an automated method for empirically deriving appropriate threshold values. The results suggest that information extraction techniques can support high-precision text classification and, in general, that using more extracted information improves performance. As a practical matter, we also explain how the text classification system can be easily ported across domains.
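As a rough illustration of the phrase-based idea, the sketch below learns a set of "signature" phrases from labeled training texts and flags a new text as relevant if it contains any of them. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify how phrases are scored, so the selection rule used here (keep a phrase if it occurs in at least `min_count` training texts and at least `min_rate` of those texts are relevant) is an assumption, as are the names `learn_signatures`, `classify`, `min_rate`, and `min_count`. The training phrases are invented, and phrase extraction itself is assumed to have been done by an upstream information extraction step.

```python
from collections import Counter


def learn_signatures(training, min_rate=0.9, min_count=2):
    """Pick phrases that are strongly associated with relevant texts.

    training: iterable of (phrases, is_relevant) pairs, where phrases is a
    list of strings already extracted from one text (assumed input format).
    min_rate and min_count are hypothetical thresholds, not values from
    the paper.
    """
    seen = Counter()           # number of texts containing each phrase
    seen_relevant = Counter()  # number of relevant texts containing it
    for phrases, is_relevant in training:
        for phrase in set(phrases):  # count each phrase once per text
            seen[phrase] += 1
            if is_relevant:
                seen_relevant[phrase] += 1
    return {
        phrase
        for phrase, count in seen.items()
        if count >= min_count and seen_relevant[phrase] / count >= min_rate
    }


def classify(phrases, signatures):
    """Label a text relevant if it contains at least one signature phrase."""
    return any(phrase in signatures for phrase in phrases)


# Invented toy data: (extracted phrases, relevance label) per training text.
training = [
    (["was bombed", "was killed"], True),
    (["was bombed", "claimed responsibility"], True),
    (["was killed", "in an accident"], False),
    (["claimed responsibility", "was bombed"], True),
]

signatures = learn_signatures(training)
print(sorted(signatures))
print(classify(["was killed"], signatures))
print(classify(["was bombed", "yesterday"], signatures))
```

Raising `min_rate` or `min_count` in this sketch trades recall for precision, which is why a threshold-selection step such as the one mentioned in the abstract matters in practice.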