Clipping and Analyzing News Using Machine Learning Techniques

Authors:
Hans Gründel;Tino Naphtali;Christian Wiech;Jan-Marian Gluba;Maiken Rohdenburg;Tobias Scheffer
Venue:
DS '01 Proceedings of the 4th International Conference on Discovery Science
Year:
2001

Citing 16
Cited 2

Information filtering and information retrieval: two sides of the same coin?

Communications of the ACM - Special issue on information filtering
Agents that reduce work and information overload

Communications of the ACM
Case-based reasoning: foundational issues, methodological variations, and system approaches

AI Communications
GroupLens: applying collaborative filtering to Usenet news

Communications of the ACM
Generating finite-state transducers for semi-structured data extraction from the Web

Information Systems - Special issue on semistructured data
Wrapper induction: efficiency and expressiveness

Artificial Intelligence - Special issue on Intelligent internet systems
Learning to construct knowledge bases from the World Wide Web

Artificial Intelligence - Special issue on Intelligent internet systems
Text Categorization with Support Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Expected Error Analysis for Model Selection

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Maximum Entropy Markov Models for Information Extraction and Segmentation

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
A Unifying Approach to HTML Wrapper Representation and Learning

DS '00 Proceedings of the Third International Conference on Discovery Science
Active Hidden Markov Models for Information Extraction

IDA '01 Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis
Message Understanding Conference-6: a brief history

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Syskill & Webert: Identifying interesting web sites

AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1

WISDOM: Web Intrapage Informative Structure Mining Based on Document Object Model

IEEE Transactions on Knowledge and Data Engineering
Ex-ray: Data mining and mental health

Applied Soft Computing

Abstract

Manually generating press clippings for companies requires considerable resources. We describe a system that automatically monitors online newspapers and discussion boards. The system extracts, classifies, and analyzes messages and generates press clippings, taking the specific needs of client companies into account. Its key components are a spider, an information extraction engine, a Support Vector Machine text classifier that categorizes messages by subject, and a second classifier that estimates the emotional state the author of a newsgroup posting was likely to be in. By analyzing a large number of messages, the system can summarize the main issues being reported on for given business sectors, as well as the emotional attitude of customers and shareholders towards companies.
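
The abstract does not say how the subject classifier is implemented, so the following is only a minimal sketch of the general idea: a linear SVM over bag-of-words features, trained by plain subgradient descent on the regularized hinge loss. The training sentences, the two subject labels ("banking" and "automotive"), the function names, and all hyperparameters are assumptions made up for illustration and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy data: +1 = "banking", -1 = "automotive".
TRAIN = [
    ("bank raises interest rates on loans", +1),
    ("carmaker recalls sedan over faulty brakes", -1),
    ("credit bank reports loans growth", +1),
    ("new sedan engine wins carmaker award", -1),
]


def featurize(text):
    """Bag-of-words term counts for lowercased, whitespace-split text."""
    return Counter(text.lower().split())


def dot(weights, features):
    """Sparse dot product; terms without a learned weight contribute 0."""
    return sum(weights.get(term, 0.0) * count
               for term, count in features.items())


def train_linear_svm(examples, epochs=50, learning_rate=0.1, reg=0.01):
    """Subgradient descent on the L2-regularized hinge loss (labels +1/-1)."""
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            features = featurize(text)
            margin = label * dot(weights, features)
            # The L2 penalty shrinks every weight on every step.
            for term in weights:
                weights[term] *= 1.0 - learning_rate * reg
            # The hinge term only contributes when the margin is below 1.
            if margin < 1.0:
                for term, count in features.items():
                    weights[term] = (weights.get(term, 0.0)
                                     + learning_rate * label * count)
    return weights


def classify(weights, text):
    """Map the sign of the decision value to a subject label.

    A score of exactly 0 (no known terms) falls back to "automotive".
    """
    return "banking" if dot(weights, featurize(text)) > 0 else "automotive"


weights = train_linear_svm(TRAIN)
print(classify(weights, "bank cuts interest rates"))
print(classify(weights, "carmaker unveils sedan"))
```

A production system would more plausibly use TF-IDF weighting, a bias term, one classifier per subject, and a dedicated SVM solver. The second classifier mentioned in the abstract could in principle reuse the same structure with emotional-state labels in place of subjects, although the abstract does not state which learner it uses.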