Discovering unexpected documents in corpora

Authors:
François Jacquenet; Christine Largeron
Affiliations:
University of Lyon, University of Saint-Etienne, Laboratoire Hubert Curien, UMR CNRS 5516, 18 rue Benoit Lauras, F42000 Saint-Etienne, France (both authors)
Venue:
Knowledge-Based Systems
Year:
2009

Abstract

Text mining is widely used to discover frequent patterns in large corpora of documents. Many classical data mining techniques that have proven fruitful on data stored in relational databases are now successfully applied to textual data. Nevertheless, in many situations it is more valuable to discover unexpected information than frequent patterns. In technology watch, for example, we may want to detect new trends in specific markets or find out what competitors are planning for the near future. This paper addresses that context. We propose several unexpectedness measures and implement them in a prototype, called UnexpectedMiner, which watchers can use to discover unexpected documents in large corpora (patents, datasheets, advertisements, scientific papers, etc.). UnexpectedMiner can take the structure of documents into account when discovering unexpected information. We performed many experiments to validate our measures and demonstrate the usefulness of the system.
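The abstract does not define the unexpectedness measures themselves. As a purely illustrative sketch, and not the measures implemented in UnexpectedMiner, the code below scores a new document by how dissimilar it is to the closest document the watcher already knows, using bag-of-words term-frequency vectors and cosine similarity. The function names `tf_vector`, `unexpectedness` and `rank_unexpected`, the tokenization rule, and the choice of "1 minus the best similarity" as the score are all assumptions made for this example. It also ignores document structure, which UnexpectedMiner is said to exploit.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Hypothetical representation: term frequencies of lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def unexpectedness(doc, reference_corpus):
    """Assumed score: 1 minus the similarity to the closest known document.

    1.0 means the document shares no vocabulary with any reference document;
    values near 0.0 mean some known document already covers it.
    """
    d = tf_vector(doc)
    sims = [cosine(d, tf_vector(r)) for r in reference_corpus]
    return 1.0 - max(sims, default=0.0)


def rank_unexpected(candidates, reference_corpus):
    """Sort candidate documents from most to least unexpected."""
    return sorted(candidates, key=lambda c: unexpectedness(c, reference_corpus), reverse=True)


if __name__ == "__main__":
    # Toy data invented for illustration only.
    known = ["low power sensor for battery devices",
             "battery charging circuit with low power mode"]
    new = ["low power battery sensor", "quantum encryption key distribution"]
    for doc in rank_unexpected(new, known):
        print(f"{unexpectedness(doc, known):.2f}  {doc}")
```

A watcher would read the top of the ranking first. Taking the maximum similarity over the reference documents means a new document counts as expected as soon as a single known document covers it; averaging the similarities instead would favour documents that are atypical of the corpus as a whole, which is a different notion of unexpectedness.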