Text mining using Markov chains of variable length

Authors:
Björn Hoffmeister; Thomas Zeugmann
Affiliations:
RWTH Aachen, Lehrstuhl für Informatik VI, Aachen; Division of Computer Science, Hokkaido University, Sapporo, Japan
Venue:
Proceedings of the 2005 international conference on Federation over the Web
Year:
2005

Abstract

When dealing with knowledge federation over text documents, one has to determine whether documents are related by context. We propose a new approach to this problem, which leads to the design of a new search engine for literature research and related tasks. The idea is that the user already has some documents of interest. These documents are taken as input, and all documents known to a classical search engine are then ranked according to their relevance to this input. To achieve this, we use Markov chains of variable length. The algorithms developed have been implemented and tested on the Reuters-21578 data set.
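
The ranking idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a character-level model, a longest-suffix backoff rule, add-one smoothing, and average log-probability as the relevance score. All names (`VariableOrderMarkovModel`, `rank_documents`, `max_order`, `min_count`) are hypothetical.

```python
import math
from collections import defaultdict


class VariableOrderMarkovModel:
    """Character-level Markov model that backs off to shorter contexts.

    Illustrative assumption only: the paper's actual model may differ.
    """

    def __init__(self, max_order=3, min_count=2):
        self.max_order = max_order  # longest context length considered
        self.min_count = min_count  # observations needed to trust a context
        # counts[context][symbol] = times symbol followed context
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = set()

    def train(self, documents):
        for text in documents:
            self.alphabet.update(text)
            for i, symbol in enumerate(text):
                # Record the symbol after every context of length 0..max_order.
                for k in range(min(self.max_order, i) + 1):
                    self.counts[text[i - k:i]][symbol] += 1

    def probability(self, context, symbol):
        vocab = len(self.alphabet) + 1  # one extra slot for unseen symbols
        # Try the longest suffix first, then progressively shorter ones.
        for k in range(min(self.max_order, len(context)), -1, -1):
            suffix = context[len(context) - k:]
            followers = self.counts.get(suffix)
            if followers is None:
                continue
            total = sum(followers.values())
            if total >= self.min_count:
                # Add-one smoothing keeps unseen symbols above zero.
                return (followers.get(symbol, 0) + 1) / (total + vocab)
        return 1.0 / vocab  # untrained model: uniform guess

    def score(self, text):
        """Average log-probability per character (higher = more similar)."""
        if not text:
            return float("-inf")
        log_prob = 0.0
        for i, symbol in enumerate(text):
            context = text[max(0, i - self.max_order):i]
            log_prob += math.log(self.probability(context, symbol))
        return log_prob / len(text)


def rank_documents(seed_documents, candidates, max_order=3):
    """Train on the documents of interest, then sort candidates by score."""
    model = VariableOrderMarkovModel(max_order=max_order)
    model.train(seed_documents)
    return sorted(candidates, key=model.score, reverse=True)
```

Calling `rank_documents(seeds, candidates)` returns the candidates ordered from most to least similar to the seed documents under this model. Normalizing by length keeps long candidates from being penalized merely for containing more characters.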