Learning Structure from Sequences, with Applications in a Digital Library

Authors:
Ian H. Witten
Venue:
ALT '02 Proceedings of the 13th International Conference on Algorithmic Learning Theory
Year:
2002

Abstract

The services that digital libraries provide to users can be greatly enhanced by automatically gleaning certain kinds of information from the full text of the documents they contain. This paper reviews some recent work that applies novel techniques of machine learning (broadly interpreted) to extract information from plain text, and puts it in the context of digital library applications. We describe three areas: hierarchical phrase browsing, including efficient methods for inferring a phrase hierarchy from a large corpus of text; text mining using adaptive compression techniques, giving a new approach to generic entity extraction, word segmentation, and acronym extraction; and keyphrase extraction.
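The abstract does not spell out how a phrase hierarchy is inferred. As a rough illustration of the idea, the sketch below builds one by repeatedly replacing the most frequent pair of adjacent symbols with a new rule, so that frequent phrases become nested rules. This is a simplified offline pair-replacement approach with roughly quadratic running time, not the linear-time algorithm the paper reviews. The function names, the `min_count` threshold, and the `R1`, `R2`, ... rule labels are hypothetical choices made for this example.

```python
from collections import Counter


def most_frequent_pair(seq):
    """Return the most frequent adjacent pair in seq and its count."""
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return None, 0
    return pairs.most_common(1)[0]


def replace_pair(seq, pair, symbol):
    """Replace non-overlapping occurrences of pair (scanning left to right) with symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def infer_hierarchy(tokens, min_count=2):
    """Greedily turn repeated adjacent pairs into rules until none repeats.

    Assumes no input token is itself spelled like a rule label (R1, R2, ...).
    Returns the top-level sequence and a dict mapping each rule to its pair.
    """
    seq = list(tokens)
    rules = {}
    while True:
        pair, count = most_frequent_pair(seq)
        if count < min_count:
            break
        symbol = f"R{len(rules) + 1}"
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the original tokens."""
    if symbol not in rules:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    words = "the cat sat on the mat the cat sat".split()
    top, grammar = infer_hierarchy(words)
    print(expand(top[0], grammar))  # → ['the', 'cat', 'sat']
```

Here the repeated phrase "the cat sat" becomes a two-level rule (a rule for a two-word pair nested inside a rule for the whole phrase), which is the kind of nested structure a hierarchical phrase browser can present to users.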