Boosting text segmentation via progressive classification

Authors:
Eugenio Cesario; Francesco Folino; Antonio Locane; Giuseppe Manco; Riccardo Ortale
Affiliations:
ICAR-CNR, Via Bucci 41c, 87036 Rende (CS), Italy (all authors)
Venue:
Knowledge and Information Systems
Year:
2008

Abstract

A novel approach is proposed for reconciling tuples stored as free text with an existing attribute schema. The basic idea is to subject the available text to progressive classification, i.e., a multi-stage classification scheme in which, at each intermediate stage, a classifier is learnt that analyzes the textual fragments not yet reconciled by the previous stages. Classification is performed through an ad hoc adaptation of traditional association mining algorithms and is supported by a data transformation scheme that exploits domain-specific dictionaries/ontologies. A key feature is the capability of progressively enriching the available ontology with the results of the previous classification stages, which significantly improves the overall classification accuracy. An extensive experimental evaluation shows the effectiveness of the approach.
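The abstract describes the method only at a high level. The sketch below is a toy illustration of the general idea of progressive classification with dictionary enrichment, not the authors' algorithm, and it rests on several assumptions: every token is labelled individually; the association-mining step is replaced by single-antecedent "feature -> label" rules filtered by support and confidence; and the features (token shape, neighbouring labels), thresholds, label names and the functions `token_features`, `mine_rules` and `progressive_classify` are all hypothetical.

```python
from collections import Counter

# Toy sketch (hypothetical, not the authors' algorithm): each record is a list of
# free-text tokens, and every token should receive an attribute label.


def token_features(tokens, i, labels):
    """Assumed features for token i: its shape plus the labels of its neighbours."""
    tok = tokens[i]
    if tok.isdigit():
        shape = "NUM"
    elif tok[:1].isupper():
        shape = "CAP"
    else:
        shape = "LOW"
    left = labels[i - 1] if i > 0 and labels[i - 1] else "NONE"
    right = labels[i + 1] if i + 1 < len(tokens) and labels[i + 1] else "NONE"
    return [f"shape={shape}", f"left={left}", f"right={right}"]


def mine_rules(records, labelings, min_support=2, min_conf=0.8):
    """Mine single-antecedent rules 'feature -> label' from the tokens labelled so
    far, keeping those above support/confidence thresholds. This is a deliberately
    simplified stand-in for association-rule mining."""
    feat_count, pair_count = Counter(), Counter()
    for tokens, labels in zip(records, labelings):
        for i, lab in enumerate(labels):
            if lab is None:
                continue  # only labelled tokens provide training evidence
            for f in token_features(tokens, i, labels):
                feat_count[f] += 1
                pair_count[(f, lab)] += 1
    rules = {}
    for (f, lab), n in pair_count.items():
        conf = n / feat_count[f]
        if n >= min_support and conf >= min_conf:
            if f not in rules or conf > rules[f][1]:
                rules[f] = (lab, conf)
    return rules


def progressive_classify(records, ontology, max_stages=5):
    """Stage 0 labels tokens by dictionary lookup; each later stage learns rules
    from the current labels and applies them only to still-unlabelled tokens.
    New labels are written back into the dictionary (ontology enrichment)."""
    labelings = [[ontology.get(t.lower()) for t in toks] for toks in records]
    for _ in range(max_stages):
        rules = mine_rules(records, labelings)
        newly = []
        for r, toks in enumerate(records):
            labels = labelings[r]
            for i, lab in enumerate(labels):
                if lab is not None:
                    continue
                votes = [rules[f] for f in token_features(toks, i, labels) if f in rules]
                if votes:
                    newly.append((r, i, max(votes, key=lambda v: v[1])[0]))
        if not newly:
            break  # nothing left that the current rules can reconcile
        for r, i, lab in newly:
            labelings[r][i] = lab
            ontology[records[r][i].lower()] = lab  # enrich the dictionary
        # Re-run the enriched dictionary over tokens that are still unlabelled.
        for r, toks in enumerate(records):
            for i, t in enumerate(toks):
                if labelings[r][i] is None:
                    labelings[r][i] = ontology.get(t.lower())
    return labelings


if __name__ == "__main__":
    records = [
        ["via", "Bucci", "41", "Rende"],
        ["via", "Roma", "12", "Cosenza"],
        ["via", "Verdi", "7", "Rende"],
    ]
    ontology = {"via": "TYPE", "bucci": "STREET", "roma": "STREET",
                "41": "NUMBER", "12": "NUMBER", "rende": "CITY", "cosenza": "CITY"}
    print(progressive_classify(records, ontology)[2])
    # -> ['TYPE', 'STREET', 'NUMBER', 'CITY']
    print(ontology["verdi"])  # -> STREET
```

In the example, "Verdi" and "7" are unknown to the initial dictionary; they are resolved by rules learnt from the other two records and then added to the dictionary, so later text containing them can be reconciled by lookup alone. Labels assigned in one stage also feed the neighbour features used by the next stage, which is what lets the scheme make progress beyond the initial dictionary.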