Multi-documents Automatic Abstracting based on text clustering and semantic analysis

Authors:
Qinglin Guo; Ming Zhang
Affiliations:
Qinglin Guo: Department of Computer Science and Technology, Peking University, Beijing 100871, China; and School of Computer Science and Technology, North China Electric Power University, Beijing 102206, China
Ming Zhang: Department of Computer Science and Technology, Peking University, Beijing 100871, China
Venue:
Knowledge-Based Systems
Year:
2009

Abstract

A method for multi-document automatic abstracting based on text clustering and semantic analysis is proposed, aimed at overcoming shortcomings of some current multi-document methods. The method uses semantic analysis to produce automatic abstracts of multiple documents. A two-pass word segmentation algorithm based on the title and the first sentences of paragraphs is also proposed; its precision and recall are both above 95%. For the specific domain of plastics, an automatic abstracting system named TCAAS is implemented, and its precision and recall for multi-document automatic abstracting are above 75%. Experiments show that the method is a feasible basis for developing domain-specific automatic abstracting systems and merits further, more in-depth study.
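The abstract names a two-pass word segmentation step driven by the title and the first sentences of paragraphs, but does not give its details. The following is a minimal Python sketch of one possible reading, not the TCAAS algorithm. The first pass segments the title with forward maximum matching (an assumed base segmenter) and turns runs of unknown characters into candidate words, keeping those that also appear in at least `min_hits` paragraph-first sentences. The second pass re-segments the body with the augmented dictionary. The function names, the `max_len` and `min_hits` parameters, and the toy plastics dictionary are all hypothetical.

```python
"""Hypothetical two-pass dictionary segmentation sketch (not the paper's algorithm)."""


def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary
    word of at most max_len characters, else fall back to one character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def unknown_runs(tokens, dictionary):
    """Join consecutive single-character tokens that are not dictionary words."""
    runs, current = [], ""
    for tok in tokens:
        if len(tok) == 1 and tok not in dictionary:
            current += tok
        else:
            if len(current) >= 2:
                runs.append(current)
            current = ""
    if len(current) >= 2:
        runs.append(current)
    return runs


def two_pass_segment(title, first_sentences, body, dictionary, max_len=4, min_hits=1):
    """Pass 1 (assumed): mine candidate words from unknown runs in the title that
    also occur in at least min_hits paragraph-first sentences.
    Pass 2: segment the body with the augmented dictionary."""
    candidates = set()
    for run in unknown_runs(fmm_segment(title, dictionary, max_len), dictionary):
        for size in range(2, min(max_len, len(run)) + 1):
            for start in range(len(run) - size + 1):
                cand = run[start:start + size]
                if sum(cand in s for s in first_sentences) >= min_hits:
                    candidates.add(cand)
    # Keep only maximal candidates: drop any contained in a longer candidate.
    new_words = {c for c in candidates if not any(c != d and c in d for d in candidates)}
    augmented = set(dictionary) | new_words
    return new_words, fmm_segment(body, augmented, max_len)


if __name__ == "__main__":
    # Toy base dictionary (hypothetical); "聚丙烯" (polypropylene) is missing.
    base = {"塑料", "薄膜", "性能", "研究", "的", "改性", "具有", "良好"}
    new, words = two_pass_segment(
        "聚丙烯薄膜的改性研究", ["聚丙烯是一种常用塑料"], "聚丙烯薄膜具有良好的性能", base
    )
    print(new)    # {'聚丙烯'}
    print(words)  # ['聚丙烯', '薄膜', '具有', '良好', '的', '性能']
```

Keeping only maximal candidates is a crude way to stop fragments such as 聚丙 from entering the dictionary alongside 聚丙烯; a practical system would probably also need frequency or part-of-speech checks.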