Automatic keyphrase extraction from scientific documents using N-gram filtration technique

Authors: Niraj Kumar; Kannan Srinathan
Affiliations: IIIT-Hyderabad, Hyderabad, India (both authors)
Venue: Proceedings of the eighth ACM symposium on Document engineering
Year: 2008

Abstract

In this paper we present an automatic keyphrase extraction technique for English documents in the scientific domain. The algorithm uses an n-gram filtration technique, which extracts sophisticated n-grams (1 ≤ n ≤ 4), together with their weights, from the words of the input document. To build the n-gram filtration technique, we use (1) an LZ78 data-compression-based technique, (2) a simple refinement step, (3) a simple pattern filtration algorithm, and (4) a term weighting scheme. The term weighting scheme takes into account the position of the sentence in which a phrase first occurs in the document and the position of the phrase within that sentence; these cues are useful for scientific documents, which are typically more structured than documents from other domains. The entire system is based on statistical observations, simple grammatical facts, heuristics, and lexical information about the English language, and it requires no learning phase. Experimental results on a publicly available text dataset show that the system is comparable with other known algorithms.
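The abstract does not give the exact refinement rules, pattern filters, or weighting formula. The following Python sketch therefore only illustrates the general shape of such a pipeline under stated assumptions. A word-level LZ78 parse proposes candidate phrases of up to four words. A hypothetical stopword-boundary filter stands in for the refinement step. A hypothetical weight then scales each phrase's frequency by the position of its first occurrence (sentence index and word offset). The names `lz78_phrases`, `refine`, `score_candidates`, and the `STOPWORDS` list are illustrative and are not taken from the paper.

```python
import re
from collections import Counter

MAX_N = 4  # the abstract bounds candidate length to 1 <= n <= 4 words

# Hypothetical stopword list standing in for the paper's refinement step.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lower-cased word tokens; hyphenated words such as 'n-grams' stay whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())


def lz78_phrases(sentences, max_n=MAX_N):
    """Word-level LZ78 parse with one dictionary shared across sentences.

    At each position, find the longest phrase already in the dictionary and
    add that phrase extended by one more word. Phrases never cross sentence
    boundaries and are capped at max_n words.
    """
    dictionary = set()
    for words in sentences:
        i = 0
        while i < len(words):
            j = i
            # Extend the match while words[i:j+1] is already a known phrase.
            while (j < len(words) and j - i < max_n - 1
                   and tuple(words[i:j + 1]) in dictionary):
                j += 1
            dictionary.add(tuple(words[i:j + 1]))  # known prefix + next word
            i = j + 1
    return dictionary


def refine(candidates):
    """Hypothetical refinement: drop phrases that begin or end with a stopword."""
    return {g for g in candidates if g[0] not in STOPWORDS and g[-1] not in STOPWORDS}


def score_candidates(text, max_n=MAX_N):
    """Return (phrase, weight) pairs sorted by descending weight."""
    sentences = [tokenize(s) for s in split_sentences(text)]
    candidates = refine(lz78_phrases(sentences, max_n))
    tf, first_seen = Counter(), {}
    for s_idx, words in enumerate(sentences):
        for n in range(1, max_n + 1):
            for p in range(len(words) - n + 1):
                gram = tuple(words[p:p + n])
                if gram in candidates:
                    tf[gram] += 1
                    # Record (sentence index, word offset) of the first occurrence.
                    first_seen.setdefault(gram, (s_idx, p))
    scores = {}
    for gram, count in tf.items():
        s_idx, p = first_seen[gram]
        # Hypothetical weight: frequency boosted for phrases that first appear in
        # an early sentence and early within it (each factor lies in (1, 2]).
        scores[" ".join(gram)] = count * (1 + 1 / (1 + s_idx)) * (1 + 1 / (1 + p))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    sample = "Keyphrase extraction is useful. Keyphrase extraction uses n-grams."
    for phrase, weight in score_candidates(sample):
        print(f"{phrase}: {weight:.2f}")
```

Because the LZ78 dictionary is shared across sentences, a multiword phrase such as "keyphrase extraction" enters the dictionary only after its first word has already been seen. As a result, repeated phrases tend to surface as multiword candidates. The pattern filtration step mentioned in the abstract is not modeled here.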