We have developed a method that extracts all maximal frequent word sequences from the documents of a collection. A sequence is frequent if it appears in more than σ documents, where σ is a given frequency threshold. A sequence is maximal if no other frequent sequence contains it. The words of a sequence do not have to appear consecutively in the text.

In this paper, we briefly describe the method for finding all maximal frequent word sequences in text and then extend it to extract generalized sequences from annotated texts, in which each word has a set of additional features (e.g. morphological) attached to it. We aim to discover patterns that preserve as many features as possible while their frequency still exceeds the given threshold.
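The definitions of "frequent" and "maximal" can be made concrete with a small sketch. The code below is a naive level-wise enumeration written only for illustration; it is not the method described in the paper and would not scale to real collections. The names `is_subsequence`, `support`, `maximal_frequent_sequences`, and the parameter `sigma` (standing for the frequency threshold) are assumptions introduced here, and the sketch places no limit on the gap between the words of a sequence.

```python
def is_subsequence(pattern, words):
    """True if the words of `pattern` occur in `words` in order, gaps allowed."""
    remaining = iter(words)
    # `w in remaining` consumes the iterator up to the first match,
    # so later pattern words must be found after earlier ones.
    return all(w in remaining for w in pattern)


def support(pattern, docs):
    """Number of documents that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, doc) for doc in docs)


def maximal_frequent_sequences(docs, sigma):
    """Naive sketch: sequences found in more than `sigma` documents
    that are not contained in any other such sequence."""
    if sigma < 0:
        raise ValueError("sigma must be non-negative")

    vocabulary = sorted({w for doc in docs for w in doc})
    frequent_words = [w for w in vocabulary if support((w,), docs) > sigma]

    # Grow sequences one word at a time. Every frequent sequence has a
    # frequent prefix, so appending words at the end reaches all of them.
    all_frequent = []
    level = [(w,) for w in frequent_words]
    while level:
        all_frequent.extend(level)
        level = [
            p + (w,)
            for p in level
            for w in frequent_words
            if support(p + (w,), docs) > sigma
        ]

    # Keep only sequences that no other frequent sequence contains.
    return [
        p for p in all_frequent
        if not any(p != q and is_subsequence(p, q) for q in all_frequent)
    ]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat".split(),
        "the cat lay on a mat".split(),
        "a dog sat on the mat".split(),
    ]
    # With sigma = 1, a sequence must appear in at least two documents.
    for seq in maximal_frequent_sequences(docs, sigma=1):
        print(" ".join(seq))
```

In this toy collection, "the cat on mat" is frequent even though its words are not adjacent in either supporting document, and shorter frequent sequences such as "the mat" are dropped because a longer frequent sequence contains them.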