Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, for example, phrases that are frequent in the subset but relatively infrequent in the overall corpus. We develop preprocessing and indexing methods for phrases, paired with new search techniques for finding the top-k most interesting phrases in ad-hoc subsets of the corpus. We evaluate our framework on a large-scale real-world corpus of New York Times news articles.
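To make the notion of an "interesting phrase" concrete, the following is a minimal brute-force sketch, not the indexing or search method the abstract refers to. It assumes one plausible scoring, the ratio of a phrase's relative document frequency in the ad-hoc subset to its relative document frequency in the whole corpus. The abstract only says "frequent in the subset but relatively infrequent in the overall corpus", so the exact formula, the whitespace tokenization, and all names (`top_k_interesting`, `min_subset_df`, and so on) are illustrative assumptions.

```python
from collections import Counter
from typing import Iterator, List, Sequence, Tuple


def ngrams(tokens: Sequence[str], min_len: int, max_len: int) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous phrase of min_len..max_len tokens."""
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def doc_freq(docs: Sequence[str], min_len: int, max_len: int) -> Counter:
    """Count, for each phrase, the number of documents that contain it."""
    df: Counter = Counter()
    for doc in docs:
        # set() so that a phrase is counted once per document
        df.update(set(ngrams(doc.split(), min_len, max_len)))
    return df


def top_k_interesting(
    corpus: Sequence[str],
    subset_ids: Sequence[int],
    k: int,
    min_len: int = 2,
    max_len: int = 3,
    min_subset_df: int = 2,
) -> List[Tuple[float, str]]:
    """Return up to k (score, phrase) pairs, highest score first.

    Assumed score (hypothetical, for illustration only):
        (df_subset / |subset|) / (df_corpus / |corpus|)
    """
    subset = [corpus[i] for i in subset_ids]
    corpus_df = doc_freq(corpus, min_len, max_len)
    subset_df = doc_freq(subset, min_len, max_len)

    scored = []
    for phrase, df_sub in subset_df.items():
        if df_sub < min_subset_df:
            continue  # ignore phrases that are rare even inside the subset
        # The subset is drawn from the corpus, so corpus_df[phrase] >= df_sub > 0.
        score = (df_sub / len(subset)) / (corpus_df[phrase] / len(corpus))
        scored.append((score, " ".join(phrase)))

    # Sort by descending score; break ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


if __name__ == "__main__":
    toy_corpus = [
        "the market rallied today",
        "the market fell today",
        "rain fell on the city",
        "the city council met today",
    ]
    # Ad-hoc subset: the two documents about the market.
    print(top_k_interesting(toy_corpus, subset_ids=[0, 1], k=3))
```

This sketch rescans the whole corpus for every query, which is exactly the cost that precomputed phrase indexes are meant to avoid when the subset is only known at query time.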