The popular bag-of-words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation cannot retain any sequential information. We present an effective sequential document representation that goes beyond the bag-of-words representation and its n-gram extensions. This representation uses local smoothing to embed documents as smooth curves in the multinomial simplex, thereby preserving valuable sequential information. In contrast to bag of words or n-grams, the new representation robustly captures medium- and long-range sequential trends in the document. We discuss the representation and its geometric properties and demonstrate its applicability to various text processing tasks.
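To make the idea of local smoothing concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian smoothing kernel over token positions rescaled to [0, 1] and a small additive constant that keeps each per-token distribution inside the simplex. The function name `local_histograms` and the parameters `sigma` and `alpha` are hypothetical choices made for illustration.

```python
import numpy as np

def local_histograms(tokens, vocab, positions, sigma=0.1, alpha=0.01):
    """Return one locally smoothed word histogram per position in [0, 1].

    Illustrative assumptions (not taken from the abstract):
    - Gaussian kernel of width `sigma` over normalized token positions
    - additive constant `alpha` applied to each per-token distribution
    """
    index = {w: i for i, w in enumerate(vocab)}
    n = len(tokens)
    # Rescale token locations to [0, 1] so documents of any length are comparable.
    locs = (np.arange(n) + 0.5) / n
    # Per-token distributions: near one-hot rows, each a point in the simplex.
    onehot = np.full((n, len(vocab)), alpha)
    for t, w in enumerate(tokens):
        onehot[t, index[w]] += 1.0
    onehot /= onehot.sum(axis=1, keepdims=True)
    curve = []
    for mu in positions:
        # Kernel weights centered at mu, normalized to sum to one.
        weights = np.exp(-0.5 * ((locs - mu) / sigma) ** 2)
        weights /= weights.sum()
        # A convex combination of simplex points stays in the simplex.
        curve.append(weights @ onehot)
    return np.array(curve)

tokens = "a a a b b b".split()
vocab = ["a", "b"]
curve = local_histograms(tokens, vocab, positions=np.linspace(0.0, 1.0, 5))
print(curve)  # each row is a probability vector over the vocabulary
```

Under these assumptions, a small `sigma` yields a curve that tracks how word usage changes along the document, while a very large `sigma` collapses every point of the curve to roughly the same global histogram, which is the ordinary bag-of-words representation.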