Representation and learning in information retrieval
Projections for efficient document clustering
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
The use of bigrams to enhance text categorization
Information Processing and Management: an International Journal
Document Ranking and the Vector-Space Model
IEEE Software
Beyond Eigenfaces: Probabilistic Matching for Face Recognition
FG '98 Proceedings of the 3rd International Conference on Face & Gesture Recognition
Unsupervised word sense disambiguation rivaling supervised methods
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
A “stereo” document representation for textual information retrieval
Journal of the American Society for Information Science and Technology
Using ontology-based approaches to representing speech transcripts for automated speech scoring
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
We have developed a new, effective probabilistic classifier for document classification by introducing the concept of differential document vectors and DLSI (differential latent semantic indexing) spaces. The DLSI spaces are derived from the differential document vectors, and combining a document's projections onto these spaces with its distances to them improves the adaptability of the LSI (latent semantic indexing) method by capturing the unique characteristics of documents. Both a simple a posteriori calculation on a small example and an experiment on the large Reuters-21578 collection show that the DLSI space-based probabilistic classifier, which uses intra- and extra-document statistics, achieves better classification performance than the LSI space-based classifier.
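The abstract does not give the formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that intra-differential vectors are a document minus its own class centroid and extra-differential vectors are a document minus another class's centroid; that each DLSI space is the leading k-dimensional principal subspace of its differential vectors (computed here by power iteration with deflation, since the eigenvectors of C = D Dᵀ are the left singular vectors of D); and that "projection" and "distance" are combined as a subspace Gaussian in the style of the probabilistic matching of Moghaddam and Pentland listed above, with equal priors. All names (`top_eigenpairs`, `SubspaceDensity`, `train`, `posterior_intra`, `classify`) and the toy data are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def centroid(docs):
    return [sum(col) / len(docs) for col in zip(*docs)]

def top_eigenpairs(vecs, k, iters=500, seed=0):
    """Top-k eigenpairs of C = sum_j d_j d_j^T via power iteration with deflation.

    Eigenvectors = leading left singular vectors of the matrix whose columns
    are `vecs` (an LSI-style basis); eigenvalues = squared singular values."""
    m = len(vecs[0])
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]
        lam = 0.0
        for _ in range(iters):
            w = [0.0] * m
            for d in vecs:  # w = C v, without forming C explicitly
                c = dot(d, v)
                w = [wi + c * di for wi, di in zip(w, d)]
            for lam_p, u in pairs:  # deflate directions already found
                c = lam_p * dot(u, v)
                w = [wi - c * ui for wi, ui in zip(w, u)]
            lam = math.sqrt(dot(w, w))  # ||C v|| approaches the eigenvalue
            if lam == 0.0:
                break
            v = [wi / lam for wi in w]
        pairs.append((lam, v))
    return pairs

class SubspaceDensity:
    """Assumed density of differential vectors: a Mahalanobis term for the
    projection onto a k-dim principal subspace plus an isotropic Gaussian
    term for the squared distance to that subspace."""

    def __init__(self, vecs, k, floor=1e-9):
        n, m = len(vecs), len(vecs[0])
        self.pairs = top_eigenpairs(vecs, k)
        self.var = [max(lam / n, floor) for lam, _ in self.pairs]
        self.r = m - k  # dimensions of the orthogonal complement
        resid = sum(dot(d, d) for d in vecs) - sum(lam for lam, _ in self.pairs)
        self.rho = max(resid / (n * self.r), floor) if self.r > 0 else floor

    def log_pdf(self, x):
        y = [dot(u, x) for _, u in self.pairs]  # projection onto the subspace
        eps2 = max(dot(x, x) - dot(y, y), 0.0)  # squared distance to it
        lp = sum(-0.5 * yi * yi / s - 0.5 * math.log(2 * math.pi * s)
                 for yi, s in zip(y, self.var))
        if self.r > 0:
            lp -= 0.5 * eps2 / self.rho + 0.5 * self.r * math.log(2 * math.pi * self.rho)
        return lp

def train(classes, k):
    """classes maps a label to a list of equal-length term vectors."""
    cents = {c: centroid(docs) for c, docs in classes.items()}
    intra = [sub(d, cents[c]) for c, docs in classes.items() for d in docs]
    extra = [sub(d, cents[o]) for c, docs in classes.items() for d in docs
             for o in classes if o != c]
    return cents, SubspaceDensity(intra, k), SubspaceDensity(extra, k)

def posterior_intra(x, intra, extra, prior=0.5):
    """P(intra | x) by Bayes' rule, evaluated as a numerically stable sigmoid."""
    z = (intra.log_pdf(x) + math.log(prior)) - (extra.log_pdf(x) + math.log(1 - prior))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def classify(q, model):
    """Pick the class whose centroid yields the most intra-like differential vector."""
    cents, intra, extra = model
    return max(cents, key=lambda c: posterior_intra(sub(q, cents[c]), intra, extra))

# Toy usage: terms (a, b, c, d); class "x" uses a/b, class "y" uses c/d.
classes = {
    "x": [[2, 1, 0, 0], [1, 2, 0, 0], [2, 2, 0, 0]],
    "y": [[0, 0, 2, 1], [0, 0, 1, 2], [0, 0, 2, 2]],
}
model = train(classes, k=2)
print(classify([3, 1, 0, 0], model), classify([0, 0, 1, 3], model))
```

The point of the two-term density is that the intra space is fitted to small, within-class differences, so a query whose differential vector lies far outside that subspace is penalized through the distance term even when its projection looks ordinary. Power iteration keeps the sketch dependency-free; a real implementation would more likely use a library SVD.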