We investigate four hierarchical clustering methods (single-link, complete-link, groupwise-average, and single-pass) and two linguistically motivated text features (noun phrase heads and proper names) in the context of document clustering. A statistical model for combining similarity information from multiple sources is described and applied to DARPA's Topic Detection and Tracking phase 2 (TDT2) data. This model, based on log-linear regression, alleviates the need for extensive search in order to determine optimal weights for combining input features. Through an extensive series of experiments with more than 40,000 documents from multiple news sources and modalities, we establish that both the choice of clustering algorithm and the introduction of the additional features have an impact on clustering performance. We apply our optimal combination of features to the TDT2 test data, obtaining partitions of the documents that compare favorably with the results obtained by participants in the official TDT2 competition.
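As a rough illustration of the idea described above, the following minimal sketch combines per-feature similarities through a logistic (log-linear) function and feeds the combined score into a single-pass clusterer. It is not the authors' implementation: the feature names (`words`, `names`), the weights, the bias, the threshold, and the toy documents are all hypothetical, and the weights are fixed by hand here rather than fitted by regression as the abstract describes. Comparing a document to a cluster by its average similarity to the cluster's members is likewise an assumption made for simplicity.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_similarity(doc_a, doc_b, weights, bias):
    """Logistic combination of per-feature cosine similarities.

    `weights` maps a feature name to its (hypothetical) coefficient.
    """
    z = bias + sum(w * cosine(doc_a[name], doc_b[name])
                   for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def single_pass_cluster(docs, weights, bias, threshold=0.5):
    """Assign each document, in order, to the best-matching cluster
    or start a new cluster if no score reaches the threshold."""
    clusters = []  # each cluster is a list of document indices
    for i, doc in enumerate(docs):
        best, best_score = None, threshold
        for cluster in clusters:
            # Average similarity to the cluster's members (an assumption).
            score = sum(combined_similarity(doc, docs[j], weights, bias)
                        for j in cluster) / len(cluster)
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


# Toy documents: one term-count vector per feature type (made-up data).
docs = [
    {"words": Counter(election=2, vote=1), "names": Counter(clinton=1)},
    {"words": Counter(vote=2, election=1), "names": Counter(clinton=1)},
    {"words": Counter(earthquake=2, rescue=1), "names": Counter(turkey=1)},
]
weights = {"words": 4.0, "names": 4.0}  # hand-picked, not fitted
bias = -4.0

clusters = single_pass_cluster(docs, weights, bias)
print(clusters)
```

In this sketch the bias shifts the decision point, so two documents that share no terms under any feature receive a combined score well below the threshold, while agreement on several features pushes the score toward 1.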