Making large-scale support vector machine learning practical
Advances in kernel methods
Algorithmics and applications of tree and graph searching
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Modern Information Retrieval
A Tutorial on Support Vector Machines for Pattern Recognition
Data Mining and Knowledge Discovery
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Maximum Entropy Markov Models for Information Extraction and Segmentation
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
A maximum entropy approach to named entity recognition
A maximum entropy approach to named entity recognition
CloseGraph: mining closed frequent graph patterns
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Boosting support vector machines for text classification through parameter-free threshold relaxation
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Graph indexing: a frequent structure-based approach
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
The complexity of mining maximal frequent itemsets and maximal frequent patterns
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Improving Web search efficiency via a locality based static pruning method
WWW '05 Proceedings of the 14th international conference on World Wide Web
Shallow parsing with conditional random fields
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
A document-centric approach to static index pruning in text retrieval systems
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Fast Kernel Classifiers with Online and Active Learning
The Journal of Machine Learning Research
Extraction and search of chemical formulae in text documents on the web
Proceedings of the 16th international conference on World Wide Web
Topic segmentation with shared topic detection and alignment of multiple documents
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Mining, indexing, and searching for textual chemical molecule information on the web
Proceedings of the 17th international conference on World Wide Web
Detection of IUPAC and IUPAC-like chemical names
Bioinformatics
Introduction to Information Retrieval
Introduction to Information Retrieval
Annotation of chemical named entities
BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
HLT-SRWS '04 Proceedings of the Student Research Workshop at HLT-NAACL 2004
Semi-supervised sequence modeling with syntactic topic models
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2
Independent informative subgraph mining for graph information retrieval
Proceedings of the 18th ACM conference on Information and knowledge management
Learning to rank graphs for online similar graph search
Proceedings of the 18th ACM conference on Information and knowledge management
High-Throughput identification of chemistry in life science texts
CompLife'06 Proceedings of the Second international conference on Computational Life Sciences
Hi-index | 0.00 |
End-users utilize chemical search engines to search for chemical formulae and chemical names. Chemical search engines identify and index chemical formulae and chemical names appearing in text documents to support efficient search and retrieval in the future. Identifying chemical formulae and chemical names in text automatically has been a hard problem that has met with varying degrees of success in the past. We propose algorithms for chemical formula and chemical name tagging using Conditional Random Fields (CRFs) and Support Vector Machines (SVMs) that achieve higher accuracy than existing (published) methods. After chemical entities have been identified in text documents, they must be indexed. In order to support user-provided search queries that require a partial match between the chemical name segment used as a keyword or a partial chemical formula, all possible (or a significant number of) subformulae of formulae that appear in any document and all possible subterms (e.g., “methyl”) of chemical names (e.g., “methylethyl ketone”) must be indexed. Indexing all possible subformulae and subterms results in an exponential increase in the storage and memory requirements as well as the time taken to process the indices. We propose techniques to prune the indices significantly without reducing the quality of the returned results significantly. Finally, we propose multiple query semantics to allow users to pose different types of partial search queries for chemical entities. We demonstrate empirically that our search engines improve the relevance of the returned results for search queries involving chemical entities.