Algorithmics and applications of tree and graph searching
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Maximum Entropy Markov Models for Information Extraction and Segmentation
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
CloseGraph: mining closed frequent graph patterns
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
The complexity of mining maximal frequent itemsets and maximal frequent patterns
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Improving Web search efficiency via a locality based static pruning method
WWW '05 Proceedings of the 14th international conference on World Wide Web
Hierarchical Clustering Algorithms for Document Datasets
Data Mining and Knowledge Discovery
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Feature-based similarity search in graph structures
ACM Transactions on Database Systems (TODS)
Extraction and search of chemical formulae in text documents on the web
Proceedings of the 16th international conference on World Wide Web
Answering relationship queries on the web
Proceedings of the 16th international conference on World Wide Web
Dynamic personalized pagerank in entity-relation graphs
Proceedings of the 16th international conference on World Wide Web
Topic segmentation with shared topic detection and alignment of multiple documents
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Semi-supervised sequence modeling with syntactic topic models
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2
Independent informative subgraph mining for graph information retrieval
Proceedings of the 18th ACM conference on Information and knowledge management
Learning to rank graphs for online similar graph search
Proceedings of the 18th ACM conference on Information and knowledge management
Exposing the hidden web for chemical digital libraries
Proceedings of the 10th annual joint conference on Digital libraries
oreChem ChemXSeer: a semantic digital library for chemistry
Proceedings of the 10th annual joint conference on Digital libraries
Identifying, Indexing, and Ranking Chemical Formulae and Chemical Names in Digital Documents
ACM Transactions on Information Systems (TOIS)
Taking chemistry to the task: personalized queries for chemical digital libraries
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Quantifying the impact of concept recognition on biomedical information retrieval
Information Processing and Management: an International Journal
Effective query generation and postprocessing strategies for prior art patent search
Journal of the American Society for Information Science and Technology
Learning to extract chemical names based on random text generation and incomplete dictionary
Proceedings of the 11th International Workshop on Data Mining in Bioinformatics
Chemical Name Extraction Based on Automatic Training Data Generation and Rich Feature Set
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Hi-index | 0.00 |
Current search engines do not support user searches for chemical entities (chemical names and formulae) beyond simple keyword searches. Usually a chemical molecule can be represented in multiple textual ways. A simple keyword search would retrieve only the exact match and not the others. We show how to build a search engine that enables searches for chemical entities and demonstrate empirically that it improves the relevance of returned documents. Our search engine first extracts chemical entities from text, performs novel indexing suitable for chemical names and formulae, and supports different query models that a scientist may require. We propose a model of hierarchical conditional random fields for chemical formula tagging that considers long-term dependencies at the sentence level. To substring searches of chemical names, a search engine must index substrings of chemical names. Indexing all possible sub-sequences is not feasible in practice. We propose an algorithm for independent frequent subsequence mining to discover sub-terms of chemical names with their probabilities. We then propose an unsupervised hierarchical text segmentation (HTS) method to represent a sequence with a tree structure based on discovered independent frequent subsequences, so that sub-terms on the HTS tree should be indexed. Query models with corresponding ranking functions are introduced for chemical name searches. Experiments show that our approaches to chemical entity tagging perform well. Furthermore, we show that index pruning can reduce the index size and query time without changing the returned ranked results significantly. Finally, experiments show that our approaches out-perform traditional methods for document search with ambiguous chemical terms.