Scientists often search the Web for articles related to a particular chemical. When a scientist searches for a chemical formula using a search engine today, she gets only articles that contain the exact keyword string expressing the formula. Exact keyword matching causes two problems in this domain: a) if the user searches for CH4 and the article contains H4C, the article is not returned, and b) ambiguous queries such as "He" return documents that mention helium as well as documents in which the pronoun "he" occurs. To remedy these deficiencies, we propose a chemical formula search engine. Building it requires solving the following problems: 1) extracting chemical formulae from text documents, 2) indexing chemical formulae, and 3) designing ranking functions for chemical formulae. Furthermore, query models are introduced for formula search, and for each a scoring scheme based on features of partial formulae is proposed to measure the relevance of chemical formulae to queries. We evaluate algorithms for identifying chemical formulae in documents using classification methods based on Support Vector Machines (SVM) and a probabilistic model based on conditional random fields (CRF). Different methods for SVM and CRF to tune the trade-off between recall and precision for imbalanced data are proposed to improve the overall performance. A feature selection method based on frequency and discrimination is used to remove uninformative and redundant features. Experiments show that our approaches to chemical formula extraction work well, especially after trade-off tuning. The results also demonstrate that feature selection can reduce the index size without changing ranked query results much.
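The CH4 versus H4C mismatch described above can be illustrated with a minimal Python sketch. It is not the extraction, indexing, or ranking method proposed here; it only shows why comparing element counts instead of raw strings makes matching independent of element order. The names `parse_formula` and `same_formula` are hypothetical, and the parser assumes flat formulae with no parentheses, charges, or hydrates.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration only; not the method of this paper.
# One token is an element symbol (capital letter, optional lowercase letter)
# followed by an optional integer count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Return an element -> count mapping for a flat formula such as 'CH4'.

    Assumption: no parentheses, charges, or hydrates. Raises ValueError
    when the string is not a contiguous sequence of element tokens.
    """
    counts: Counter = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            # A gap means some characters did not form an element token.
            raise ValueError(f"cannot parse {formula!r}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        pos = match.end()
    if not formula or pos != len(formula):
        raise ValueError(f"cannot parse {formula!r}")
    return counts


def same_formula(a: str, b: str) -> bool:
    """True when both strings contain the same elements with the same counts."""
    return parse_formula(a) == parse_formula(b)


if __name__ == "__main__":
    print(same_formula("CH4", "H4C"))   # same elements and counts, different order
    print(same_formula("CH4", "C2H4"))  # different carbon count
```

Case-sensitive parsing rejects the lowercase pronoun "he", but a sentence-initial "He" still parses as helium, so string-level checks alone cannot resolve that ambiguity. That is why formula identification is treated as a classification problem that uses the surrounding text.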