This paper surveys approaches and systems for searching mathematical formulae in mathematical corpora and on the web. The design and architecture of our MIaS (Math Indexer and Searcher) system are presented, and our design decisions are discussed in detail. We propose an approach based on Presentation MathML that uses the similarity of math subformulae, and we verify it by implementing a math-aware search engine on top of the state-of-the-art system Apache Lucene. Scalability was tested on 324,000 real scientific documents from the arXiv archive containing 112 million mathematical formulae. More than two billion MathML subformulae were indexed using our Solr-compatible Lucene extension.
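The idea of indexing subformulae of a Presentation MathML expression can be illustrated with a minimal sketch. This is not the MIaS implementation: the function names `subformulae` and `canonical`, the `damping` parameter, and the rule that a subtree's weight halves with each nesting level are all assumptions made for illustration, and the example input omits the MathML namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical example input: x^2 + y in Presentation MathML (no namespace).
EXAMPLE = (
    "<math><mrow>"
    "<msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mi>y</mi>"
    "</mrow></math>"
)


def canonical(node):
    """Serialize a subtree to a compact string key, ignoring whitespace."""
    text = (node.text or "").strip()
    inner = "".join(canonical(child) for child in node)
    return f"<{node.tag}>{text}{inner}</{node.tag}>"


def subformulae(mathml, damping=0.5):
    """Return (key, weight) pairs for every element subtree, in preorder.

    The weight is multiplied by `damping` per nesting level. Both the
    weighting rule and its default value are illustrative assumptions.
    """
    root = ET.fromstring(mathml)
    result = []
    stack = [(root, 1.0)]
    while stack:
        node, weight = stack.pop()
        result.append((canonical(node), weight))
        # Push children in reverse so they are visited left to right.
        for child in reversed(list(node)):
            stack.append((child, weight * damping))
    return result


if __name__ == "__main__":
    for key, weight in subformulae(EXAMPLE):
        print(f"{weight:.3f}  {key}")
```

Each returned key could serve as an index term in a full-text engine, so that a query formula matches documents containing it either as a whole formula or as a nested part, with deeper matches contributing less to the score.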