MIaS
MIaS (Math Indexer and Searcher) is a maths-aware full-text search engine. It is based on the state-of-the-art system Apache Lucene; however, its maths processing capabilities are standalone and can be easily integrated into any Lucene/Solr-based system, as in the EuDML search service. MIaS processes documents containing mathematics encoded in MathML. In several steps that enable formula similarity search, it transforms the problem of matching XML structures into regular full-text searching. The principles are described in our publications. Currently, our MREC collection and other collections, such as the NTCIR Math Task data set, are being used to evaluate the system.
We are working hard to deliver a heavily refactored and updated version 2 of the system, which will be more precise and efficient. It supports both Presentation and Content MathML for indexing and searching and uses our own MathML Canonicalizer for better precision. It will also allow access through web services, support the OpenSearch standard for better accessibility, and much more.
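To illustrate the idea of turning XML matching into full-text searching, here is a minimal Python sketch. It is not the MIaS implementation: the function names, the token format, and the depth-based `damping` weight are all assumptions made for illustration. It walks a Presentation MathML tree, serializes every subformula into a flat string, and returns those strings as weighted tokens of the kind a full-text index could store.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional XML namespace, e.g. '{http://...}mi' -> 'mi'."""
    return tag.rsplit("}", 1)[-1]


def linearize(node):
    """Serialize a MathML subtree into a single flat string token."""
    name = local_name(node.tag)
    children = list(node)
    if not children:
        # Leaf element: keep its text content (identifier, number, operator).
        return f"{name}({(node.text or '').strip()})"
    return f"{name}(" + ",".join(linearize(child) for child in children) + ")"


def subformula_tokens(mathml, damping=0.5):
    """Return (token, weight) pairs for every subformula of the input.

    The weight shrinks with nesting depth; this damping scheme is a
    hypothetical choice for the sketch, not a documented MIaS parameter.
    """
    root = ET.fromstring(mathml)
    tokens = []

    def visit(node, depth):
        tokens.append((linearize(node), damping ** depth))
        for child in node:
            visit(child, depth + 1)

    visit(root, 0)
    return tokens


# Presentation MathML for a + b^2 (hypothetical sample input).
EXAMPLE = (
    "<math><mrow><mi>a</mi><mo>+</mo>"
    "<msup><mi>b</mi><mn>2</mn></msup></mrow></math>"
)

if __name__ == "__main__":
    for token, weight in subformula_tokens(EXAMPLE):
        print(f"{weight:.3f}  {token}")
```

Because every subformula becomes an ordinary token, a query formula processed the same way can match a document formula that merely contains it, with shallower (larger) matches weighted more heavily in this sketch.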
NTCIR Math Task Evaluation Competitions
Last year, the MIR@MU group entered MIaS in the first ever official math information retrieval evaluation task organized under the international IR evaluation initiative NTCIR.
MIaS was successful with the then development version. This year, we are going to compete against other systems again, with the improved current MIaS, in the NTCIR-12 Math Task.
Cite as
Text
SOJKA, Petr and Martin LÍŠKA. The Art of Mathematics Retrieval. In Matthew R. B. Hardy, Frank Wm. Tompa. Proceedings of the 2011 ACM Symposium on Document Engineering. Mountain View, CA, USA: ACM, 2011. p. 57–60. ISBN 978-1-4503-0863-2. doi:10.1145/2034691.2034703.
BibTeX
@inproceedings{doi:10.1145:2034691.2034703,
  author    = "Petr Sojka and Martin L{\'\i}{\v s}ka",
  title     = "{The Art of Mathematics Retrieval}",
  booktitle = "{Proceedings of the ACM Conference on Document Engineering, DocEng 2011}",
  publisher = "{Association of Computing Machinery}",
  address   = "{Mountain View, CA}",
  year      = 2011,
  month     = Sep,
  isbn      = "978-1-4503-0863-2",
  pages     = "57--60",
  url       = {http://doi.acm.org/10.1145/2034691.2034703},
  doi       = {10.1145/2034691.2034703},
  abstract  = {The design and architecture of MIaS (Math Indexer and Searcher), a system for mathematics retrieval is presented, and design decisions are discussed. We argue for an approach based on Presentation MathML using a similarity of math subformulae. The system was implemented as a math-aware search engine based on the state-of-the-art system Apache Lucene. Scalability issues were checked against more than 400,000 arXiv documents with 158 million mathematical formulae. Almost three billion MathML subformulae were indexed using a Solr-compatible Lucene.},
}
Selected Publications
- SOJKA, Petr, Michal RŮŽIČKA and Vít NOVOTNÝ. MIaS: Math-Aware Retrieval in Digital Mathematical Libraries. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM '18). Torino, Italy: Association for Computing Machinery, 2018. 4 pp. ISBN 978-1-4503-6014-2. doi:10.1145/3269206.3269233.
- RŮŽIČKA, Michal, Petr SOJKA and Martin LÍŠKA. Math Indexer and Searcher under the Hood: Fine-Tuning Query Expansion and Unification Strategies. In Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies. Tokyo: National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430 Japan, 2016. 7 pp.
- LÍŠKA, Martin, Petr SOJKA and Michal RŮŽIČKA. Combining Text and Formula Queries in Math Information Retrieval: Evaluation of Query Results Merging Strategies. In Davood Rafiei, Katsumi Tanaka. NWSearch '15: Proceedings of the First International Workshop on Novel Web Search Interfaces and Systems. New York, NY, USA: ACM, 2015. p. 7-9, 3 pp. ISBN 978-1-4503-3789-2. doi:10.1145/2810355.2810359.
- LÍŠKA, Martin. Enhancing Mathematics Information Retrieval. In Proceedings of The 38th International ACM SIGIR Conference. 2015. 1 pp.
- RŮŽIČKA, Michal, Petr SOJKA and Martin LÍŠKA. Math Indexer and Searcher under the Hood: History and Development of a Winning Strategy. In Noriko Kando, Hideo Joho, Kazuaki Kishida. Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies. Tokyo: National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430 Japan, 2014. p. 127-134, 8 pp. ISBN 978-4-86049-065-2.
- Martin Líška. Evaluation of Mathematics Retrieval, Jan. 2013. Master Thesis, Masaryk University, Brno, Faculty of Informatics (advisor: Petr Sojka)
- LÍŠKA, Martin, Petr SOJKA and Michal RŮŽIČKA. Similarity Search for Mathematics: Masaryk University team at the NTCIR-10 Math Task. In Noriko Kando, Kazuaki Kishida. Proceedings of the 10th NTCIR Conference on Evaluation of Information Access Technologies. Tokyo: National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430 Japan, 2013. p. 686-691, 6 pp. ISBN 978-4-86049-062-1.
- Martin Líška. Vyhledávání v matematickém textu (in Slovak, Searching Mathematical Texts). Bachelor thesis, Faculty of Informatics, Advisor: Petr Sojka. 2010. Masaryk University, Brno
- SOJKA, Petr and Martin LÍŠKA. Indexing and Searching Mathematics in Digital Libraries -- Architecture, Design and Scalability Issues. In James H. Davenport, William M. Farmer, Josef Urban, Florian Rabe. Intelligent Computer Mathematics. Lecture Notes in Computer Science, 2011, Volume 6824/2011. Berlin / Heidelberg: Springer, 2011. p. 228-243, 15 pp. ISBN 978-3-642-22672-4. doi:10.1007/978-3-642-22673-1_16.
- SOJKA, Petr and Martin LÍŠKA. The Art of Mathematics Retrieval. In Matthew R. B. Hardy, Frank Wm. Tompa. Proceedings of the 2011 ACM Symposium on Document Engineering. Mountain View, CA, USA: ACM, 2011. p. 57-60, 4 pp. ISBN 978-1-4503-0863-2. doi:10.1145/2034691.2034703.