The aim of the Maths Information Retrieval research group at Masaryk University (MIR@MU) is to develop a system that enables readers to cope with maths in digital libraries.
News
2022-02-28: The paper by Vítek Novotný, Michal Štefánik, Eniafe Festus Ayetiran, Petr Sojka, and Radim Řehůřek has been published in the Journal of Universal Computer Science. This paper is part of the Evaluation of Extended Word Embeddings project.
2021-09-07: The paper by Michal Štefánik, Vítek Novotný, and Petr Sojka has been accepted for publication at WMT 2021 (co-located with EMNLP 2021) and will be presented as a poster. This paper is part of the Evaluation of Extended Word Embeddings project. See you online!
2021-07-27: The paper by Vítek Novotný, Eniafe Festus Ayetiran, Dalibor Bačovský, Dávid Lupták, Michal Štefánik, and Petr Sojka has been accepted for publication at RANLP 2021. This paper is part of the Evaluation of Extended Word Embeddings project. See you online!
2021-07-11: Vítek Novotný presented his research on Interpretable Document Representations for Fast and Accurate Retrieval of Mathematical Information at the prestigious SIGIR 2021 Doctoral Consortium.
2021-07-06: Petr Mička has joined our MIR@MU research team.
2021-03-08: The paper by Eniafe Festus Ayetiran, Petr Sojka, and Vítek Novotný has been accepted for publication in Knowledge-Based Systems, a journal with an impact factor. This paper is part of the Evaluation of Extended Word Embeddings project.
2021-01-28: Dalibor Bačovský, Mikuláš Bankovič, and Martin Geletka have joined our MIR@MU research team.
2020-07-07: Greetings from RAMIRA 2020 in Vápenná, Jeseník, Czech Republic. Even in the summer, we can’t help but research new and exciting ways to advance math information retrieval.
2020-02-20: Michal Štefánik has joined our MIR@MU research team.
2019-09-20: Eniafe Festus Ayetiran has joined our MIR@MU research team.
2019-04-08: Between 13:00 and 16:00 CET, we installed two 8 TiB WD Ultrastar DC SN200 HH-HL AIC NVMe SSDs in MIR, theoretically reaching up to 12 GiB/s sequential read performance and up to 4 GiB/s sequential write performance in RAID 0. We also upgraded MIR from Debian 8 to Debian 9 and installed the Linux 4.19 kernel after almost two years of uptime.
2019-04-04: We have just submitted our GAČR 2020 proposal for the “Continuous and Discrete (Concrete) Language Representations” three-year grant project.
2019-01-07: Our posters were accepted to the ML Prague 2019 conference. See you in Prague!
2018-10-24: Petr Sojka and Vítek Novotný present our papers MIaS: Math-Aware Retrieval in Digital Mathematical Libraries and Implementation Notes for the Soft Cosine Measure at the CIKM 2018 ACM conference in Torino, Italy.
2018-08-07: Our short papers MIaS: Math-Aware Retrieval in Digital Mathematical Libraries (postprint) and Implementation Notes for the Soft Cosine Measure (postprint) were accepted to the CIKM 2018 ACM conference. See you in Torino!
2018-05-23: Vítek Novotný placed third with our paper Semantic Vector Encoding and Similarity Search Using Fulltext Search Engines at the PhD Poster Session (FI MU).
All news...
Subprojects
Projects
Math Indexer and Searcher (MIaS)
MIaS is a maths-aware, full-text-based search engine by MIR@MU. MIaS is built on Apache Lucene, a state-of-the-art system, and is accompanied by a web user interface, WebMIaS.
Web User Interface for MIaS (WebMIaS)
WebMIaS is a web user interface enabling end users to use MIaS in a user-friendly way.
MathML Normalization (CanonMath)
An advanced search engine working with MathML needs a tool that picks a canonical representative from the different MathML encodings of semantically equivalent formulae. We are developing such a canonicalizer (primarily for MIaS).
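To illustrate what picking a canonical form can mean in practice, here is a minimal sketch in Python. It is not the CanonMath implementation: the three rules (unwrapping single-child mrow elements, trimming whitespace in token text, and sorting attributes) and the function names canonicalize and canonical_string are assumptions chosen for illustration only.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}mrow' down to 'mrow'."""
    return tag.rsplit("}", 1)[-1]


def canonicalize(elem):
    """Return a normalized copy of a Presentation MathML element.

    Toy rules only (assumptions, not the CanonMath rule set):
      1. an <mrow> with exactly one child is replaced by that child,
      2. leading/trailing whitespace in element text is removed,
      3. attributes are emitted in sorted order.
    """
    # Normalize the children first, so the rules apply bottom-up.
    children = [canonicalize(child) for child in elem]

    # Rule 1: a single-child <mrow> only groups one item; unwrap it.
    if local_name(elem.tag) == "mrow" and len(children) == 1:
        return children[0]

    # Rule 3: copy attributes in a fixed (sorted) order.
    out = ET.Element(elem.tag, dict(sorted(elem.attrib.items())))

    # Rule 2: trim whitespace; inter-element whitespace (tails) is not copied.
    out.text = (elem.text or "").strip() or None
    out.extend(children)
    return out


def canonical_string(mathml):
    """Parse a MathML string and serialize its canonical form."""
    return ET.tostring(canonicalize(ET.fromstring(mathml)), encoding="unicode")


if __name__ == "__main__":
    a = "<math><mrow><mrow><mi> x </mi></mrow><mo>+</mo><mn>1</mn></mrow></math>"
    b = "<math>\n <mrow>\n  <mi>x</mi>\n  <mo>+</mo>\n  <mn>1</mn>\n </mrow>\n</math>"
    print(canonical_string(a) == canonical_string(b))
```

Two encodings of x + 1 that differ only in redundant grouping and whitespace serialize to the same string, so an index keyed on the canonical string would treat them as one formula. A real canonicalizer needs many more rules than these.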
Mathematical REtrieval Collection (MREC)
Evaluation and testing are an integral part of search engine development. Among other data sets, we prepared MREC, which is based on arXMLiv, a project of Prof. Dr. Michael Kohlhase's group at Jacobs University Bremen.
Gensim – Similarity of Documents (DocSim)
Gensim is open-source, general-purpose software for scalable topic modelling. We have developed this tool and use it for computing similarities between maths documents.
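As a rough illustration of the kind of computation behind document similarity, the following sketch compares tokenized documents by the cosine of their TF-IDF vectors. It deliberately uses only the Python standard library and does not call the Gensim API; the function names tfidf_vectors and cosine and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Turn tokenized documents into sparse TF-IDF vectors (dict: term -> weight)."""
    n = len(documents)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)  # raw term frequency within this document
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy, hand-made documents (assumptions for illustration only).
    docs = [
        ["prime", "number", "theorem"],
        ["prime", "number", "distribution"],
        ["heat", "equation", "solution"],
    ]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))
```

The first two toy documents share terms and therefore score higher than the first and third, which share none. Terms occurring in every document get zero weight under this weighting, which is why the zero-length guard in cosine is needed.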
EuDML
The MIR@MU team participated in the European Digital Mathematics Library (EuDML) project.
DML-CZ
The MIR@MU team participated in the Czech Digital Mathematics Library (DML-CZ) project.
PdfToTextViaOCR
An open-source tool for converting image-based PDFs to text, developed as part of the EuDML workflow.
PdfJbIm
An open-source tool for optimizing and re-compressing PDF documents using standard JBIG2 compression, developed as part of the EuDML workflow.
Evaluation of Extended Word Embeddings
Word embeddings produced by shallow neural networks have a number of extensions that give strong results on intrinsic tasks (word analogy) but have not been extensively evaluated on multilingual extrinsic tasks (text classification, language modeling, information retrieval) that correspond to real-world end tasks. The goal of this project is to prepare a set of tasks for evaluating word embeddings on multilingual extrinsic tasks.