MREC — Mathematical REtrieval Collection

MREC is based on arXMLiv, a project of Prof. Dr. Michael Kohlhase's group at Jacobs University Bremen. arXMLiv documents originate from arXiv.org and have been converted to XML. They cover several scientific areas: Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics.

However, MREC is not an exact copy of the arXMLiv content.

MREC contains only a subset of arXMLiv: we use only the XML documents that arXMLiv marks as converted successfully (with no problem or warnings) or as completed with just ‘missing macro’ errors.
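
The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the status labels (`no_problem`, `warning`, `error`, `fatal`) and the error strings are hypothetical, since the format in which arXMLiv reports conversion results is not described here. The sketch also assumes that documents converted with warnings count as successful.

```python
# Hypothetical conversion records; the field names and status labels are
# assumptions made for this sketch, not the actual arXMLiv log format.
RECORDS = [
    {"id": "doc-a", "status": "no_problem", "errors": []},
    {"id": "doc-b", "status": "warning", "errors": []},
    {"id": "doc-c", "status": "error", "errors": ["missing macro"]},
    {"id": "doc-d", "status": "error", "errors": ["missing macro", "missing file"]},
    {"id": "doc-e", "status": "fatal", "errors": ["conversion aborted"]},
]

# Statuses treated as a successful conversion (assumption: warnings are accepted).
ACCEPTED_STATUSES = {"no_problem", "warning"}


def is_selected(record):
    """Keep successful conversions and those whose only errors are 'missing macro'."""
    if record["status"] in ACCEPTED_STATUSES:
        return True
    errors = record["errors"]
    only_missing_macros = bool(errors) and all(e == "missing macro" for e in errors)
    return record["status"] == "error" and only_missing_macros


selected = [r["id"] for r in RECORDS if is_selected(r)]
print(selected)
```

A document with any error other than a missing macro, such as `doc-d` in the sample data, is excluded by this rule.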

Moreover, we made several modifications to the files that we considered necessary to make the documents well-formed and valid. These modifications include removing unnecessary attributes, namespace proxies, and <div> elements nested in <span> elements, among others.
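
To illustrate the kind of check such a cleanup relies on, here is a minimal Python sketch that tests whether a document is well-formed and counts <div> elements nested inside <span> elements. It is not the tooling used to build MREC; the function names and the sample strings are made up for this example, and only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace URI for XHTML.
XHTML_NS = "http://www.w3.org/1999/xhtml"
SPAN = "{%s}span" % XHTML_NS
DIV = "{%s}div" % XHTML_NS

# Hypothetical sample documents for this sketch.
NESTED = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p><span><div>x</div></span></p><div>ok</div>"
    "</body></html>"
)
BROKEN = "<html><body><p>unclosed</body></html>"


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def count_div_in_span(element, inside_span=False):
    """Count <div> descendants of element that have a <span> ancestor."""
    count = 0
    for child in element:
        if child.tag == DIV and inside_span:
            count += 1
        # Once inside a <span>, every deeper level is inside it as well.
        count += count_div_in_span(child, inside_span or child.tag == SPAN)
    return count


nested_count = count_div_in_span(ET.fromstring(NESTED))
print(is_well_formed(NESTED), is_well_formed(BROKEN), nested_count)
```

Well-formedness is only the first half of the requirement; validity against the XHTML schema needs a validating parser, which this sketch does not cover.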

MREC consists of well-formed XHTML documents. MathML, a W3C standard, is used for representation of mathematical formulae. Formulae are canonicalized into Canonical MathML using UMCL.
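
Because the documents are well-formed XHTML with embedded MathML, the formulae can be read with any namespace-aware XML parser. The following Python sketch pulls the <math> elements out of a document using the standard library. The sample document and the helper names are invented for illustration; real MREC files are far larger, and their exact markup may differ.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace URI for MathML.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical miniature document used only for this sketch.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample</title></head>
  <body>
    <p>Energy:
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mi>E</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup>
      </math>
    </p>
  </body>
</html>"""


def extract_formulae(xhtml_text):
    """Return every MathML <math> element found in an XHTML string."""
    root = ET.fromstring(xhtml_text)  # raises ParseError if not well-formed
    return list(root.iter("{%s}math" % MATHML_NS))


def token_texts(math_element):
    """Collect the non-blank text of the elements in document order."""
    return [el.text for el in math_element.iter() if el.text and el.text.strip()]


formulae = extract_formulae(SAMPLE)
tokens = token_texts(formulae[0])
print(len(formulae), tokens)
```

The token list is a flat view of the formula and discards its tree structure, so it is suitable for a quick inspection rather than for structural math search.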

MREC can be used for different purposes. For math search, we use and compare two search engines, MIaS and Solr. MREC is also used for similarity computations with gensim.

  • MREC2011.3
    • version 2011.3.324
    • BZip2 compressed TAR, 3.8 MiB
    • more than 324,000 XHTML documents
    • without semantic
  • MREC2011.4
    • version 2011.4.439
    • BZip2 compressed TAR, 15 GiB
    • more than 439,000 XHTML documents
Cite as


LÍŠKA, Martin, Petr SOJKA, Michal RŮŽIČKA and Petr MRAVEC. Web Interface and Collection for Mathematical Retrieval : WebMIaS and MREC. In Petr Sojka, Thierry Bouche. DML 2011: Towards a Digital Mathematics Library. Brno: Masaryk University, 2011. p. 77–84. ISBN 978-80-210-5542-1.


     author = "Martin L\'{\i}\v{s}ka and Petr Sojka
               and Michal R\r{u}\v{z}i\v{c}ka and Petr Mravec",
      title = "{Web Interface and Collection for Mathematical Retrieval:
                WebMIaS and MREC}",
     editor = "Petr Sojka and Thierry Bouche",
  booktitle = "{Towards a Digital Mathematics Library.}",
    address = "{Bertinoro, Italy}",
       year = 2011,
      month = Jul,
      pages = "77--84",
  publisher = "{Masaryk University}",
       isbn = "978-80-210-5542-1",
        url = {http://hdl.handle.net/10338.dmlcz/702604},
Relevant projects

