About the Project
The aim of the Maths Information Retrieval research group at Masaryk University (MIR@MU) is to develop a system that enables readers to cope with maths in digital libraries.
The growing amount of data stored in digital libraries makes it increasingly difficult for readers to find relevant content. Users are accustomed to looking for answers to their questions through search engines. On today's Internet, a simple single-field search interface is the de facto standard, largely thanks to well-known search services such as Google.
However, such a simple keyword-based text search is neither appropriate nor sufficient for mathematical content. An author can write mathematical expressions with the same meaning in many ways (for example, a + b and b + a, or x^2 and x·x), and each of these forms can in turn be encoded in many ways in a computer system.
Moreover, authors of mathematical papers usually prepare their documents for print. Consequently, the tools they routinely use encode the appearance of formulae rather than their meaning. Although methods for semantic annotation exist, typical authors gain no direct benefit from semantically annotating their papers, and therefore they do not do it. There is no reason to expect this to change in the future, which makes it our responsibility to process real-world documents as they are.
Designing and implementing a math-aware search engine and integrating it into a digital mathematics library is therefore far from easy. Among other things, it requires coping with the following issues:
- Recognition and proper processing of mathematical symbols in mathematical content and queries.
- Capturing and indexing mathematical structures.
- Providing a math-appropriate query language and user interface that enable users to express their information needs, which often involve math symbols and structures.
- Developing and integrating techniques that take mathematical synonyms and equivalences into account, at least the more common ones, such as equivalences based on commutativity and associativity.
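To illustrate the last point, one common way to handle commutativity and associativity is to normalize expression trees before indexing them, so that equivalent variants such as (a + b) + c and c + (b + a) map to the same canonical form. The following is a minimal sketch under our own assumptions, not the MIR@MU implementation: the tuple-based tree representation, the operator set, and the names `canonicalize` and `COMMUTATIVE_ASSOCIATIVE` are all hypothetical.

```python
from typing import Tuple, Union

# Hypothetical expression representation: a leaf is a string (variable or
# number); an inner node is a tuple (operator, child1, child2, ...).
Expr = Union[str, Tuple]

# Assumption: only these operators are treated as commutative and associative.
COMMUTATIVE_ASSOCIATIVE = {"+", "*"}


def canonicalize(expr: Expr) -> Expr:
    """Return a canonical form of expr under commutativity and associativity."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    # Canonicalize children first, so nested same-operator nodes are already flat.
    children = [canonicalize(arg) for arg in args]
    if op in COMMUTATIVE_ASSOCIATIVE:
        # Associativity: flatten nested applications of the same operator,
        # e.g. (a + b) + c becomes a single "+" node with children a, b, c.
        flat = []
        for child in children:
            if isinstance(child, tuple) and child[0] == op:
                flat.extend(child[1:])
            else:
                flat.append(child)
        # Commutativity: sort operands into a deterministic order.
        flat.sort(key=repr)
        return (op, *flat)
    # Other operators (e.g. "-") keep their operand order.
    return (op, *children)


e1 = ("+", ("+", "a", "b"), "c")  # (a + b) + c
e2 = ("+", "c", ("+", "b", "a"))  # c + (b + a)
print(canonicalize(e1) == canonicalize(e2))  # True
```

Indexing the canonical form instead of the raw one lets a query match documents that write the same expression in a different operand order or grouping. This sketch deliberately covers only these two equivalences; others, such as distributivity, would need further rewriting rules.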