Projects: Call for Participation

We are looking for collaborators on several subprojects.

Random Walks in Word Usage Graphs

The aim is to find and test an efficient, effective, distributed computation of PageRank, e.g. based on http://arxiv.org/pdf/1208.3071v1.pdf, on linguistic data (arXiv.org papers). The goal is to compute word and collocation meanings efficiently from word usage graphs.
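As a point of reference, here is a minimal single-machine sketch of PageRank by power iteration in Python. It is not the distributed algorithm of the paper linked above; the toy graph, the function name `pagerank` and its parameters are assumptions made for illustration only.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank on a dict {node: [successor, ...]}."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no outgoing edges) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = graph.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy word usage graph: an edge u -> v means "u is followed by v".
word_graph = {
    "random": ["walk"],
    "walk": ["graph"],
    "graph": ["random", "walk"],
}
ranks = pagerank(word_graph)
for word, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:.4f}")
```

A distributed variant would have to partition the graph and exchange rank contributions between workers; that is the part this subproject is meant to explore.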

contact

Petr Sojka c/o mir.fi.muni.cz && nlp.fi.muni.cz

Distributed Architecture for Document Processing Pipeline

The aim is to find and test efficient distributed processing of 1,000,000+ arXiv.org documents for NLP and math-aware preprocessing and indexing, possibly implemented in Rust (http://www.rust-lang.org/) and based on https://github.com/dginev/CorTeX.

contact

Petr Sojka & Michal Růžička c/o mir.fi.muni.cz && nlp.fi.muni.cz

Topic Models-based Corpora Visualization and Interface

There are several ways to interact with paper corpora such as arXiv.org on the basis of topic modeling: http://vis.stanford.edu/papers/termite or http://ajbc.io/projects/papers/ChaneyBlei2012.pdf (http://bit.ly/arxiv-demo). The goal is to design and implement interactive browsing based on topic models computed with the award-winning Gensim software.

contact

Petr Sojka c/o mir.fi.muni.cz && nlp.fi.muni.cz

Maple-based Formulae Canonicalization

To index formulae, one needs to pick a canonical representation of each formula; cf. https://mir.fi.muni.cz/mathml-normalization/. The goal is to use Maple TA efficiently for this task (350,000,000+ formulae in arXiv.org).

contact

Petr Sojka & Martin Líška c/o mir.fi.muni.cz && nlp.fi.muni.cz

Math-aware Information Retrieval Evaluation

Evaluation of Math Information Retrieval (MIR) systems such as MIaS has its specific needs. The goal is to adapt an existing evaluation system (such as Terrier's Evaluation Toolkit) to the needs of MIR.
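For orientation, here is a minimal Python sketch of one standard IR measure that such a toolkit typically reports, mean average precision (MAP). It does not use Terrier or MIaS; the query and formula identifiers and the function names are hypothetical, and binary relevance judgements are assumed.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list against a set of relevant ids."""
    hits = 0
    total = 0.0
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position  # precision at this relevant hit
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels):
    """Mean of the per-query average precision; a missing run counts as empty."""
    aps = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical judgements and system output for two formula queries.
qrels = {"q1": {"f1", "f3"}, "q2": {"f7"}}
runs = {"q1": ["f1", "f2", "f3", "f4"], "q2": ["f5", "f7"]}
print(round(mean_average_precision(runs, qrels), 4))
```

Math-aware evaluation may additionally need graded relevance or partial credit for similar formulae, which plain MAP with binary judgements does not capture.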

contact

Petr Sojka & Martin Líška c/o mir.fi.muni.cz && nlp.fi.muni.cz

Formulae Sketches and Named Entities

The well-known Sketch Engine computes word sketches. The goal is to compute analogous formula sketches from the 350,000,000+ formulae of arXiv.org and, based on collocability measures, to compute a dictionary of formula names.
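As a rough sketch of the idea, the Python snippet below scores formula/word co-occurrences with logDice, the association measure used for word sketches, and picks the best-scoring word as a candidate name for each formula. The choice of logDice here is an assumption, as are the toy co-occurrence pairs and all names in the code.

```python
import math
from collections import Counter


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical (formula, nearby word) co-occurrences extracted from papers.
pairs = [
    ("E=mc^2", "mass-energy"), ("E=mc^2", "mass-energy"), ("E=mc^2", "theorem"),
    ("a^2+b^2=c^2", "Pythagorean"), ("a^2+b^2=c^2", "Pythagorean"),
    ("a^2+b^2=c^2", "theorem"),
]
pair_counts = Counter(pairs)
formula_counts = Counter(formula for formula, _ in pairs)
word_counts = Counter(word for _, word in pairs)

# Keep the highest-scoring word per formula as its candidate name.
best = {}
for (formula, word), count in pair_counts.items():
    score = log_dice(count, formula_counts[formula], word_counts[word])
    if formula not in best or score > best[formula][1]:
        best[formula] = (word, score)

names = {formula: word for formula, (word, _) in best.items()}
print(names)
```

At the scale of 350,000,000+ formulae, the counting step would need to be distributed and formulae would first have to be canonicalized so that equivalent notations share one count.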

contact

Petr Sojka c/o mir.fi.muni.cz && nlp.fi.muni.cz
