Projects: Call for Participation
We are looking for collaborators on several subprojects.
Random Walks in Word Usage Graphs
Finding and testing an efficient, distributed and effective computation of PageRank (e.g. based on http://arxiv.org/pdf/1208.3071v1.pdf) on linguistic data (arXiv.org papers). The goal is to compute word and collocation meanings efficiently from word usage graphs.
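As a starting point for discussion, the following is a minimal single-machine sketch of PageRank by power iteration on a small word co-occurrence graph. It is not the distributed method of the paper linked above. The function names, the co-occurrence window and the damping factor of 0.85 are illustrative assumptions, not part of any existing project code.

```python
from collections import defaultdict


def build_cooccurrence_graph(sentences, window=2):
    """Build an undirected graph linking words that occur within
    `window` positions of each other (an assumed, simplistic notion
    of a word usage graph)."""
    graph = defaultdict(set)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power iteration; `graph` maps each node to its neighbours."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            neighbours = graph[node]
            if not neighbours:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
                continue
            share = damping * rank[node] / len(neighbours)
            for other in neighbours:
                new_rank[other] += share
        rank = new_rank
    return rank


sentences = [
    ["random", "walk", "graph"],
    ["word", "usage", "graph"],
    ["graph", "rank"],
]
ranks = pagerank(build_cooccurrence_graph(sentences))
for word, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{word}\t{score:.4f}")
```

Each iteration touches every edge once, so this version does not scale to the full arXiv.org data; the subproject is about replacing it with a distributed computation.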
contact
Petr Sojka c/o mir.fi.muni.cz && nlp.fi.muni.cz
Distributed Architecture for Document Processing Pipeline
Finding and testing efficient distributed processing of 1,000,000+ arXiv.org documents for NLP and math-aware preprocessing and indexing, possibly in Rust (http://www.rust-lang.org/), based on https://github.com/dginev/CorTeX.
contact
Petr Sojka & Michal Růžička c/o mir.fi.muni.cz && nlp.fi.muni.cz
Topic Models-based Corpora Visualization and Interface
There are several ways to interact with paper corpora such as arXiv.org using topic modeling, e.g. http://vis.stanford.edu/papers/termite or http://ajbc.io/projects/papers/ChaneyBlei2012.pdf (http://bit.ly/arxiv-demo). The goal is to design and implement interactive browsing based on topic models computed with the award-winning Gensim software.
contact
Petr Sojka c/o mir.fi.muni.cz && nlp.fi.muni.cz
Maple-based Formulae Canonicalization
To index formulae, one needs to pick a canonical representation of each formula; cf. https://mir.fi.muni.cz/mathml-normalization/. The goal is to utilize Maple TA efficiently for this task (350,000,000+ formulae in arXiv.org).
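To illustrate what a canonical representation means here, the following toy sketch normalizes formulae given as nested tuples by flattening and sorting the operands of commutative operators, so that `b + a` and `a + b` index identically. It uses neither Maple nor the MathML normalization tool linked above; the tuple encoding and the function name are assumptions made for illustration only.

```python
# Operators whose operand order does not change the meaning (assumed set).
COMMUTATIVE = {"+", "*"}


def canonicalize(expr):
    """Return a canonical form of an expression tree.

    An expression is either a leaf (a string) or a tuple
    (operator, operand, operand, ...).
    """
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [canonicalize(arg) for arg in args]
    if op in COMMUTATIVE:
        flat = []
        for arg in args:
            # Flatten nested uses of the same operator: a + (b + c) -> a + b + c.
            if isinstance(arg, tuple) and arg[0] == op:
                flat.extend(arg[1:])
            else:
                flat.append(arg)
        # Any fixed total order works; repr() gives a simple one.
        args = sorted(flat, key=repr)
    return (op, *args)


left = canonicalize(("*", ("+", "y", "x"), "2"))
right = canonicalize(("*", "2", ("+", "x", "y")))
print(left == right)
```

A computer algebra system can go much further (e.g. simplification and expansion), which is the motivation for evaluating Maple in this subproject.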
contact
Petr Sojka & Martin Líška c/o mir.fi.muni.cz && nlp.fi.muni.cz
Math-aware Information Retrieval Evaluation
Evaluation of Math Information Retrieval (MIR) systems like MIaS has its specific needs. The goal is to adapt an existing evaluation system (like Terrier's Evaluation Toolkit) to the needs of MIR.
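For orientation, the standard ranked-retrieval measures that such an evaluation system typically reports can be sketched in a few lines. This is a generic illustration, not the API of Terrier or MIaS; the function names and the document identifiers are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant hit."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)


def mean_average_precision(runs):
    """`runs` is a list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical result list for one formula query.
ranked = ["doc1", "doc2", "doc3", "doc4"]
relevant = {"doc1", "doc3"}
print(precision_at_k(ranked, relevant, 2))
print(average_precision(ranked, relevant))
```

These measures assume binary relevance; whether partial formula matches need graded relevance is one of the MIR-specific questions of this subproject.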
contact
Petr Sojka & Martin Líška c/o mir.fi.muni.cz && nlp.fi.muni.cz
Formulae Sketches and Named Entities
There is the well-known Sketch Engine for word sketches. The goal is to compute formulae sketches from the 350,000,000+ formulae of arXiv.org and, based on collocability measures, to compute a dictionary of formulae names.
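One candidate collocability measure is logDice, the association score used for word sketches: 14 + log2(2 f_xy / (f_x + f_y)). The sketch below applies it to (formula, nearby word) pairs to rank name candidates for each formula. The pair extraction step is assumed to have happened already, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter


def log_dice(pair_count, left_count, right_count):
    """logDice association score: 14 + log2(2*f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * pair_count / (left_count + right_count))


def rank_names(pairs):
    """Rank candidate names for each formula by logDice.

    `pairs` is a list of (formula, word) co-occurrences, e.g. a formula
    and a word found in its immediate textual context.
    """
    pair_counts = Counter(pairs)
    formula_counts = Counter(formula for formula, _ in pairs)
    word_counts = Counter(word for _, word in pairs)
    candidates = {}
    for (formula, word), count in pair_counts.items():
        score = log_dice(count, formula_counts[formula], word_counts[word])
        candidates.setdefault(formula, []).append((score, word))
    # Highest-scoring word first for every formula.
    return {
        formula: [word for _, word in sorted(scored, reverse=True)]
        for formula, scored in candidates.items()
    }


# Made-up co-occurrence data for illustration only.
pairs = (
    [("a^2+b^2=c^2", "Pythagorean")] * 3
    + [("a^2+b^2=c^2", "equation")]
    + [("E=mc^2", "equation")] * 4
)
print(rank_names(pairs))
```

A generic word such as "equation" co-occurs with many formulae, so its score for any single formula drops, which is the behaviour wanted for a names dictionary.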
contact
Petr Sojka c/o mir.fi.muni.cz && nlp.fi.muni.cz