
DocSim: Semantic Similarity of Text Documents based on Gensim

Gensim was conceived and motivated by the needs of DML-CZ, and we provide it for EuDML as a library for computing similarities between plain-text documents. It is open-source, general-purpose software for scalable topic modelling, based on the Vector Space Model of document representation.
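To illustrate the Vector Space Model idea mentioned above, here is a minimal sketch in plain Python. It is not Gensim's implementation or API: the function names `bag_of_words` and `cosine_similarity`, the whitespace tokenisation, and the sample documents are all assumptions made for illustration. Each document becomes a sparse term-frequency vector, and similarity is the cosine of the angle between two such vectors.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a plain-text document to a sparse term-frequency vector.

    Hypothetical helper: tokenisation is a naive lowercase whitespace split.
    """
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * vec_b[term] for term, weight in vec_a.items() if term in vec_b)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative sample documents (made up for this sketch).
documents = [
    "topic modelling with large corpora",
    "scalable topic modelling of text documents",
    "digital mathematics library",
]
vectors = [bag_of_words(doc) for doc in documents]

# Score every document against a short query.
query = bag_of_words("topic modelling")
scores = [cosine_similarity(query, vec) for vec in vectors]
```

In this sketch the first two documents share terms with the query and receive non-zero scores, while the third shares none and scores zero. A production system would typically add term weighting and dimensionality reduction on top of this basic representation.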

The award-winning Gensim system was developed by Radim Řehůřek. It is widely used and cited in digital libraries, content management systems, the teaching of machine learning methods, and elsewhere.

  • EuDML
    • Gensim as used in the demo version of the European Digital Mathematics Library (EuDML).
  • DML-CZ
    • Gensim as used in the Czech Digital Mathematics Library (DML-CZ).

Cite as

Text

ŘEHŮŘEK, Radim and Petr SOJKA. Software Framework for Topic Modelling with Large Corpora. In Proceedings of LREC 2010 workshop New Challenges for NLP Frameworks. Valletta, Malta: University of Malta, 2010. pp. 45–50. ISBN 2-9517408-6-7.

BibTeX

@inproceedings{ismu:884893,
     author = "Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka",
      title = "{Software Framework for Topic Modelling with Large Corpora}",
  booktitle = "{Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks}",
  publisher = {ELRA},
    address = {Valletta, Malta},
       year = 2010,
      month = May,
       isbn = "2-9517408-6-7",
      pages = "45--50",
        url = {http://is.muni.cz/publication/884893/en},
}
		

Selected publications


Relevant projects
