Team

Project Partners

The main project participants are from Brno University of Technology, the Moravian Library in Brno, and Masaryk University. Team members from Brno University of Technology are leading experts on automatic document transcription and analysis.

The Moravian Library is the main institution driving document digitization in the Czech Republic. The team from Masaryk University focuses on media, literature, and discourse analysis.

Project Team

Michal Hradiš

Michal Hradiš (BUT) is a neural networks and AI researcher who aims to transform research results into practical applications. He leads the project, steers the team towards its goals, and encourages communication between the technical part of the team and the more user-focused partners.

Petr Žabička

Petr is a library automation expert specializing in digitization, digital libraries, and machine learning. As an associate director at the Moravian Library, he oversees research and development initiatives. He is currently focused on using machine learning to improve access to digitized documents, including through his involvement in the PERO project, which aims to enhance OCR accuracy.

Kateřina Kirkosová

My focus is analyzing media, literature, and pop culture discourses. To semANT, I add a social sciences perspective and bring experience in how texts are filtered, coded, and mined using the standard quantitative and qualitative research methods of the social sciences.

Jan Kohút

Developing OCR systems that serve to collect large amounts of text for our language modeling tasks. A PhD student at the Faculty of Information Technology, Brno University of Technology, with a main focus on adaptive algorithms for text recognition.

Karel Beneš

Training language models wherever they don’t need to be huge. In semANT, I handle the basic processing of data and metadata from the digital library. I headed the effort that led to TextBite.

Martin Fajčík

Interested in information retrieval tasks beyond standard document retrieval, such as question answering or fact-checking. In semANT, I am training large language models for historical semantic document processing. I led our research on transferring English LLMs to the Czech language.

Martin Dočekal

I focus on processing scientific literature, mainly on developing summarization models. In this project, I participate in the development of LLM, RAQA, and summarization systems.

Martina Dvořáková

Martina, as Project Coordinator at the Moravian Library, oversees various projects, including “semANT”. Previously, she contributed to the “PERO” project. Martina manages project communication and collaborates with partners. With a background in history, she plays a key role in the Moravian Library’s digital humanities research team, leveraging her expertise in digital heritage.

Filip Jebavý

Filip specializes in research in AI, Music Information Retrieval, and Digital Humanities. His research focuses primarily on the analytical capabilities of artificial neural networks. He currently works as the head of the Digital Document Management Department at the Moravian Library, where he is involved in several digital humanities projects, especially in the area of machine learning. Together with his team, he also develops and maintains several tools that improve and facilitate the digitization and management of digital documents.

Boris Lehečka

Researcher at the Moravian Library in Brno and freelancer. Areas of expertise: scholarly digital editions; lexicography; research infrastructures; digital humanities. In this project, I focus on the parts related to the Czech language and user experience from the researchers’ perspective.

Martin Kišš

In semANT, I am involved in creating OCR models and page layout recognition models. Besides that, I focus on training OCR on unlabelled data, and I like to compete in various document-related challenges.