I am a full-time researcher specializing in the design and evaluation of language resources and technologies, with a focus on their application in corpus linguistic research.
A common thread in my work is the use of state-of-the-art resources and tools for data-driven exploration of how language functions in different communicative settings, such as written, spoken and AI-generated language. I also apply this expertise in the development and evaluation of language resources and technologies of various kinds.
I am currently affiliated with the University of Ljubljana (Centre for Language Resources and Technologies and Centre of Excellence for AI in Digital Humanities) and collaborate closely with CLARIN.SI at the Jožef Stefan Institute (AI Lab).
For more info, see my full CV here, or view my profiles on Google Scholar and ResearchGate.
- Data-driven language analysis and modeling
- Language resources and technologies for Slovene
- Evaluation and analysis of large language models
- LLM4DH: Large Language Models for Digital Humanities (ARIS, 2024‒2027) – T2.3: Advanced grammatical analysis of multilingual corpora
- AI4DH: Centre of Excellence in Artificial Intelligence for Digital Humanities (Horizon Europe, 2025‒2030) – WP2: Infrastructure and Research Challenges
- UniDive: Universality, diversity and idiosyncrasy in language technology (COST Action, 2022‒2026) – WG1: Corpus Annotation
- SPOT: Treebank-Driven Approach to the Study of Spoken Slovenian (ARIS, 2022‒2025, Principal Investigator) {% comment %}
- SLOKIT: CLARIN.SI tool for corpus data analysis and summarization (2022-2023)
- DSDE: Development of Slovene in a Digital Environment (2020-2023)
- ELEXIS: European Lexicographic Infrastructure (2020-2023)
- SLED: Monitor Corpus for Slovene and Related Resources (2021-2022)
- NSSS: New grammar of contemporary standard Slovene (2017-2020)
- Language Technology Seminars for Teachers (2013-2014) {% endcomment %}
For a full list, please see the SICRIS database.
- May 2026: Co-organizing the Universal Dependencies Workshop at LREC-COLING 2026, and contributing to papers on the ROG multi-layer spoken corpus, the UniDive annotation tool survey, and the ADMIRE and UNER multilingual benchmarks.
- March 2026: We presented DELTA, a toolkit for multi-level and multi-dimensional measurement of linguistic diversity in parsed corpora, at EACL 2026.
- February 2026: Counting Trees is out in Corpus Linguistics and Linguistic Theory, introducing a novel STARK-based method for bottom-up analysis of syntactic variation across corpora.
- August 2025: Honoured and proud to have chaired the Organizing Committee of SyntaxFest 2025 in Ljubljana, which brought together five workshops (IWPT, UDW, DepLing, TLT, QUASY), two UniDive pre-conference events, and more than 80 presentations on empirical syntactic analysis and parsing.
- June 2025: It's a wrap! My postdoc SPOT project officially comes to an end, leaving behind new data, tools, and methods for studying speech through syntactically parsed corpora.
- February 2025: Kick-off of the AI4DH Centre of Excellence, where we’re joining forces across disciplines to help researchers in the humanities and social sciences integrate AI into their work through tailored infrastructure, training and collaboration.
- November 2024: Err, well ... We’ve just released a bigger, better, and more polished version of the SST UD treebank, to be used in linguistic and NLP research on Slovenian speech. Embedded in ROG, it also features prosody, disfluency and dialogue act annotations.