About me

I am a full-time researcher specializing in the design and evaluation of language resources and technologies, with a focus on their application in corpus linguistic research.

A common thread in my work is the use of state-of-the-art resources and tools for data-driven exploration of how language functions in different communicative settings, such as written, spoken and AI-generated language. I also apply this expertise in the development and evaluation of language resources and technologies of various kinds.

I am currently affiliated with the University of Ljubljana (Centre for Language Resources and Technologies and Centre of Excellence for AI in Digital Humanities) and collaborate closely with CLARIN.SI at the Jožef Stefan Institute (AI Lab).

For more info, see my full CV here, or view my profiles on Google Scholar and ResearchGate.

Research areas

Data-driven language analysis and modeling
Language resources and technologies for Slovene
Evaluation and analysis of large language models

Current projects

LLM4DH: Large Language Models for Digital Humanities (ARIS, 2024‒2027) – T2.3: Advanced grammatical analysis of multilingual corpora
AI4DH: Centre of Excellence in Artificial Intelligence for Digital Humanities (Horizon Europe, 2025‒2030) – WP2: Infrastructure and Research Challenges
UniDive: Universality, diversity and idiosyncrasy in language technology (COST Action, 2022‒2026) – WG1: Corpus Annotation

Selected past projects

SPOT: Treebank-Driven Approach to the Study of Spoken Slovenian (ARIS, 2022‒2025, Principal Investigator) {% comment %}
SLOKIT: CLARIN.SI tool for corpus data analysis and summarization (2022-2023)
DSDE: Development of Slovene in a Digital Environment (2020-2023)
ELEXIS: European Lexicographic Infrastructure (2020-2023)
SLED: Monitor Corpus for Slovene and Related Resources (2021-2022)
NSSS: New grammar of contemporary standard Slovene (2017-2020)
Language Technology Seminars for Teachers (2013-2014) {% endcomment %}

Publications

For a full list, please see the SICRIS database.

News archive

October 2024: Excited to announce that SyntaxFest 2025 will take place in Ljubljana in August 2025, bringing together five workshops (TLT, UDW, DepLing, IWPT, and Quasy) and two UniDive pre-conference events.

July 2024: Release of STARK v3 – a significantly enhanced version of this versatile tool for bottom-up linguistic analysis and comparison of UD treebanks.

October 2023: Honoured to give an invited talk on 'Cross-lingually Harmonized Approaches to Spoken Data Annotation' at SPELLL 2023.

July 2023: Join us at ESSLLI 2023, the European Summer School in Logic, Language, and Information, hosted by the University of Ljubljana, where I'll be serving as the Local PC Chair.

October 2022: Very excited to learn that my postdoctoral project proposal 'A Treebank-Driven Approach to the Study of Spoken Slovenian' has been selected for funding.

September 2022: Kick-off meeting of the UniDive COST Action on universality, diversity, and idiosyncrasy in language technology. I am honoured to have been elected co-leader of WG1 on Corpus Annotation.

May 2022: Looking forward to LREC 2022 in Marseille, where I will be presenting a paper on spoken language treebanks (main conference) and a paper on the SSJ treebank extension (LAW workshop).

March 2022: I was invited as a speaker at the ESFRI 20th anniversary conference to present the CLARIN infrastructure and its impact on my research work. The presentation was also featured as a CLARIN Impact Story.

October 2021: Kick-off meeting for project SLED: Monitor Corpus for Slovene and Related Language Resources.

July 2021: Launch of the DSDE Universal Dependencies annotation campaign aiming at 5,000 new manually parsed sentences for Slovenian.

April 2021: I co-organized the EACL 2021 Language Diversity Games as part of the Language Diversity Panel and Games event at EACL 2021.

March 2021: I joined the Development of Slovene in a Digital Environment project to work on the SSJ UD treebank extension, the CLASSLA-Stanza pipeline evaluation, and the GOS spoken corpus concordancer.
