Pr/multilingual by raghavm243512 · Pull Request #121 · ServiceNow/eva

raghavm243512 · 2026-05-18T21:35:49Z

initial multilingual version

Easily extendable to many languages using the add_culture_data script, which handles translation, gender-consistent naming, name suggestions, data extension, etc. So if anyone wants to run a language not committed in EVA data, it takes very little effort.
Readme section showing the basics of adding a language.

This adds:
Multilingual data schema and content (initial utterances, system prompt, name aliases)
Multilingual support in code
Prompt updates to support multiple languages
Script to "add a language" with a high degree of automation
WER metric normalization rules, set dynamically per language and creatable via LLM through the add-a-language script
Automatic .env.example adjustments (maintains config app accuracy)
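
To illustrate why WER needs per-language normalization, here is a minimal sketch. It is not the PR's implementation: the `NORMALIZERS` table, the single French number rule, and the function names are all hypothetical stand-ins for whatever the generated WER schema actually contains.

```python
import re
import unicodedata

# Hypothetical per-language rules; the real rules come from the PR's WER schema.
NORMALIZERS = {
    "en": lambda s: s,
    "fr": lambda s: s.replace("quatre-vingts", "80"),  # toy number rule
}


def normalize(text, lang):
    """Lowercase, apply the language's rules, strip punctuation, split into words."""
    text = unicodedata.normalize("NFC", text).lower()
    text = NORMALIZERS.get(lang, lambda s: s)(text)
    text = re.sub(r"[^\w\s]", " ", text)
    return text.split()


def wer(reference, hypothesis, lang):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = normalize(reference, lang), normalize(hypothesis, lang)
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1] / max(len(ref), 1)


same = wer("J'ai quatre-vingts euros", "j'ai 80 euros", "fr")
naive = wer("J'ai quatre-vingts euros", "j'ai 80 euros", "en")
```

With the French rule applied the two transcripts score as identical, while the English (no-op) normalizer counts the spelled-out number as errors.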

Still TODO:

Currencies
Phone numbers (airline only problem)
Actually committing the translations (didn't want to burn credits until finalized)
Analysis
Testing a large variety of models to ensure they actually get the language code they expected (es-MX vs es, for example)
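
For the last TODO item, a minimal sketch of one way a regional code could silently collapse to its base language. This assumes a provider that accepts only a fixed list of codes; `resolve_language` and the supported lists are hypothetical and not taken from any model provider's API.

```python
def resolve_language(requested, supported):
    """Return the most specific supported code for a tag like 'es-MX', or None."""
    by_lower = {code.lower(): code for code in supported}
    parts = requested.replace("_", "-").lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in by_lower:
            return by_lower[candidate]
        parts.pop()  # drop the most specific subtag and retry
    return None


exact = resolve_language("es-MX", ["es-MX", "es"])
fallback = resolve_language("es-MX", ["es", "fr"])
missing = resolve_language("de", ["es", "fr"])
```

A per-model test could compare the resolved code against the requested one and flag any case where they differ.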

katstankiewicz

Can you also add ensure_ascii=False to AuditLog save()?
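
For context, a minimal sketch of what the flag changes, using plain `json.dumps` from the standard library. The actual `AuditLog` save code is not shown in this thread, so the record below is only an illustrative assumption.

```python
import json

# Illustrative record; not taken from the EVA dataset.
record = {"first_name": "राघव", "city": "Montréal"}

escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keeps native script as-is

print(escaped)
print(readable)
```

Both strings decode to the same object; the difference is only whether non-ASCII characters are written as `\uXXXX` escapes. When writing the readable form to disk, the file should be opened with `encoding="utf-8"`.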

raghavm243512 · 2026-06-05T18:37:31Z

@gabegma new design to fix accent issue for user while keeping the real world potential of lacking accents on database side:

Romanized placeholders are stored in the DB rather than produced at generation time. More deterministic, readable, cleaner, etc., and the user always sees native script.

gabegma · 2026-06-05T19:14:14Z

@@ -0,0 +1,22 @@
+{
+  "name": "A Garage",


Thanks for doing this - I'm still thinking we should pass the translation to the user too, because otherwise they might not match, but we can tackle it in a future PR

gabegma · 2026-06-05T19:26:07Z

+          "first_name": "<FIRST_NAME_ROMANIZED>",
+          "last_name": "<LAST_NAME_ROMANIZED>",


Why do we have romanized here now?

Real world data may not always keep native script. I can say my name is राघव and the agent should know I might mean "Raghav". So I made some of it romanized in DB and some not

Oh I didn't see your comment about the new design - so that's on purpose? It doesn't seem like it was applied to all files.

Won't this be an issue when computing the hash for the expected DB state or for authentication success if we can have either?

No, I tested task completion in French with a romanized expectation. Even with the accent (on the letter), task completion passed. The agent is told to try the romanized name spelling if needed. I didn't apply it to all files because both could be valid depending on the database behind the scenes.
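
A minimal sketch of the distinction at play here, not of what the PR does: accent folding alone can reconcile a French name with its unaccented form, but it cannot map Devanagari to Latin script, which is one reason a stored romanized value is useful. The helper names are hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after canonical decomposition (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def names_match(spoken, stored):
    """Accent-insensitive, case-insensitive comparison."""
    return strip_accents(spoken).casefold() == strip_accents(stored).casefold()


french_ok = names_match("Hélène", "Helene")
hindi_ok = names_match("राघव", "Raghav")  # needs transliteration, not folding
```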

Is it possible to add a test that checks for a disconnect between the individual JSON files and the actual dataset? I think it would be hard to spot right now if one of them is romanized but not the other.

Placeholders will be static and unchanging, but I can add a general test that takes the initial DB, applies the expected trace, and should pass per language, or something along those lines.
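
A minimal sketch of what such a check could look like. The DB layout, the trace step format (`table`, `key`, `fields`), and both helper names are assumptions made for illustration; the real EVA schema and hashing may differ.

```python
import hashlib
import json


def db_hash(state):
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(state, sort_keys=True, ensure_ascii=False,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def apply_trace(initial, trace):
    """Apply hypothetical update steps to a deep copy of the initial DB."""
    state = json.loads(json.dumps(initial))
    for step in trace:
        state[step["table"]][step["key"]].update(step["fields"])
    return state


# Hypothetical per-language fixture: initial DB, expected trace, expected DB.
initial = {"users": {"u1": {"first_name": "<FIRST_NAME_ROMANIZED>", "seat": "12A"}}}
trace = [{"table": "users", "key": "u1", "fields": {"seat": "14C"}}]
expected = {"users": {"u1": {"seat": "14C", "first_name": "<FIRST_NAME_ROMANIZED>"}}}

final = apply_trace(initial, trace)
```

Running this per language, and comparing `db_hash(final)` against the hash of the dataset's expected DB, would surface a file where one side is romanized and the other is not.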

gabegma

Excellent work!! I love that you have pushed your script so others can add new languages.

I'm still nervous about the aliases, and I think we should feed the translations to the user so they match. Or we change the tools themselves so we don't need to pass unstructured text.
I would do that before adding new languages, but in a separate PR.

I also think we are missing a few tests - especially for looking at the disconnect between the dataset's individual files and the main one, but I need to drop, so I'm pre-approving!

raghavm243512 force-pushed the pr/multilingual branch from bd0e0d9 to 81923ab May 19, 2026 18:39

raghavm243512 added 3 commits May 19, 2026 14:21

initial multilang impl

a4bcb4d

test fix

8ac2aaa

date formats

f5a5b52

raghavm243512 force-pushed the pr/multilingual branch from 68fb05a to f5a5b52 May 19, 2026 21:28

translations and supporting stuff

ada9699

raghavm243512 force-pushed the pr/multilingual branch from 606dc7d to ada9699 May 20, 2026 23:38

katstankiewicz reviewed May 21, 2026

raghavm243512 added 2 commits May 21, 2026 10:08

use display name for client

abc80e0

many finer points

b3ad5dc

raghavm243512 force-pushed the pr/multilingual branch from aa152bd to b3ad5dc May 22, 2026 19:09

alias itsm

ff7bd59

raghavm243512 force-pushed the pr/multilingual branch from 26fa5ec to ff7bd59 May 22, 2026 21:19

raghavm243512 and others added 10 commits May 22, 2026 21:20

Apply pre-commit

9b368f0

updated expected_db in dataset when adding culture data. update user_…

e3d90a4

…behavioral_fidelity judge prompt

update test

990eb76

add french number normalizer

b0d3a13

simplify adding languages

27be206

add language for elevenlabs

81c20a9

update stt_wer to handle french numbers

5a2377a

initial WER schema generation

bdbfe0b

docs and cleanup

132d19d

result improvements

30c6d43

raghavm243512 marked this pull request as ready for review May 28, 2026 21:14

update services to use 'settings' from pipecat update

dc910d9

fanny-riols reviewed May 29, 2026

Comment thread configs/agents/initial_messages.yaml Outdated

fanny-riols reviewed May 29, 2026

Comment thread configs/prompts/judge.yaml Outdated

fanny-riols reviewed May 29, 2026

Comment thread configs/prompts/judge.yaml Outdated

fanny-riols reviewed May 29, 2026

Comment thread configs/prompts/judge.yaml Outdated

fanny-riols reviewed May 29, 2026

Comment thread configs/prompts/simulation.yaml Outdated