Pr/multilingual#121

Open
raghavm243512 wants to merge 53 commits into main from pr/multilingual

Conversation

Collaborator

raghavm243512 commented May 18, 2026

Initial multilingual version.

Easily extendable to many languages using the add_culture_data script, which handles translation, gender-consistent naming, name suggestions, data extension, etc. So if anyone wants to run a language not committed in the EVA data, it is trivially easy to do so.
The README has a section showing the basics of adding a language.

This adds:

  • Multilingual data schema and content (initial utterances, system prompt, name aliases)
  • Multilingual support in code
  • Prompt updates to support multiple languages
  • Script to "add a language" with a high degree of automation
  • WER metric normalization rules, dynamically set per language and creatable via LLM through the adding script
  • Automatic .env.example adjustments (maintains config app accuracy)
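
To illustrate the per-language WER normalization item, here is a minimal sketch. It assumes a rule table keyed by language code; `NORMALIZATION_RULES`, `normalize`, and `wer` are hypothetical names, and the actual rule format used in this PR is not shown in this conversation.

```python
import unicodedata

# Hypothetical per-language replacement rules (illustrative only).
NORMALIZATION_RULES = {
    "en": {"&": "and"},
    "fr": {"œ": "oe"},
}


def normalize(text, lang):
    """Lowercase, apply language-specific replacements, drop punctuation."""
    text = unicodedata.normalize("NFC", text).lower()
    for source, target in NORMALIZATION_RULES.get(lang, {}).items():
        text = text.replace(source, target)
    # Replace Unicode punctuation with spaces; combining marks are kept so
    # scripts that rely on them are not damaged.
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    return " ".join(cleaned.split())


def wer(reference, hypothesis, lang):
    """Word error rate via word-level edit distance on normalized text."""
    ref = normalize(reference, lang).split()
    hyp = normalize(hypothesis, lang).split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ref_word != hyp_word)))  # substitution
        prev = cur
    return prev[-1] / max(len(ref), 1)
```

Keeping the rules as data rather than code is one way an LLM-assisted script could emit them for a new language without touching the metric itself.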

Still TODO:

  • Currencies
  • Phone numbers (airline-only problem)
  • Actually committing the translations (didn't want to burn credits until finalized)
  • Analysis
  • Testing a large variety of models to ensure they actually get the language code they expected (es-MX vs es, for example)
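
For the last TODO item, a minimal sketch of how a requested language tag could be checked against what a model or service supports, so that a silent fallback from es-MX to es becomes visible. `resolve_language_code` is a hypothetical helper, not something taken from this PR.

```python
def resolve_language_code(requested, supported):
    """Return the best supported match for a language tag, or None.

    Tries the full tag first (case-insensitive, '_' treated as '-'),
    then falls back to the base language subtag.
    """
    by_lower = {code.lower(): code for code in supported}
    normalized = requested.replace("_", "-").lower()
    if normalized in by_lower:
        return by_lower[normalized]
    base = normalized.split("-")[0]
    return by_lower.get(base)


# Example: a provider that only knows generic Spanish.
resolved = resolve_language_code("es-MX", ["en", "es"])
fell_back = resolved is not None and resolved.lower() != "es-mx"
```

Comparing the resolved tag with the requested one gives a per-model signal that could be logged or asserted in a test.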

Collaborator

katstankiewicz left a comment

Can you also add ensure_ascii=False to AuditLog save()?
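
For context on this request, a small sketch of what `ensure_ascii=False` changes in Python's standard `json` module. The record below is made up for illustration; how `AuditLog.save()` actually serializes is not shown in this conversation.

```python
import json

# Illustrative record containing non-ASCII text.
record = {"first_name": "राघव", "city": "Montréal"}

# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the text stays readable in the output.
readable = json.dumps(record, ensure_ascii=False)

# Both forms decode back to the same data.
round_trip_ok = json.loads(escaped) == json.loads(readable) == record
```

When writing the readable form to disk, the file should be opened with an explicit `encoding="utf-8"` so the result does not depend on the platform default.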

raghavm243512 marked this pull request as ready for review May 28, 2026 21:14

Collaborator Author

raghavm243512 commented Jun 5, 2026

@gabegma new design to fix the accent issue for the user while keeping the real-world possibility of missing accents on the database side:

Romanized placeholders are stored in the DB rather than produced at generation time. This is more deterministic, readable, and cleaner, and the user always sees native script.

Comment thread data/itsm_aliases/a_garage.json Outdated
@@ -0,0 +1,22 @@
{
"name": "A Garage",
Collaborator

Thanks for doing this - I'm still thinking we should pass the translation to the user too, because otherwise they might not match, but we can tackle it in a future PR

Comment thread configs/prompts/simulation.yaml Outdated
Comment on lines +34 to +35
"first_name": "<FIRST_NAME_ROMANIZED>",
"last_name": "<LAST_NAME_ROMANIZED>",
Collaborator

Why do we have romanized here now?

Collaborator Author

Real-world data may not always keep native script. I can say my name is राघव and the agent should know I might mean "Raghav". So I made some of it romanized in the DB and some not.

Collaborator

Oh I didn't see your comment about the new design - so that's on purpose? It doesn't seem like it was applied to all files.

Collaborator

Won't this be an issue when computing the hash for the expected DB state or for authentication success if we can have either?

Collaborator Author

raghavm243512 commented Jun 5, 2026

No, I tested task completion in French with a romanized expectation. Even with the accent (on the letter), task completion passed. The agent is told to try the romanized spelling of the name if needed. I didn't apply it to all files because both could be valid depending on the database behind the scenes.
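
A minimal sketch of the "try the romanized spelling if needed" idea for accented Latin-script names. `fold_accents` and `lookup_name` are hypothetical helpers, not code from this PR, and this approach only removes diacritics: mapping a name such as राघव to "Raghav" would need a real transliteration step that is not covered here.

```python
import unicodedata


def fold_accents(text):
    """Strip diacritics from Latin-script text, e.g. 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lookup_name(db_names, spoken_name):
    """Try an exact match first, then an accent-insensitive match."""
    if spoken_name in db_names:
        return spoken_name
    target = fold_accents(spoken_name).casefold()
    for name in db_names:
        if fold_accents(name).casefold() == target:
            return name
    return None


# The database stores the romanized form; the user says the accented one.
match = lookup_name(["Helene", "Raghav"], "Hélène")
```

Trying the exact form first keeps databases that do store native spellings working unchanged.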

Collaborator

Is it possible to add a test that checks for a disconnect between the individual JSON files and the actual dataset? I think it would be hard to spot right now if one of them is romanized but not the other.

Collaborator Author

Placeholders will be static and unchanging, but I can add a general test that takes the initial DB, applies the expected trace, and should pass per language? Or something along those lines.
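
One possible shape for the consistency check discussed in this thread, as a rough sketch. It assumes that a record from an individual JSON file and the matching record in the combined dataset share the same structure; `placeholder_map`, `placeholder_mismatches`, and the `<FIRST_NAME>` placeholder are hypothetical, and the real file layout is not shown in this conversation.

```python
import re

# Matches angle-bracket placeholders such as <FIRST_NAME_ROMANIZED>.
PLACEHOLDER = re.compile(r"<[A-Z_]+>")


def placeholder_map(node, path=""):
    """Map each JSON path to the placeholders found in its string value."""
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            found.update(placeholder_map(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.update(placeholder_map(value, f"{path}/{index}"))
    elif isinstance(node, str):
        tokens = PLACEHOLDER.findall(node)
        if tokens:
            found[path] = tokens
    return found


def placeholder_mismatches(individual, combined):
    """Return the paths whose placeholders differ between the two records."""
    left, right = placeholder_map(individual), placeholder_map(combined)
    return {
        path: (left.get(path), right.get(path))
        for path in sorted(left.keys() | right.keys())
        if left.get(path) != right.get(path)
    }


# One side romanized, the other not: the check reports the field.
individual_record = {"user": {"first_name": "<FIRST_NAME_ROMANIZED>"}}
combined_record = {"user": {"first_name": "<FIRST_NAME>"}}
mismatches = placeholder_mismatches(individual_record, combined_record)
```

A unit test could run this over every record pair and assert that the result is empty.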

Collaborator

gabegma left a comment

Excellent work!! I love that you have pushed your script so others can add new languages.

I'm still nervous about the aliases, and I think we should feed the translations to the user so they match. Or we change the tools themselves so we don't need to pass unstructured text.
I would do that before adding new languages, but in a separate PR.

I also think we are missing a few tests - especially for looking at the disconnect between the dataset's individual files and the main one, but I need to drop, so I'm pre-approving!
