[BadCase about the model]: CN-NewsTTS Bench includes MiniMax TTS results - metadata confirmation invited #93

Description

@Jayden-X-L

Basic Information - Models Used

minimax_tts-speech-2.8-hd

Basic Information - Scenario Description

TTS systems still struggle with Chinese news-style text

Is this badcase known and solvable?

Information about environment

git clone https://github.com/Jayden-X-L/cn-news-tts-bench.git
cd cn-news-tts-bench

python3 scripts/validate_dataset.py data/dev.jsonl
python3 scripts/validate_dataset.py data/test_public.jsonl

python3 scripts/score_submission.py \
  --dataset data/test_public.jsonl \
  --asr-results results/asr_results/public_test/volcengine_tts.asr.jsonl \
  --model-id volcengine_tts \
  --output-dir /tmp/cn-news-tts-repro

python3 scripts/aggregate_leaderboard.py \
  --per-model-dir results/per_model_public_test \
  --results-dir /tmp/cn-news-tts-leaderboard/results \
  --site-dir /tmp/cn-news-tts-leaderboard/site

shasum -a 256 -c release/v0.1_core_checksums.sha256

Call & Execution Information

  • model_id: minimax_tts
  • model name: speech-2.8-hd
  • voice: configured Mandarin news voice

Description

Hi MiniMax team,

I am Shijun Luo from NetEase Cloud Music, where I work on AI news briefing / AI news podcast generation. In our workflow, we use and evaluate major TTS systems to generate spoken news and information podcast content.

During this work, we found that many current TTS systems still struggle with Chinese news-style text, especially compact expressions that frequently appear in real news. These errors are not just voice-quality issues; they can change the information heard by listeners.

For example:

  • 苏-27 (the Su-27 aircraft) may be read as "苏负二十七" ("Su negative twenty-seven") instead of the intended aircraft model name.
  • 96-91 may be read as a numeric range instead of a sports score.
  • 620N·m may be read letter by letter or as symbol fragments instead of as a torque value in newton-metres.
  • 3.5% may be read as "三点五百分号" ("three point five percent sign") or confused with percentage points.
  • AI / CEO may be expanded into "人工智能" ("artificial intelligence") / "首席执行官" ("chief executive officer") when the original abbreviation should be preserved.
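
As a rough illustration of the categories above, here is a minimal Python sketch that flags such compact expressions in raw news text. The patterns, the category names, and the `flag_risky_tokens` function are hypothetical and written only for this example; they are not taken from the benchmark's scripts, and real news text would need far broader coverage.

```python
import re

# Hypothetical patterns for illustration only; not part of CN-NewsTTS Bench.
RISK_PATTERNS = [
    # A percentage such as 3.5%
    ("percentage", re.compile(r"\d+(?:\.\d+)?%")),
    # A torque value such as 620N·m (only this one unit is covered here)
    ("unit", re.compile(r"\d+(?:\.\d+)?N·m")),
    # A CJK character, a hyphen, then digits, such as 苏-27
    ("model_name", re.compile(r"[\u4e00-\u9fff]-\d+")),
    # Digits, a hyphen, digits, such as 96-91 (score or range; ambiguous)
    ("score_or_range", re.compile(r"(?<![\d-])\d+-\d+(?![\d-])")),
    # Two or more uppercase Latin letters, such as AI or CEO
    ("abbreviation", re.compile(r"(?<![A-Za-z])[A-Z]{2,}(?![A-Za-z])")),
]


def flag_risky_tokens(text):
    """Return (category, token) pairs in order of appearance in the text."""
    hits = []
    for category, pattern in RISK_PATTERNS:
        for match in pattern.finditer(text):
            hits.append((match.start(), category, match.group()))
    hits.sort()
    return [(category, token) for _, category, token in hits]


if __name__ == "__main__":
    for sample in ["苏-27", "球队以96-91获胜", "扭矩620N·m", "AI公司CEO表示增长3.5%"]:
        print(sample, flag_risky_tokens(sample))
```

A pattern match only marks a token as worth checking. Whether 96-91 is a score or a range still depends on context, which is the kind of decision the benchmark asks the TTS system itself to make.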

This motivated us to release CN-NewsTTS Bench, a raw-input Chinese news TTS benchmark focused on real-world news reading cases such as dates, numbers, units, named entities, mixed-script text, and text normalization.

The current public leaderboard includes a MiniMax TTS entry:

  • model_id: minimax_tts
  • model name: speech-2.8-hd
  • voice: configured Mandarin news voice

Repository:
https://github.com/Jayden-X-L/cn-news-tts-bench

We would like to invite the MiniMax team to:

  1. Confirm or correct the public model metadata.
  2. Submit an official result if the current configuration is not representative.
  3. Provide a system/model card if available.

Submission guide:
https://github.com/Jayden-X-L/cn-news-tts-bench/blob/main/SUBMIT.md

For questions or corrections, feel free to contact me:
xiaobiluo@gmail.com

Thanks!

Best,
Shijun Luo
