Skip to content

Latest commit

 

History

History
45 lines (30 loc) · 2.52 KB

File metadata and controls

45 lines (30 loc) · 2.52 KB

MiniMax M2.1 (MiniMax AI)

Provider: Synthetic.new (via Fireworks) Context window: 192k tokens Task: Multi-step tool calling chain (weather → Bulgarian description → markdown file → email with attachment)

Score: 9.5 / 10

Criteria Score Max
Correct tool call sequence 3 3
Tool call accuracy 2.5 3
Final output quality 2 2
Efficiency & communication 2 2

Execution

  • Time: ~45 seconds
  • Cost: Fixed subscription ($20–60/mo on Synthetic.new)
  • User interventions: 0

What happened

The fastest open-source model to complete the task. All 4 steps executed in a single pass:

  1. Weather data — Temperature 8°C, heavy rain with thunderstorm, humidity 87%, wind 1.4 m/s.
  2. Bulgarian description — Natural and contextually appropriate: "Има силен дъжд с гръмотевици, така че не забравяйте чадъра и облечете топло яке."
  3. Markdown file — Valid, fully matches the required template. All fields present (temperature, condition, humidity, wind). Section header correctly in Bulgarian ("Данни").
  4. Email — Sent with attachment via native tools. Professional body text signed "Автоматичен агент."

Notable behavior

  • Showed reasoning process during execution — user could see it was working, not hanging.
  • Summary output in Russian (the user's communication language) — correct language choice.
  • Most professional email body of all models ("Приложеният файл съдържа подробна информация за текущите метеорологични условия").

Weather accuracy

Reported "heavy rain with thunderstorm" — initially suspected to be a hallucination since other models and weather APIs showed "partly cloudy." However, the user confirmed it was indeed raining all day with thunder just 30 minutes before the test. MiniMax had the most accurate weather data of all tested models.

This raises an interesting question about data sources: MiniMax likely used a different weather API or interpreted the data differently than models relying on Open-Meteo's weathercode.

Verdict

The best overall result for an open-source model: fastest execution (45 sec), most accurate weather data (confirmed by user), complete file with all fields, email with attachment, zero user interventions. This model was already the user's primary choice for OpenClaw before the test — and the test confirmed that choice.