<img width="1001" height="433" alt="Image" src="https://github.com/user-attachments/assets/52e38e1a-4b49-4728-85e8-1a4ef2b32c24" />

The recognition is completely inaccurate. For a 2-minute audio clip, this is all that was recognized:

<img width="1422" height="483" alt="Image" src="https://github.com/user-attachments/assets/2af0018d-7977-477b-8baa-853d25df3e2f" />

I hope the maintainers can further improve recognition performance.