zhangyue207 commented Jan 23, 2026

Resolves #976.

Hardware information:
CPU

root@iluvatar:/workspace/InfiniLM# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   52 bits physical, 57 bits virtual
CPU(s):                          192
On-line CPU(s) list:             0-191
Thread(s) per core:              2
Core(s) per socket:              48
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           207
Model name:                      INTEL(R) XEON(R) PLATINUM 8558
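The lscpu topology fields are internally consistent. 2 sockets × 48 cores per socket × 2 threads per core gives the 192 logical CPUs reported. Below is a trivial sketch of that arithmetic. The function name `logical_cpus` is hypothetical and exists only for illustration.

```python
# Illustrative only: logical CPUs = sockets * cores per socket * threads per core.
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    return sockets * cores_per_socket * threads_per_core

# Values taken from the lscpu output above.
total = logical_cpus(sockets=2, cores_per_socket=48, threads_per_core=2)
physical_cores = 2 * 48  # sockets * cores per socket, without hyper-threading

print(total)           # 192, matching "CPU(s): 192"
print(physical_cores)  # 96
```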

Accelerator card:

Iluvatar TG-V200 OAM
Driver Version: 4.3.0

9G7B_MHA
Single card:

root@iluvatar:/workspace/InfiniLM# python3 examples/jiuge.py --iluvatar --model_path=/workspace/9G7B_MHA/ --max_new_tokens=1024
Namespace(cpu=False, nvidia=False, metax=False, moore=False, iluvatar=True, cambricon=False, model_path='/workspace/9G7B_MHA/', max_new_tokens=1024, backend='cpp', batch_size=1, prompt='How are you', tp=1, enable_paged_attn=False)
 
Generation completed in 2186.77 ms
 Batchsize=1  Per_Batch_Input_Len=13  Per_Batch_New_Tokens=41

 Prefill TTFT: 182.55 ms  Throughput: 71.21 tok/s

 Decode  Avg ITL: 50.11 ms   Throughput: 19.96 tok/s

Hello! I'm doing well, thank you for asking. How about you? Is there anything specific you'd like to talk about or any questions you have? I'm here to help!
total_time: 2237.18 ms

Distributed (multi-card): communication hangs
C-Eval: the evaluation script hangs
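The logged numbers for the 9G7B_MHA single-card run fit three simple relations:

- Prefill throughput ≈ input length / TTFT.
- Decode throughput ≈ 1000 / average ITL (in ms).
- Total generation time ≈ TTFT + (new tokens minus 1) × average ITL.

These relations are inferred from the numbers themselves and are not taken from the source of examples/jiuge.py, so treat them as an assumption. The sketch below recomputes the metrics for that run.

```python
# Sketch under assumptions: the formulas are inferred from the logged values,
# not copied from examples/jiuge.py.

def prefill_throughput(input_len: int, ttft_ms: float) -> float:
    """Prompt tokens processed per second during prefill."""
    return input_len / (ttft_ms / 1000.0)

def decode_throughput(avg_itl_ms: float) -> float:
    """Generated tokens per second during decode (one token per ITL)."""
    return 1000.0 / avg_itl_ms

def estimated_total_ms(ttft_ms: float, new_tokens: int, avg_itl_ms: float) -> float:
    """Prefill yields the first token; the remaining new_tokens - 1 come from decode."""
    return ttft_ms + (new_tokens - 1) * avg_itl_ms

# Values from the 9G7B_MHA single-card run.
print(round(prefill_throughput(13, 182.55), 2))          # 71.21
print(round(decode_throughput(50.11), 2))                # 19.96
print(round(estimated_total_ms(182.55, 41, 50.11), 2))   # 2186.95 (logged: 2186.77)
```

The estimated total time differs from the logged 2186.77 ms by well under 1 ms. That gap is consistent with the average ITL being a rounded mean.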

9g_8b_thinking

root@iluvatar:/workspace/InfiniLM# python3 examples/jiuge.py --iluvatar --model_path=/workspace/9g_8b_thinking_llama/ --max_new_tokens=1024
Namespace(cpu=False, nvidia=False, metax=False, moore=False, iluvatar=True, cambricon=False, model_path='/workspace/9g_8b_thinking_llama/', max_new_tokens=1024, backend='cpp', batch_size=1, prompt='How are you', tp=1, enable_paged_attn=False)
 load weights ......
 load weights over! 7498.149394989014 ms 

<|im_start|>user
How are you<|im_end|>
<|im_start|>assistant
=================== start generate ====================



 Generation completed in 7340.51 ms
 Batchsize=1  Per_Batch_Input_Len=13  Per_Batch_New_Tokens=146

 Prefill TTFT: 177.68 ms  Throughput: 73.17 tok/s

 Decode  Avg ITL: 49.4 ms   Throughput: 20.24 tok/s

<think>
Okay, the user is asking "How are you?" which is a common greeting. I need to respond in a friendly and helpful way. Since I'm an AI, I don't have feelings, but I can still provide a positive and supportive answer.

I should start by acknowledging their question, then offer assistance. Maybe mention that I'm here to help with any questions or tasks they have. Keep it concise but warm. Avoid being too formal or robotic. Let them know I'm ready to assist.
</think>
Hello! I'm here to help you with any questions or tasks you have. How can I assist you today? 😊
total_time: 7390.97 ms
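To compare runs without copying numbers by hand, you can scrape the summary lines from the log. The sketch below assumes the line formats are exactly as printed above. The `PATTERNS` table and the `parse_summary` helper are hypothetical and are not part of InfiniLM. The script also cross-checks the logged throughputs against the logged latencies for the 9g_8b_thinking run.

```python
import re

# Summary block copied from the 9g_8b_thinking log above.
SUMMARY = """
 Generation completed in 7340.51 ms
 Batchsize=1  Per_Batch_Input_Len=13  Per_Batch_New_Tokens=146
 Prefill TTFT: 177.68 ms  Throughput: 73.17 tok/s
 Decode  Avg ITL: 49.4 ms   Throughput: 20.24 tok/s
"""

# Hypothetical regexes; they assume the exact line formats shown in this log.
PATTERNS = {
    "gen_ms": r"Generation completed in ([\d.]+) ms",
    "batch": r"Batchsize=(\d+)",
    "input_len": r"Per_Batch_Input_Len=(\d+)",
    "new_tokens": r"Per_Batch_New_Tokens=(\d+)",
    "ttft_ms": r"Prefill TTFT: ([\d.]+) ms",
    "prefill_tps": r"Prefill TTFT: [\d.]+ ms\s+Throughput: ([\d.]+) tok/s",
    "itl_ms": r"Avg ITL: ([\d.]+) ms",
    "decode_tps": r"Avg ITL: [\d.]+ ms\s+Throughput: ([\d.]+) tok/s",
}

def parse_summary(text: str) -> dict:
    """Extract each metric as a float; raise if a line is missing."""
    out = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"missing field: {key}")
        out[key] = float(match.group(1))
    return out

stats = parse_summary(SUMMARY)
# Recompute throughputs from latencies and compare with the logged values.
print(round(stats["input_len"] / stats["ttft_ms"] * 1000, 2))  # 73.17
print(round(1000 / stats["itl_ms"], 2))                         # 20.24
```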

zhangyue207 requested a review from a team January 23, 2026 09:22
zhangyue207 linked an issue Jan 23, 2026 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

[DEV] Adapt to Iluvatar TG-200
