from transformers import AutoTokenizer

PATH = '/toolchain/LLM/telechat-12b-hf'
tokenizer = AutoTokenizer.from_pretrained(PATH, trust_remote_code=True)
print(tokenizer.encode(tokenizer.decode([2000])))  # [561, 579], not [2000]
print(tokenizer.decode([579]))  # 'red'
print(tokenizer.encode('red'))  # [2952]
print(tokenizer.decode([2952]))  # 'red'
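In the vast majority of cases encode and decode should be inverse operations, but with the 12b model's tokenizer they behave as shown in the snippet above: re-encoding the decoded text of id 2000 yields [561, 579] instead of [2000], and ids 579 and 2952 both decode to 'red', while encode('red') returns only [2952].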
Could someone explain this? @hannawong @ZiYu0427 @liuxz0801 @Unknown-Body @LSX-Sneakerprogrammer
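To see how widespread the round-trip mismatch is, one can scan every token id and record those where `encode(decode([id])) != [id]`. Below is a minimal sketch, assuming only that `encode` maps a string to a list of ids and `decode` maps a list of ids to a string. The helper `find_roundtrip_mismatches` and the toy vocabulary are hypothetical, for illustration only. The toy case shows one possible cause, two ids that decode to the same surface string so that `encode` can return only one of them. Whether that is what happens in the TeleChat tokenizer is an assumption, not something confirmed here.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def find_roundtrip_mismatches(
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    ids: Iterable[int],
) -> List[Tuple[int, str, List[int]]]:
    """Return (id, decoded_text, re_encoded_ids) for every id that does not round-trip."""
    mismatches = []
    for token_id in ids:
        text = decode([token_id])
        re_encoded = encode(text)
        if re_encoded != [token_id]:
            mismatches.append((token_id, text, re_encoded))
    return mismatches


# Hypothetical toy vocabulary: ids 1 and 2 both decode to "red".
TOY_VOCAB: Dict[int, str] = {0: "b", 1: "red", 2: "red", 3: "bred"}


def toy_decode(ids: List[int]) -> str:
    # Concatenate the surface strings of the given ids.
    return "".join(TOY_VOCAB[i] for i in ids)


def toy_encode(text: str) -> List[int]:
    # Greedy longest-match encoder; when several ids share one string,
    # the smallest id is the canonical one that encode() emits.
    by_string: Dict[str, int] = {}
    for i, s in sorted(TOY_VOCAB.items()):
        by_string.setdefault(s, i)
    out: List[int] = []
    pos = 0
    while pos < len(text):
        for length in range(len(text) - pos, 0, -1):
            piece = text[pos:pos + length]
            if piece in by_string:
                out.append(by_string[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot encode {text[pos:]!r}")
    return out


print(find_roundtrip_mismatches(toy_encode, toy_decode, TOY_VOCAB))
# [(2, 'red', [1])]
```

Against the real tokenizer, a call along the lines of `find_roundtrip_mismatches(lambda s: tokenizer.encode(s, add_special_tokens=False), tokenizer.decode, range(len(tokenizer)))` should list every affected id. Passing `add_special_tokens=False` keeps any BOS/EOS tokens out of the comparison, assuming the remote-code tokenizer honours that standard Transformers argument.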