Hey all, I saw that the converter for BPE tokenizers from HuggingFace format to tiktoken suggested in issue #358 is no longer active (in fact, it's completely gone), so I wrote one myself: https://github.com/shakedzy/tiktokenizer.
It's also available on PyPI:

pip install tiktokenizer

and it's as straightforward as possible:
from tiktokenizer import TikTokenizer
# Create a tiktoken encoding from any compatible HuggingFace model
encoding = TikTokenizer.create("Qwen/Qwen3-8B")
# Or load a previously created one
encoding = TikTokenizer.load("Qwen/Qwen3-8B")
# encoding is now a standard tiktoken encoder
tokens = encoding.encode("Hello, world!")
text = encoding.decode(tokens)
print(tokens) # [9707, 11, 1879, 0]
print(text) # Hello, world!
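For anyone curious what a conversion like this actually has to produce: a tiktoken encoding is driven by a table of mergeable byte-pair ranks, and encoding text is essentially a loop of lowest-rank merges over its UTF-8 bytes. Here's a toy sketch of that merge loop with a tiny hand-made rank table — the tables extracted from a real HuggingFace tokenizer are far larger, and this is an illustration, not the library's actual code:

```python
def bpe_encode(ranks: dict[bytes, int], text: str) -> list[int]:
    """Toy BPE encoder: repeatedly merge the adjacent pair with the
    lowest rank until no mergeable pair remains, then map to token ids."""
    parts = [bytes([b]) for b in text.encode("utf-8")]
    while True:
        best = None  # (index, rank) of the cheapest adjacent pair
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best is None or rank < best[1]):
                best = (i, rank)
        if best is None:
            break  # no adjacent pair is in the rank table
        i = best[0]
        parts = parts[:i] + [parts[i] + parts[i + 1]] + parts[i + 2:]
    return [ranks[p] for p in parts]

# A hand-made rank table: single bytes first, then learned merges.
ranks = {b"a": 0, b"b": 1, b"c": 2, b"ab": 3, b"abc": 4}

print(bpe_encode(ranks, "abc"))  # [4] — merged all the way to one token
print(bpe_encode(ranks, "cab"))  # [2, 3] — "ca" isn't a merge, "ab" is
```

The converter's real job is extracting exactly this kind of bytes-to-rank table (plus special tokens and the pre-tokenization regex) from the HuggingFace files so tiktoken's fast merge loop can run on it.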
It's also MIT licensed, so no strings attached.