Community Resource: HuggingFace to TikToken Converter #471

@shakedzy

Hey all, I saw that the converter for BPE tokenizers from HuggingFace format to TikToken suggested in issue #358 is no longer active (it is completely gone), so I wrote one myself: https://github.com/shakedzy/tiktokenizer.

It's also available on PyPI:

pip install tiktokenizer

and it's as straightforward as possible:

from tiktokenizer import TikTokenizer

# Create a tiktoken encoding from any compatible HuggingFace model
encoding = TikTokenizer.create("Qwen/Qwen3-8B")

# Or load a previously created one
encoding = TikTokenizer.load("Qwen/Qwen3-8B")

# encoding is now a standard tiktoken encoder
tokens = encoding.encode("Hello, world!")
text = encoding.decode(tokens)

print(tokens)  # [9707, 11, 1879, 0]
print(text)    # Hello, world!

It's also MIT licensed, so no strings attached.
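
For anyone curious what a conversion like this involves, here is a minimal sketch of one typical step: turning a byte-level HuggingFace vocabulary (token string to id) into the bytes-to-rank mapping that tiktoken expects. It is not taken from tiktokenizer's source and may differ from what the package actually does. It assumes the tokenizer uses the GPT-2 style byte-to-character table, and the names `bytes_to_unicode`, `vocab_to_mergeable_ranks`, and `toy_vocab` are illustrative only.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2 style table mapping each byte value (0-255) to a printable character."""
    # Bytes that are already printable keep their own code point.
    keep = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # inverted '!' .. not sign
        + list(range(0xAE, 0x100))  # registered sign .. y with diaeresis
    )
    mapping = {b: chr(b) for b in keep}
    # Every remaining byte is assigned a code point from 256 upward, in order.
    offset = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + offset)
            offset += 1
    return mapping


def vocab_to_mergeable_ranks(vocab: dict[str, int]) -> dict[bytes, int]:
    """Turn a byte-level vocab (token string -> id) into raw bytes -> rank."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    return {
        bytes(char_to_byte[ch] for ch in token): rank
        for token, rank in vocab.items()
    }


# Toy vocabulary, made up for illustration (not from a real model).
# "\u0120" is the character that stands in for a space in byte-level vocabularies.
toy_vocab = {"H": 0, "i": 1, "\u0120": 2, "Hi": 3, "\u0120there": 4}
ranks = vocab_to_mergeable_ranks(toy_vocab)
print(ranks)
```

A mapping of this shape is what `tiktoken.Encoding(name=..., pat_str=..., mergeable_ranks=..., special_tokens=...)` takes. A real converter also has to supply the model's pre-tokenization regex and its special tokens, which this sketch leaves out.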
