This is the changelog for the open source version of tiktoken.
- Support for `gpt-4o`
- Performance improvements
- Optimise regular expressions for a 20% performance improvement, thanks to @paplorinc!
- Add `text-embedding-3-*` models to `encoding_for_model`
- Check content hash for downloaded files
- Allow pickling `Encoding` objects. Registered `Encoding` will be pickled by reference
- Workaround PyO3 bug for frozenset conversion
Thank you to @paplorinc, @mdwelsh, @Praneet460!
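
As an illustration of the content hash check mentioned above, here is a minimal sketch of verifying downloaded bytes against a known digest. The function name `check_hash` and the choice of SHA-256 are assumptions made for this example, not a description of tiktoken's internals.

```python
import hashlib


def check_hash(data: bytes, expected_hash: str) -> bool:
    """Return True if the SHA-256 hex digest of `data` equals `expected_hash`.

    Hypothetical helper for illustration; the name and the hash algorithm
    are assumptions, not taken from the changelog.
    """
    actual_hash = hashlib.sha256(data).hexdigest()
    return actual_hash == expected_hash


# Usage: compute a reference digest once, then verify later downloads against it.
payload = b"example vocabulary bytes"
reference = hashlib.sha256(payload).hexdigest()

assert check_hash(payload, reference)                    # unchanged data passes
assert not check_hash(payload + b"tampered", reference)  # modified data fails
```

A caller would typically discard the cached file and download it again when the check fails.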
- Build wheels for Python 3.12
- Update version of PyO3 to allow multiple imports
- Avoid permission errors when using default cache logic
- Add `encoding_name_for_model`, undo some renames to variables that are implementation details
- Add `tiktoken._educational` submodule to better document how byte pair encoding works
- Ensure `encoding_for_model` knows about several new models
- Add `decode_with_offsets`
- Better error for failures with the plugin mechanism
- Make more tests public
- Update versions of dependencies
- Add `decode_batch` and `decode_bytes_batch`
- Improve error messages and handling

`tiktoken` will now make a best effort attempt to replace surrogate pairs with the corresponding Unicode character and will replace lone surrogates with the Unicode replacement character.
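
The surrogate handling described above can be sketched in plain Python. This is an illustrative approach using a UTF-16 round trip, with a hypothetical helper name `normalize_surrogates`; it is not presented as the library's exact code.

```python
def normalize_surrogates(text: str) -> str:
    """Best-effort cleanup of surrogate code points in a Python string.

    Hypothetical helper for illustration. Valid surrogate pairs are merged
    into the character they represent; lone surrogates become U+FFFD.
    """
    try:
        # Strings without surrogates encode cleanly and are returned unchanged.
        text.encode("utf-8")
        return text
    except UnicodeEncodeError:
        # "surrogatepass" lets surrogates through on encode; decoding the
        # resulting UTF-16 bytes joins valid pairs, and "replace" substitutes
        # U+FFFD for anything that is still malformed.
        return text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


# A high/low surrogate pair written as two separate escapes.
paired = normalize_surrogates("\ud83d\ude00")
# A lone high surrogate between two ordinary characters.
lone = normalize_surrogates("a\ud83db")

print(len(paired))        # number of code points after merging the pair
print("\ufffd" in lone)   # whether the replacement character was inserted
```

After this step the string can be encoded to UTF-8 without raising `UnicodeEncodeError`.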
- Add encoding for GPT-4
- Build aarch64 wheels
- Make `blobfile` an optional dependency
Thank you to @messense for the environment variable that makes cargo not OOM under emulation!
- Improve performance by 5-20%; thank you to @nistath!
- Add `gpt-3.5-turbo` models to `encoding_for_model`
- Add prefix matching to `encoding_for_model` to better support future model versions
- Fix a bug in the README instructions on extending tiktoken
- Update the set of available encodings
- Add packaging metadata
- Add `tiktoken.encoding_for_model` to get the encoding for a specific model
- Improve portability of caching logic
Thank you to @fritzo, @arvid220u, @khanhvu207, @henriktorget for various small corrections
- Avoid use of `blobfile` for public files
- Add support for Python 3.8
- Add py.typed
- Improve the public tests
- Initial release