All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## 2.11.0 - 2025-08-04

- `tensorizer.torch_compat` is a new module for using `tensorizer` as a backend for handling tensor data during standard `torch.save` and `torch.load` calls (see the sketch below)
  - To use `tensorizer` as a backend for `torch.save`, wrap the call in the `tensorizer_saving` context manager
    - The file created must then be loaded using `tensorizer_loading`
  - To use `tensorizer` as a backend for `torch.load`, wrap the call in the `tensorizer_loading` context manager
    - The file to load must have been created using `tensorizer_saving`
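A minimal sketch of the pattern described above; the model and file name are placeholders, and the exact context manager signatures are assumed from this description:

```python
import torch
from tensorizer.torch_compat import tensorizer_loading, tensorizer_saving

model = torch.nn.Linear(8, 8)

# torch.save calls inside this block delegate tensor data to tensorizer
with tensorizer_saving():
    torch.save(model.state_dict(), "model.pt")

# A file written under tensorizer_saving must be read under tensorizer_loading
with tensorizer_loading():
    state_dict = torch.load("model.pt")
```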
## 2.10.1 - 2025-06-27

- `TensorDeserializer` objects now respect CUDA devices chosen with the `torch.device()` and `torch.cuda.device()` context managers (see the sketch after this list)
  - Previously, the `device` parameter to the `TensorDeserializer` constructor was the only way to choose between multiple CUDA devices (e.g. `cuda:0`, `cuda:1`)
  - Now, when a device with no index is specified, such as `device=torch.device("cuda")`, `device="cuda"`, or even `device=None`, the local device context is used to select the device
    - `torch.cuda.set_device()` and `torch.set_default_device()` are also supported
  - This doesn't add support for detecting CPU contexts such as `torch.device("cpu")` without explicitly specifying `device="cpu"`
    - This is for backwards compatibility, as the `TensorDeserializer` default device has always been mandated to be CUDA whenever CUDA is available, and `torch` does not provide a public interface to reliably disambiguate an intentional `torch.device("cpu")` context from the global default
  - Lazy-loaded tensors will use the active device as of the time they are actually loaded
- `TensorDeserializer` objects no longer fail to open file-like objects whose `fileno()` methods raise `io.UnsupportedOperation`
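A minimal sketch of device-context selection, assuming a machine with at least two CUDA devices and a placeholder file name:

```python
import torch
from tensorizer import TensorDeserializer

# With no device index given, the surrounding context picks the device,
# so these tensors are deserialized onto cuda:1
with torch.cuda.device(1):
    deserializer = TensorDeserializer("model.tensors", device="cuda")
```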
## 2.10.0 - 2025-06-09

- `stream_io.open_stream()` now respects Boto3's configuration files and environment variables when searching for object storage credentials to use
- `stream_io.open_stream()` now uses virtual-hosted-style bucket addressing for the `cwobject.com` and `cwlota.com` endpoints
- `stream_io.open_stream()` now allows the `use_https` entry of `.s3cfg` configuration files to fill in its `force_http` parameter if `force_http` is not explicitly specified as `True` or `False`
- `TensorSerializer` no longer throws an error when attempting to serialize very large tensors on some non-Linux platforms
- Object storage uploads managed by `stream_io.open_stream()` now finalize correctly on Python 3.12+ even without an explicit call to their `close()` method
  - A fix for this was originally implemented in release 2.7.2, but it only worked for Python versions below 3.12
## 2.9.3 - 2025-05-09

- `stream_io.open_stream()` now defaults to authenticating with signature version 4 rather than signature version 2 when no `s3_signature_version` is specified for reads from most object storage endpoints
## 2.9.2 - 2025-02-20

- Fixed compatibility with `numpy>=2.0.0`
  - Calls to the removed `numpy.product` function now use `numpy.prod` instead
## 2.9.1 - 2024-11-27

- `TensorSerializer` no longer sometimes fails to serialize very large 1-dimensional tensors with multibyte `dtype`s
- `RedisStreamFile.readable()` and `RedisStreamFile.seekable()` now correctly return `True`
## 2.9.0 - 2024-04-17

- Multiple file readers during deserialization (#87) (see the sketch below)
  - Controlled by the new `num_readers: int` parameter to the `TensorDeserializer` constructor
  - Files capable of having multiple readers opened to the same source can make use of this parameter to increase deserialization speed
    - Files on the filesystem and HTTP(S) & S3 streams from `stream_io.open_stream` are eligible to be reopened this way
  - The default number of readers is dynamic based on the type of file used
    - To disable concurrent readers, pass `num_readers=1` as a parameter
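A minimal sketch of concurrent readers; the file name and reader count are illustrative, not recommendations:

```python
from tensorizer import TensorDeserializer

# The local file is reopened by multiple readers so that chunks
# can be fetched concurrently during deserialization
with TensorDeserializer("model.tensors", num_readers=4) as deserializer:
    tensors = dict(deserializer.items())
```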
- Structured object serialization (#115) (see the sketch below)
  - `TensorSerializer.write_state_dict` can now write nested mappings, sequences, and other mixtures of mappings and sequences nested in each other
  - When accessing an object serialized this way with a `TensorDeserializer`, sequences are converted to mappings with integer keys
  - `TensorDeserializer.tree` allows converting the deserialized objects back to a compatible collection type
    - Serialized as a sequence → `collections.abc.Sequence`
    - Serialized as a mapping → `collections.abc.Mapping`
  - For more information, see:
    - The `TensorSerializer.write_state_dict` docstring
    - The `TensorDeserializer.tree` docstring
    - PR #115
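A minimal sketch of structured serialization; the names, file path, and the exact `tree` call signature are assumptions for illustration (the docstrings and PR #115 are authoritative):

```python
import torch
from tensorizer import TensorDeserializer, TensorSerializer

nested = {
    "encoder": {"weight": torch.zeros(4, 4)},
    "layers": [torch.zeros(2), torch.zeros(3)],  # a nested sequence
}

serializer = TensorSerializer("structured.tensors")
serializer.write_state_dict(nested)
serializer.close()

with TensorDeserializer("structured.tensors", device="cpu") as deserializer:
    # Without conversion, the serialized sequence appears as a mapping with
    # integer keys; tree converts it back to a compatible Sequence type
    restored = deserializer.tree()
```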
- Configurable CPU concurrency limit during serialization
  - Controlled by the new `limit_cpu_concurrency: int` parameter to the `TensorSerializer` constructor
- New optional keyword parameters to `stream_io.open_stream` (see the sketch below):
  - Object storage connection settings `s3_region_name` & `s3_signature_version`
  - File byte range markers `start` & `end`
    - `start` applies to all files and streams
    - `end` applies only to HTTP(S) & S3 streams
    - For HTTP(S) & S3 streams, these are interpreted as the `start` and `end` parameters for the created `CURLStreamFile` object
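A minimal sketch of the byte range parameters; the URL and offsets are placeholders:

```python
from tensorizer import stream_io

# Reads only bytes 1024 through 4095 of the remote file; for HTTP(S) & S3
# streams these map to CURLStreamFile's start and end parameters
with stream_io.open_stream(
    "https://example.com/model.tensors",
    "rb",
    start=1024,
    end=4096,
) as stream:
    chunk = stream.read()
```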
- The `plaid_mode` and `plaid_mode_buffers` parameters to `TensorDeserializer` no longer have an effect
  - The previous default behaviour (`plaid_mode=True` wherever available) is now always applied
- Serialization performance has been improved
- `TensorDeserializer.read_tensors` now returns tensors on the target device, and functions more efficiently
  - Previously, the returned values were always on the CPU
- `TensorDeserializer.read_tensors`'s behaviour is no longer affected by the position of the file descriptor at the time of the call
  - Sequential calls to `read_tensors` still read consecutive parts of the file
- Importing `tensorizer` doesn't implicitly initialize `torch.cuda` whenever a GPU is available
  - This allows forking after importing `tensorizer`, and using the library in a subprocess
- `TensorDeserializer.read_numpy_arrays` now throws an error when used with CUDA deserialization, since numpy arrays can't be deserialized to CUDA
- Fixed a bug where `stream_io.CURLStreamFile` objects constructed with an `end` parameter would read one byte past their end when calling `CURLStreamFile.read` with no argument
## 2.8.1 - 2024-02-15

- Performance has been improved when serializing to some filesystems (e.g. NFS, CephFS) by skipping `fallocate` pre-allocation where it is not natively supported
  - Previously, `posix_fallocate`'s fallback behaviour was used, which wasted time writing out zeroes that would only be overwritten later
- `examples/hf_serialization.py` is now more robust when overwriting an existing serialized model in an object storage bucket
  - Previously, it would sometimes find and use outdated, cached data, and thus erroneously skip serialization and/or fail validation
## 2.8.0 - 2024-02-08

- Tensors on the `meta` device may now be serialized (see the sketch below)
  - These store no tensor data (only metadata) in the tensorized file
  - These have no hashes for their tensor data, since there is nothing to hash
  - These cannot have their data encrypted, since there is nothing to encrypt
  - During deserialization, these are returned as zero-filled buffers on the same device as other tensors
    - Essentially equivalent to `torch.zeros_like(meta_tensor, device=...)`
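A minimal sketch of serializing a `meta` tensor; the names and file path are placeholders:

```python
import torch
from tensorizer import TensorSerializer

# A meta tensor has a shape and dtype but no backing data
state_dict = {"w": torch.empty(1024, 1024, device="meta")}

serializer = TensorSerializer("meta.tensors")
serializer.write_state_dict(state_dict)  # writes metadata only
serializer.close()
```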
- `TensorDeserializer` now defaults to `plaid_mode=True` when deserializing to CUDA devices for better performance
  - There is no difference between `plaid_mode`-deserialized tensors and regular deserialized tensors (beyond deserialization performance), so this is not a breaking change
- Removed incorrect warnings in the documentation about `plaid_mode` being unsafe
- Passing `include_non_persistent_buffers=False` to `TensorSerializer.write_module()` now works as intended
  - Previously, setting this flag to `False` filtered out both non-persistent buffers and parameters, leaving only persistent buffers
  - The corrected behaviour only filters out non-persistent buffers, leaving parameters untouched
- Very large individual tensors (over approximately 2147479552 bytes) now serialize correctly
  - Previously, anything over the limit for a single `write` or `pwrite` syscall could not be fully written, and an error was raised during serialization
  - Now, multiple writes are used
  - This also fixes large writes to unbuffered file-like objects if `pwrite` is not supported, as they would encounter the same issue
## 2.7.2 - 2024-01-30

- File objects opened with `stream_io.open_stream("s3://...", "wb")` for writing to object storage now correctly upload their content when closed implicitly at the end of a `with` block, without requiring an explicit call to their `.close()` method
  - Since `TensorSerializer` objects already call `.close()` explicitly on their output file objects, either when `TensorSerializer.close()` is invoked or when the `TensorSerializer` is garbage collected, this bug mainly applies to manual usage of `stream_io.open_stream()` for object storage uploads not involving a `TensorSerializer`
## 2.7.1 - 2023-12-06

- Fixed a bug where a `CURLStreamFile` would report itself as unreadable, causing HTTP(S) and S3 deserialization to fail
## 2.7.0 - 2023-12-06

- Tensor encryption (see the sketch below)
  - Refer to docs/encryption.md for details
  - Encrypts all tensor weights in a file with minimal overhead
  - Doesn't encrypt tensor metadata, such as:
    - Tensor name
    - Tensor `dtype`
    - Tensor shape & size
  - Requires an up-to-date version of `libsodium`
    - Use `apt-get install libsodium23` on Ubuntu or Debian
    - On other platforms, follow the installation instructions from the libsodium documentation
    - Takes up less than 500 KiB once installed
  - Uses a parallelized version of XSalsa20-Poly1305 as its encryption algorithm
    - Splits each tensor's weights into ≤ 2 MiB chunks, encrypted separately
  - Example usage: see examples/encryption.py
  - Example CLI tool to add or remove encryption from pre-serialized models: examples/encrypt_existing.py
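A rough sketch of the feature; examples/encryption.py and docs/encryption.md are the authoritative references, and the `EncryptionParams`/`DecryptionParams` names and calls below are assumptions based on them:

```python
import torch
from tensorizer import (
    DecryptionParams,
    EncryptionParams,
    TensorDeserializer,
    TensorSerializer,
)

# Generate a random key and encrypt all tensor weights with it
encryption = EncryptionParams.random()  # assumed constructor
serializer = TensorSerializer("model.tensors", encryption=encryption)
serializer.write_state_dict({"w": torch.zeros(16, 16)})
serializer.close()

# The same key is needed to decrypt the weights again
decryption = DecryptionParams.from_key(encryption.key)  # assumed accessor
with TensorDeserializer("model.tensors", encryption=decryption) as d:
    w = d["w"]
```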
- Added more error checking against deserializing corrupted files
- Added stricter error checking for file writes during serialization
- Fixed cases where the `pynvml` library was available on a node with no NVML devices
  - This allows CPU-only deployments to work with `pynvml` in the image
- Fixed serialization for tensors with discontiguous memory
- Fixed a bug where the `module_idx` on bulk serialized tensors was misaligned
  - During bulk writes (`write_module()`, `write_state_dict()`), each tensor was receiving the preceding one's `module_idx` instead of its own
## 2.6.0 - 2023-10-30

- `TensorSerializer.write_module` now accepts `include_non_persistent_buffers` as a keyword-only boolean argument that can be set to `False` to exclude buffers from serialization that were originally registered to the module through calling `torch.nn.Module.register_buffer` with `persistent=False`
  - `torch.nn.Module.state_dict` never includes non-persistent buffers, so setting this to `False` will more closely match the behaviour of `state_dict` serialization
  - `TensorSerializer.write_module` used to always include non-persistent buffers
  - The default (`include_non_persistent_buffers=True`) matches the old behaviour
- `stream_io.open_stream` and `stream_io.CURLStreamFile` now accept an additional, optional `certificate_handling` argument to customize the verification of SSL certificates
  - This corresponds to the flags `--cacert`, `--capath`, and `-k`/`--insecure` in `curl`
  - Customization is achieved by passing an instance of `stream_io.CAInfo` to `open_stream` or the `CURLStreamFile` constructor
  - Example usages:
    - `open_stream("https://localhost/model.tensors", certificate_handling=CAInfo(cacert="./localhost.pem"))`
    - `open_stream("https://127.0.0.1/model.tensors", certificate_handling=CAInfo(allow_untrusted=True))`
  - Pass `certificate_handling=None` (the default) to use default certificate verification as compiled into cURL
## 2.5.1 - 2023-10-17

- `TensorSerializer.write_state_dict` has been optimized to better match the speed of `TensorSerializer.write_module`
- Improved error tracebacks reported during bulk tensor deserialization
- Serializing to a buffered file-like object with a large buffer size no longer sometimes corrupts the resulting serialized file
## 2.5.0 - 2023-10-13

- `TensorDeserializer` now takes a `plaid_mode_buffers` argument specifying a fixed number of buffers to allocate when `plaid_mode=True`
  - Previously, `plaid_mode` used a single buffer
  - More buffers help when loading from very fast sources or when `verify_hash=True`
  - The new default number of buffers is contextual
    - 1 for HTTP/S3 streams
    - 2 for other streams (e.g. local files, Redis)
    - 8 when `verify_hash=True`
- `TensorDeserializer` objects can now be used as context managers to safely call `TensorDeserializer.close` when they are done being used (see the sketch below)
- `TensorDeserializer` methods that load multiple tensors at a time are now faster
- `TensorDeserializer`'s `verify_hash` mode is much, much faster
- Specifying `plaid_mode=True` for a `TensorDeserializer` no longer implies (or requires) `lazy_load=True`
  - The old default behaviour can be restored by specifying both `plaid_mode=True, lazy_load=True`
- `plaid_mode` no longer prohibits accessing previously loaded tensors
- `dtype` conversion is more efficient for CUDA tensor deserialization
  - Conversions are now performed on-device rather than on the CPU
- CPU memory is now freed immediately after `TensorDeserializer` initialization for CUDA tensor deserialization when `lazy_load=False`
- `TensorDeserializer`'s `lazy_load` mode no longer eagerly allocates memory that is never used
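A minimal sketch of context manager usage; the file and tensor names are placeholders:

```python
from tensorizer import TensorDeserializer

with TensorDeserializer("model.tensors", lazy_load=True) as deserializer:
    tensor = deserializer["transformer.wte.weight"]  # placeholder key
# TensorDeserializer.close is called automatically on exit
```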
## 2.4.0 - 2023-10-05

- Support for `redis://` URIs in `stream_io.open_stream` (see the sketch below)
  - E.g. `redis://localhost:6379/mymodel`
- New `stream_io.RedisStreamFile` class
  - Similar to `stream_io.CURLStreamFile`
- `TensorDeserializer.to_redis` method for initially loading tensors into a Redis data store
- `force_http` parameter to `stream_io.open_stream` to downgrade an S3 connection from HTTPS to HTTP
  - Warning! This will stream all data completely unencrypted
  - Warning! If accessing a private S3 bucket, this will also send your object-scoped access key to the server unencrypted
- `buffer_size` parameter to `stream_io.open_stream` to control the amount of data buffered in advance during HTTP(S) loading
  - Defaults to 16 MiB for HTTP(S) streams and 1 to 8 MiB for Redis streams
  - Previously, this was fixed at 256 MiB
- `TensorSerializer.write_module` has been optimized further for a speedup of ~3.6x on CUDA modules and ~3.1x on CPU modules
- `redis` and `hiredis` are now required package dependencies
- `CURLStreamFile.response_headers` no longer has a chance to contain incomplete header information
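A minimal sketch of deserializing over Redis, using the example URI from above:

```python
from tensorizer import TensorDeserializer, stream_io

# Opens a RedisStreamFile under the hood
stream = stream_io.open_stream("redis://localhost:6379/mymodel", "rb")
with TensorDeserializer(stream) as deserializer:
    tensors = dict(deserializer.items())
```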
## 2.3.0 - 2023-09-06

- `CURLStreamFile` now tracks response headers in `CURLStreamFile.response_headers`
  - This can be used to track cache hits and misses during deserialization through the `TensorDeserializer.cache_status` property
## 2.2.0 - 2023-09-05

- Model serialization has been optimized for a speedup of approximately 2x
## 2.1.2 - 2023-08-17

- Requests now include a custom `User-Agent` header specific to `tensorizer`
## 2.1.1 - 2023-08-10

- `verify_hash` parameter for `TensorDeserializer.read_tensors`
  - Matches the one for `TensorDeserializer.read_numpy_arrays`
## 2.1.0 - 2023-08-09

- Hash verification of deserialized models (see the sketch below)
  - During deserialization, specify `verify_hash=True` in either:
    - The `TensorDeserializer` constructor,
    - `TensorDeserializer.read_numpy_arrays`, or
    - `TensorDeserializer.load_into_module` (only while lazy loading)
  - Comparing a model already in memory against its `.tensors` file: `TensorDeserializer.verify_module`
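A minimal sketch of constructor-level hash verification; the file name is a placeholder:

```python
from tensorizer import TensorDeserializer

# Loading fails with an error if any tensor's stored hash does not match
deserializer = TensorDeserializer("model.tensors", verify_hash=True)
```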
## 2.0.0 - 2023-06-07

- `bfloat16` and `complex32` support
- Newly serialized files now use the `TENSORIZER_VERSION = 2` binary format
  - Format v2 allows for `bfloat16` and `complex32` dtypes to be stored
  - Existing format v1 files can still be deserialized (backwards-compatible)
- `TensorDeserializer`'s `dtype` parameter now only accepts the types `torch.dtype` and `None`
  - It previously accepted `numpy.dtype`, `str`, and `None`
- `TensorDeserializer.read_tensors` now yields `torch.Tensor` objects instead of `numpy.ndarray` objects
  - `TensorDeserializer.read_numpy_arrays` provides the old functionality
    - Will error when deserializing `bfloat16` or `complex32` by default, since they are not valid dtypes in `numpy`
    - The parameter `allow_raw_data` can be specified to read `bfloat16` and `complex32` arrays anyway, but with an invalid dtype
- `TensorDeserializer`'s `plaid_mode` now correctly implies `lazy_load`
## 1.1.0 - 2023-05-05

- Better docstrings for the public `tensorizer` interface
- More memory utilities in `utils`:
  - `MemoryUsage`: Same information as `get_mem_usage` as a structured type
  - `GlobalGPUMemoryUsage`: GPU information subset of `MemoryUsage`
  - `TorchGPUMemoryUsage`: Torch information subset of `MemoryUsage`
  - `CPUMemoryUsage`: CPU information subset of `MemoryUsage`
- `utils.no_init_or_tensor` can now be used as a context manager (see the sketch below)
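A minimal sketch of the context manager form; the module is a placeholder:

```python
import torch
from tensorizer import utils

# Inside the block, newly constructed modules skip weight initialization,
# which is useful when weights will be overwritten by deserialization anyway
with utils.no_init_or_tensor():
    model = torch.nn.Linear(4096, 4096)
```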
## 1.0.1 - 2023-03-21

- Loading from public-read S3 buckets no longer requires blank credentials to be explicitly specified via `stream_io.open_stream`
## 1.0.0 - 2023-03-21

- `TensorSerializer` class
- `TensorDeserializer` class
- State dict compatibility
- File, HTTP(S), and S3 stream compatibility
- `stream_io` module and `stream_io.open_stream` interface
- `s3://tensorized` public bucket hosting pre-serialized models
- `utils` module, including:
  - `convert_bytes`
  - `get_device`
  - `get_mem_usage`
  - `get_gpu_name`
  - `no_init_or_tensor`