Releases: NVIDIA/multi-storage-client
Releases · NVIDIA/multi-storage-client
0.47.0
Multi-Storage Client (MSC)
Breaking Changes
s8kprovider no longer pinsrequest_checksum_calculation/response_checksum_validationtowhen_required; botocore defaults now apply, and botocore >= 1.36.0 adds a CRC32 checksum to every upload.
New Features
- Added
checksum_algorithmoption tos3ands8kproviders for AWS S3 additional upload checksums (CRC32,CRC32C,SHA1,SHA256,CRC64NVME). Whenrust_clientis enabled, onlySHA256is supported. - End-to-end symlink preservation for uploads and downloads. MSC now treats symbolic links as a first-class concept across all storage providers, so symlinks in a POSIX source tree can be round-tripped through object storage without losing their link semantics. This adds:
- A new
make_symlinkAPI on everyStorageProvider(S3, GCS, Azure, OCI, AIS, POSIX), which writes an empty object whose user metadata carries the link target on cloud backends, and a native symlink on POSIX. - A
symlink_handlingoption onlist/list_fileswith three modes:FOLLOW(default, backward-compatible),SKIP(exclude symlinks), andPRESERVE(yield symlinks as leaf entries withObjectMetadata.symlink_targetpopulated). sync_from/msc synchonorsymlink_handling=PRESERVEto recreate symlinks at the destination instead of dereferencing them, so uploading a directory to object storage and downloading it back reconstructs the original link structure.
- A new
- Add opt-in client-side MD5 verification to the Azure storage provider.
- Support multipart upload and download in the Azure storage provider.
Bug Fixes
- Sort directory entries in the POSIX storage provider to match S3 lexicographical order.
- Improve
msc syncperformance by skipping per-object metadata lookups for files up to 16MB, avoiding redundant HEAD requests during synchronization. - Upgrade
fastmcp,cryptography,pygments, andaxiosto address security advisories.
Multi-Storage File System (MSFS)
New Features
- Add
fiobenchmark script for MSFS performance testing.
Bug Fixes
- Add
globalsStructlock tracking. - Additional fixes for the
fiobenchmark.
0.46.0
Multi-Storage Client (MSC)
New Features
- Add batch
download_filesandupload_filesAPIs toStorageClientfor efficient multi-file transfers. - Support presigned URL generation for Azure Blob Storage via SAS tokens.
- Use
download_filesandupload_filesinsync_fromfor efficient transfers. - Support non-default AWS profile in S3 storage providers through MSC config.
- Support workload identity in Azure storage provider through rclone config.
- Support origin path in CloudFront signed URLs.
- Support security token file in OCI CLI/SDK configs.
Bug Fixes
- Support credential-less rclone configurations.
- Relax
google-cloud-storagedependency to>=2.12,<4. - Setup non-default HTTP client for AWS SDK for Rust config.
- Add
cryptographyas vault extra dependency. - Support custom attributes in
upload_files.
Multi-Storage File System (MSFS)
New Features
- Integrate PebbleDB as metadata cache on-disk store.
- Add "pseudo" backend to enable arbitrary directory tree extreme testing.
- Support
listObjectswithStartAfter. - Avoid HEAD calls to emulate IfMatch.
0.45.1
Multi-Storage Client (MSC)
Bug Fixes
- Restore
cryptographyas an optional dependency; it was incorrectly added as a core requirement in 0.45.0. - Lower the minimum cache refresh interval from 300 seconds to 1 second.
- Fix incorrect handling of short final chunks and improve partial file cache reliability.
- Upgrade grpc-go, pyOpenSSL, and Authlib to address CVE-2026-33186, CVE-2026-27459, and CVE-2026-27962.
Multi-Storage File System (MSFS)
New Features
- Adopt sortedmap for inode management.
0.45.0
Multi-Storage Client (MSC)
New Features
- Add presigned URL generation via
msc.generate_presigned_url()shortcut andStorageClient.generate_presigned_url(). Supports S3 native signing and CloudFront signed URLs through a pluggableURLSignerabstraction. - Add
dryrunmode tosync_fromandmsc.sync()that enumerates and compares objects without performing any copy/delete, streaming results to JSONL files viaDryrunResultonSyncResult.
Bug Fixes
- Fix sync worker to protect
add_filereferences from concurrent modification. - Fix handling of directory paths in
ManifestMetadataProvider.get_object_metadata. - Remove placement group overhead from Ray sync workers.
Multi-Storage File System (MSFS)
New Features
- Integrate sortedmap B+Tree data structure for enhanced scalability.
- Shift to pointer-free
inodeStructfor improved memory efficiency.
0.44.0
Multi-Storage Client (MSC)
New Features
- Add
DefaultAzureCredentialsProviderfor Azure Blob authentication via Azure Identity (e.g. managed identity, Azure CLI, environment variables).
Bug Fixes
- Fix
ImportErrorwhen importingmultistorageclient.instrumentation.authwithout the[vault]extra installed. - Preserve metadata-only updates when sync skips copy.
Multi-Storage File System (MSFS)
New Features
- Add GCS backend support.
0.43.0
Multi-Storage Client (MSC)
Breaking Changes
- Drop Python 3.9 support.
New Features
- Add
list_recursivemethod toStorageClientwith parallel listing support for S3, Azure, GCS, and OCI providers. - Pass-through
s3parameter tobotocore.config.Configin S3 storage provider. - Support bulk delete in storage providers and in
msc sync. - Route sync metadata updates through queue-backed provider.
- Add
follow_symlinkstolist()shortcut.
Bug Fixes
- Resolve
IsADirectoryErrorwhen syncing a single file to a directory. - Compare sync
last_modifiedat seconds resolution to avoid spurious re-syncs. - Ensure directory exists for lock file creation in sync worker.
- Prevent Rust multipart uploads from exceeding the 10k part limit.
- Add logic for invalidating cached mTLS certificates.
Multi-Storage File System (MSFS)
New Features
- Add
listObjects()backend API. - Add support for
unlink(file delete).
0.42.0
Multi-Storage Client (MSC)
New Features
The MSC Explorer Web UI is now integrated into MSC, providing a browser-based interface to browse and manage objects across your configured storage backends. See the documentation for more details.
- Bumped OpenTelemetry package version to 1.32.1 for mTLS support.
Bug Fixes
- Avoid HEAD requests when checking source version to reduce unnecessary API calls.
- Resolve sync worker performance regression with bounded backpressure.
- Raise
ValueErrorwhenbase_pathis empty inlist_objects. - Resolve multiprocessing metadata loss and related test failures in sync operations.
- Add early validation for
cache_line_sizeexceeding cache size.
Multi-Storage File System (MSFS)
New Features
- Added support for
mkdirandrmdiroperations. - Added prefetch of directories when
readdir/readdirplushas not already done so, improving directory listing performance. - MSFS Grafana dashboard now reports READ metrics.
Bug Fixes
- Fixed double locking in directory prefetcher.
0.41.0
Multi-Storage Client (MSC)
Breaking Changes
The default POSIX profile has been renamed from default to __filesystem__. If your code or configuration explicitly sets profile="default", please update it to profile="__filesystem__" (or remove the profile parameter entirely). Users who do not explicitly specify a profile, or who are not using the reserved POSIX profile, are not affected.
New Features
- Added multi-backend configuration support, enabling datasets distributed across multiple storage backends to be accessed through a single unified profile. See configuration reference for usage details.
- Added size-based batching to
msc syncoperations for improved performance and resource utilization. - Added automatic ulimit increase for high-concurrency operations to prevent file descriptor exhaustion.
- Added GCS credentials provider support for Rust client with comprehensive refactoring.
- Implemented async metrics collection to significantly reduce OpenTelemetry overhead in telemetry operations.
Bug Fixes
- Fixed
sync_fromignoringpreserve_source_attributesparameter when usingsource_filesargument. - Fixed directory creation race conditions on distributed filesystems by implementing proper handling of concurrent mkdir operations.
Multi-Storage File System (MSFS)
New Features
- Added Prometheus/Grafana integration with dashboard support for monitoring file system operations.
Bug Fixes
- Fixed cache
ttl_check_intervalvalidation to ensure it is always positive. - Fixed unlocked cache prune call that could cause race conditions.
- Fixed inode eviction to properly include clean cache lines.
- Fixed missing
inode.touch()calls in file system operations. - Fixed Grafana dashboard job selector for metrics.
0.40.0
New Features
- Add
--includeand--excludepattern filtering tomsc lscommand for selective file listing. - Enhance
SyncResultto separately track files and bytes added/deleted, providing complete visibility into sync operations. - Add configurable retry settings for Rust client to improve reliability and customization.
Bug Fixes
- Fix automatic retry for HTTP 408 (Request Timeout) errors when accessing Google Cloud Storage via S3-compatible API.
- Add default timeout values for S3 storage providers to prevent indefinite hangs.
- Fix unnecessary directory creation when opening files in read-only mode.
0.39.1
New Features
- Increase default sync concurrency to improve performance, especially on machines with fewer CPU cores.
Bug Fixes
- Update
_is_rust_client_enabled()forCompositeStorageClientto returnTruewhen all child clients are Rust-enabled.