Bug Fix
All changes at #76.
API breaking changes
- Refactored the `flip` methods for `FlagSide` and `FlagUpLo` enums to return the flipped value directly instead of a `Result`, since the operation cannot fail. Updated all call sites to remove unnecessary `?` error propagation.
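As a rough illustration of this signature change, the sketch below uses a local stand-in enum, not the actual RSTSR definition. The variant names `L` and `R` and the helper `side_after_transpose` are assumptions made for the example.

```rust
/// Local stand-in for a BLAS side flag; variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FlagSide {
    L,
    R,
}

impl FlagSide {
    /// Infallible flip: returns the opposite side directly, no `Result`.
    pub fn flip(self) -> Self {
        match self {
            FlagSide::L => FlagSide::R,
            FlagSide::R => FlagSide::L,
        }
    }
}

/// Hypothetical call site: no `?` is needed because `flip` cannot fail.
pub fn side_after_transpose(side: FlagSide) -> FlagSide {
    side.flip()
}
```

Call sites that previously wrote something like `side.flip()?` would simply drop the `?`.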
Enhancements
- Re-exported the BLAS traits prelude under the new feature in `rstsr/src/prelude.rs` for easier access.
- Made all BLAS flag enums (`FlagOrder`, `TensorIterOrder`, `FlagTrans`, `FlagSide`, `FlagUpLo`, `FlagDiag`, `FlagSymm`) derive `Serialize` and `Deserialize`, and added Serde `rename` attributes for better control over their string representations.
Crate structure
- Added `serde` as a dependency in the relevant `Cargo.toml` files and imported the necessary traits in `flags.rs`.
- Symlinked `CHANGELOG.md` to rstsr crate directory.
Documentation update:
- rstsr-tblis: Updated documentation for einsum functions.
API breaking changes:
- rstsr-core: Removed unnecessary trait bounds on `DeviceCreationAPI` and related traits. Deprecated `arange_int_impl` as device function. Removed `DimStrideAPI` trait. (#74)
- rstsr-core: Changed `TensorDotAxes` to `AxesPairIndex<T>` for general usage. Changed vecdot device trait function definition. (#74)
- rstsr-core: Refactored AxesIndex trait bounds for improved error handling. (#71)
New features:
- rstsr-core: Added `matrix_transpose` function for array-api compliance. (#73)
- rstsr-core: Added `vecdot` traits and implementation (parallel for rayon devices). (#73)
- rstsr-core: Added `reshape_with_args`, `into_compatible_shape` functions for flexible reshape operations. (#70)
- rstsr-tblis: Implemented `tensordot` using einsum. (#73)
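In the Python array API, `matrix_transpose` swaps the last two axes of an array. A minimal sketch of that rule on plain shape and stride metadata might look as follows; the function name `matrix_transpose_layout` is hypothetical and is not part of the RSTSR API.

```rust
/// Swap the last two axes of a (shape, strides) description.
/// Returns `None` when there are fewer than two dimensions or when
/// `shape` and `strides` have different lengths.
pub fn matrix_transpose_layout(
    shape: &[usize],
    strides: &[isize],
) -> Option<(Vec<usize>, Vec<isize>)> {
    let n = shape.len();
    if n < 2 || strides.len() != n {
        return None;
    }
    let mut new_shape = shape.to_vec();
    let mut new_strides = strides.to_vec();
    // Only metadata changes; no element is moved in memory.
    new_shape.swap(n - 2, n - 1);
    new_strides.swap(n - 2, n - 1);
    Some((new_shape, new_strides))
}
```

Because only the metadata is permuted, a transpose of this kind can be returned as a view rather than a copy.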
Enhancements:
- rstsr-core: Efficiency improvement for `sum_axes` and related reduction with contiguous memory optimization. (#74)
- rstsr-core: Parallel `arange` and `linspace` for devices supporting rayon. (#74)
- rstsr-core: Changed output layout rule for binary functions; contiguous axes parts are preserved. (#74)
- rstsr-core: Different-type reduction shares same implementation with same-type reduction. (#74)
- rstsr-core: Restructured linalg directory; updated API documentation style. (#73)
- rstsr-core: Enhanced stride checking in layout and reshape functions. Added NumPy-style tests for reshape and expand_dims. (#70, #71)
- rstsr-core: Testing framework updated for DeviceCpuSerial. Added reshape tests. (#69)
Fixes:
- rstsr-core: Fix `change_layout` (contig may still require data copy).
- rstsr-tblis: Fixed threading and prelude imports. (#74)
- rstsr-core: Various manipulation function bug fixes (expand_dims, flip). (#71)
Dev infrastructure:
- Introduced Claude Code configuration for AI-assisted development. (#72)
MSRV specified to 1.82.0, written to Cargo.toml.
Functionality changes:
- rstsr-core: `asarray` no longer panics on a non-compact layout when it is given `&[T]` or `&mut [T]` to produce a view or mutable-view tensor. Passing `Vec<T>` to produce an owned tensor still panics if the layout is not compact.
Refactor:
- rstsr-core: Split `tensor/manuplication.rs` into a module with folder. (#67)
API breaking changes:
- rstsr-core: Supporting type-promotion for assign. (#65)
- rstsr-dtype-traits: Split previous dtype trait `PromotionAPI` into two parts, `DTypePromoteAPI` and `DTypeCastAPI`. (#66)
- Refactor of some dtype-related traits and impl. (#66)
- rstsr-core: Add lifetime annotation to `TensorRefAPI`/`TensorRefMutAPI` (these two traits are currently not applied to any implementations). (#67)
- rstsr-common: Added traceback in error-handling, which changed the fundamental data structure of `Error` of RSTSR. (#67)
- rstsr-common: `DimBaseAPI::const_ndims` is now a method (with `&self` as argument). (#67)
Enhancements:
- rstsr-dtype-traits: add function `isclose`. (#66)
- rstsr-core: add function `allclose`. (#66)
- rstsr-core: simplifies some trait bounds for obtaining view of tensor. (#66)
- rstsr-core: Add macro `tensor_from_nested!` (similar to `ndarray::array!`). (#67)
- rstsr-common: Added `rstsr_unwrap` to print unwrap with traceback. Added cargo feature `traceback` in many crates for printing traceback info of place of panic. (#67)
- rstsr-common: Added `normalize_axes_index` for development (similar to `numpy.normalize_axis_tuple`). (#67)
- rstsr-common: Added `Option<i/usize>` to `AxesIndex` trait implementation. (#67)
- Some API documents updated. (#67)
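For readers unfamiliar with `isclose` and `allclose`, the sketch below follows NumPy's tolerance rule, `|a - b| <= atol + rtol * |b|`. Whether RSTSR uses exactly this formula, these parameter names, or these signatures is an assumption. The free functions here are stand-ins, and unlike NumPy they do not special-case equal infinities.

```rust
/// NumPy-style closeness test: |a - b| <= atol + rtol * |b|.
/// NaN inputs always compare as not close.
pub fn isclose(a: f64, b: f64, rtol: f64, atol: f64) -> bool {
    (a - b).abs() <= atol + rtol * b.abs()
}

/// True when both slices have the same length and every pair is close.
pub fn allclose(a: &[f64], b: &[f64], rtol: f64, atol: f64) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(&x, &y)| isclose(x, y, rtol, atol))
}
```

Note that the rule is asymmetric in `a` and `b`, since the relative tolerance scales with `|b|` only.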
Fixes:
- Fixes some possible memory-safety problems of `uninitialized_vec` in reduce implementations in rstsr-native-impl. (#66)
- Fix `no_std`. (#67)
- Fixed `rt::expand_dims` and `rt::flip` in multiple axes cases (in accordance with NumPy). (#67)
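The NumPy rule for `expand_dims` with several axes can be sketched on shapes alone: axes are interpreted against the output rank, negative axes wrap around, and repeated or out-of-range axes are rejected. The function below models that NumPy behavior only. Its name and error messages are hypothetical, and it is not the RSTSR implementation.

```rust
/// Compute the output shape of a NumPy-style multi-axis `expand_dims`.
pub fn expand_dims_shape(shape: &[usize], axes: &[isize]) -> Result<Vec<usize>, String> {
    // Axes refer to positions in the *output*, whose rank is known up front.
    let out_ndim = shape.len() + axes.len();
    let n = out_ndim as isize;
    let mut is_new = vec![false; out_ndim];
    for &ax in axes {
        if ax < -n || ax >= n {
            return Err(format!("axis {ax} is out of bounds for {out_ndim} dimensions"));
        }
        // Negative axes count from the end of the output shape.
        let idx = if ax < 0 { (ax + n) as usize } else { ax as usize };
        if is_new[idx] {
            return Err(format!("repeated axis {ax}"));
        }
        is_new[idx] = true;
    }
    // Fill the remaining positions with the original extents, in order.
    let mut old = shape.iter().copied();
    Ok(is_new
        .iter()
        .map(|&new| if new { 1 } else { old.next().expect("counts match by construction") })
        .collect())
}
```

For example, shape `[3, 4]` with axes `[0, 2]` becomes `[1, 3, 1, 4]`.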
This is not a completed version. May have other API breaking changes before v0.6.0 release.
API breaking changes:
- Using `DeviceRawAPI<MaybeUninit<T>>` instead of `DeviceRawAPI<T>` for output types in device operator traits, changing both parameter types and trait bounds (#60).
- Using `ExtNum`, `ExtFloat`, `ExtReal` for trait extensions to crate `num`. Removed previous traits (per-functionality) `AbsAPI`, `ReImAPI`, etc. This affects trait bounds (RESTGroup/rstsr/#63).
- Using data type promotion rules in several common functions in CPU device, changing trait bounds (#64).
Enhancements:
- Added TBLIS plugin (#60). Now Einstein summation is available from `rt::tblis::einsum` (with cargo feature `rstsr/tblis` enabled, and linkage of libtblis.so).
- Added `uninit`, `assume_init` (#61).
- Added `take`, `all`, `any`, `count_nonzero`, `nextafter`, `reciprocal` (#63).
- Using data type promotion rules for several common functions in CPU device implementations (sin, greater, etc.), so that comparison between different types, or sin of an integer list, can be evaluated (#64).
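The `uninit`/`assume_init` pair and the `MaybeUninit<T>` output types follow a common Rust pattern: allocate an output buffer without initializing it, let the operator write every element, and only then treat it as initialized. The sketch below shows that pattern with the standard library alone. The helper names `uninit_vec`, `assume_init_vec` and `doubled` are hypothetical and are not RSTSR functions.

```rust
use std::mem::{ManuallyDrop, MaybeUninit};

/// Allocate `len` uninitialized slots without writing to them.
pub fn uninit_vec<T>(len: usize) -> Vec<MaybeUninit<T>> {
    let mut v = Vec::with_capacity(len);
    v.resize_with(len, MaybeUninit::uninit);
    v
}

/// Reinterpret a fully written buffer as initialized values.
///
/// # Safety
/// Every element of `v` must have been written before calling this.
pub unsafe fn assume_init_vec<T>(v: Vec<MaybeUninit<T>>) -> Vec<T> {
    let mut v = ManuallyDrop::new(v);
    // SAFETY: `MaybeUninit<T>` has the same layout as `T`, and the caller
    // guarantees that all elements are initialized.
    unsafe { Vec::from_raw_parts(v.as_mut_ptr() as *mut T, v.len(), v.capacity()) }
}

/// Example "operator" that fills an uninitialized output buffer.
pub fn doubled(input: &[f64]) -> Vec<f64> {
    let mut out = uninit_vec::<f64>(input.len());
    for (slot, &x) in out.iter_mut().zip(input) {
        slot.write(2.0 * x);
    }
    // SAFETY: the loop wrote every slot, since `out` and `input` have equal length.
    unsafe { assume_init_vec(out) }
}
```

Typing the output as `MaybeUninit<T>` makes the "not yet written" state visible to the compiler, so reading before writing requires an explicit `unsafe` step.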
Refactor:
- Changed most internal device implementation that works with `empty_impl` to `uninit_impl` (#61).
- Changed directory structure for device implementations (currently BLAS devices are categorized to directory `crates-device`) (#62).
- Removed previous rstsr-book in this repository (#63).
Parts of API document also updated.
MSRV specified:
- 1.84.1: with crate faer built;
- 1.82.0: other cases (due to crate `half` and rust language usage of `unsafe extern "C"`).
Code refactor:
- Revert MSRV from 1.87 to 1.84/1.82 (#59).
API breaking changes:
- Revert to non-default for dynamic loading (#57)
Fixes (with API breaking behavior change):
- Change Behavior of `DeviceCpuRayon::generate_pool` (#56, RESTGroup/rstsr-ffi#9)
- Fix KML threading lock in LAPACK functions (#58)
Enhancements:
Possible API breaking change:
- For conversion between CBLAS flags and RSTSR flags (defined in crate rstsr-common): previously the CBLAS flags were in crate rstsr-lapack-ffi. Now those flags are defined in rstsr-cblas-base and are applied in all FFI crates (at RESTGroup/rstsr-ffi).
Actions:
- Added ARM support (#52)
API breaking change: Supporting dynamic loading for OpenBLAS (#47)
- Update `rstsr-lapack-ffi` and `rstsr-openblas-ffi` version to v0.4.
- Default to `dynamic_loading` for using OpenBLAS.
- Changes internal ways to call BLAS and LAPACK functions.
If compile time and disk usage become very large for rstsr-openblas-ffi, you may wish to set these options in Cargo.toml:
[profile.dev.package.rstsr-lapack-ffi]
opt-level = 0
debug = false
[profile.dev.package.rstsr-openblas-ffi]
opt-level = 0
debug = false
Fix:
- Fix unpack_tri signature
- Fix gemm/syrk bug when k=0 (#46)
Enhancements:
- Feature with optional dependencies. Now in main crate `rstsr`, using feature `faer` and `openblas` along with `linalg` and `sci` should be ok, without explicitly declaring `rstsr-linalg-traits` and `rstsr-sci-traits` as dependencies.
Bug Fix:
- Tested complex linalgs for Faer and OpenBLAS devices.
Enhancements:
- DeviceFaer: generalized eigen, triangular solve (#44)
Bug Fix:
- Fix panic when layout iterator size is zero (#42)
Enhancements:
- Add common function numa_refb and refb_numa implementation (#42)
- Implement clone for Tensor and TensorCow (#42)
- Added into_pack_array, into_unpack_array as associated function of TensorAny (#42)
- linalg: Solve-related functions supports vector (Ix1) RHS (#43)
Bug Fix:
- OpenBLAS device OpenMP `get_num_thread` function (#40)
  - Note that threading control is not stabilized. There may be an incoming API breaking change on this feature for v0.4+.
Enhancements:
- Feature addition (meshgrid, concat, stack, bool_select) (#38)
- Feature addition (cdist, lebedev_rule) (#39)
Refactor:
- Eliminate `Error: From<I::Error>` trait bound (#38)
Bug Fix:
- Fix too strict stride check (#36)
API Breaking Change:
- Remove `into_slice_mut` (#35)
Enhancements:
- Diagonal arguments now allow i32 as input
Bug Fix:
- Fix Faer linalg functions when tensor offset != 0 (#30).
Summary
- Added linalg functions for `DeviceFaer` (#28).
API Breaking Change (user should not feel that):
- Updates Faer version to v0.22; it seems that v0.20/v0.21 changed the handling logic for complex values
- Conversion from/to Faer made simple (but API breaking)
- Matmul made simple (but API breaking), now requires `faer::traits::ComplexField` type (trait impl based), instead of manually dispatching types (macro_rules based)
Enhancements:
- Functions added: cholesky, det, eigh (does not include generalized eigh), eigvalsh (same to eigh), inv, pinv, solve_general, svdvals
API Breaking Change:
- Remove `ge`, `gt`, `ne`, ... in traits `TensorGreaterAPI`, `TensorNotEqualAPI`, ... (#25)
Enhancements:
- linalg: functions added: slogdet, det, svd, eigvalsh, svdvals, pinv (#23)
- Summation to boolean tensor (#25)
- Basic advanced indexing function `index_select` (#26)
- Added TensorCow support for binary arithmetic operations (#22)
Something for fun:
- Changed logo to be ABBA-like style (#24)
API Breaking Change:
- Now `rt::linalg::eigh` returns `EighResult`, instead of simple 2-element tuple (eigenvalues, eigenvectors) (#18).
- Now Lapack bindings will use Lapack (Fortran FFI) instead of LAPACKE (C FFI) by default (#18).
Enhancements:
- Now `TensorCow` can perform binary arithmetic operations, such as `2.0 * a.reshape((2, 3, 4))`. Note that in some cases, rust compiler/rust-analyzer may not be able to deduce type of this result (#22).
- Performed various refactor to linalg functions.
- Lapack (Fortran FFI) is supported and used by default. It is implemented like LAPACKE (but in rust), and many codes are generated by AI (#18).
Various code refactors:
- Move more macro_rules implementations to duplicate.
Internal refactor:
- Move out crate `rstsr-openblas-ffi` to rstsr-ffi repository, changes FFI bindings (#19).
- Now `row_major` and `col_major` features are mutually exclusive at compile time.
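A common way to make two cargo features mutually exclusive at compile time is a `compile_error!` guarded by `cfg(all(...))`. The sketch below shows that general pattern. The error message, the `default_order` helper and the row-major fallback are assumptions, and the changelog does not say how RSTSR implements the check.

```rust
// Feature names follow the changelog entry; the error text is illustrative.
#[cfg(all(feature = "row_major", feature = "col_major"))]
compile_error!("features `row_major` and `col_major` are mutually exclusive");

/// Report which order this build selected. Falling back to row-major
/// when `col_major` is off is an assumption of this sketch.
pub fn default_order() -> &'static str {
    if cfg!(feature = "col_major") {
        "col-major"
    } else {
        "row-major"
    }
}
```

With both features enabled the crate fails to build with the message above, so the conflict surfaces at compile time rather than at runtime.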
Feature addition:
- Column major is now supported (#16).
Bug fix:
- fix `pack_tril` (correctness fix for col-major case).
Bug fix:
- fix `pack_tril` (correctness fix, trait bound fix).
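For context, packing a lower triangle stores only the entries on and below the diagonal of a square matrix. The sketch below does this for a row-major `n x n` buffer, row by row. The function name and the packed ordering are assumptions, since the changelog does not state the layout `pack_tril` uses, and a col-major input would need a different traversal to produce the same result.

```rust
/// Pack the lower triangle (including the diagonal) of a row-major
/// `n x n` matrix into a vector of length n * (n + 1) / 2, row by row.
pub fn pack_tril_row_major(a: &[f64], n: usize) -> Vec<f64> {
    assert_eq!(a.len(), n * n, "expected an n x n matrix");
    let mut packed = Vec::with_capacity(n * (n + 1) / 2);
    for i in 0..n {
        // Row i contributes columns 0..=i.
        packed.extend_from_slice(&a[i * n..i * n + i + 1]);
    }
    packed
}
```

Reading the same buffer as col-major would pick up the upper triangle instead, which is the kind of layout-dependent mistake a col-major correctness fix has to guard against.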
API breaking changes:
- Rayon thread pool getter function `get_pool` changed, added `get_current_pool`, removed `get_serial_pool` (#14).
Code style changes (#15)
Bug fix:
- fix rayon pool memory blow up (#13)
This release focuses on BLAS and linalg implementations. Currently, functions such as cholesky, eigh, solve_general in rstsr-linalg-traits have been implemented.
Also many enhancements in rstsr-core.
Initial release. Most features in Python Array API have been implemented.