Generic Integers V2: It's Time #3686
Conversation
…p size/alignment to a multiple of 64 bits.
Fix some nits
…eric integers since that's not an issue any more
This reverts commit 25f85cc105cb04b4e87debf46f4547240c122ae4.
As much as I dislike … 👍 from me
Even if we should probably leave them out of the initial RFC for complexity reasons, I would just cheat with floats, as they rely on system libraries and hardware instructions far more than regular integers. By that, I mean that I'd allow …
Are you proposing delaying the discussion or the implementation? My understanding is that with a release in early 2025, Rust 2024 will be done by mid-November, which is only 2 months away, and it seems quite unlikely this RFC would be accepted and implementation ready to start by then, so I see no conflict with regard to starting on the implementation... but I could understand a focus on the edition for the next 2 months, and thus less bandwidth available for discussing RFCs.
The problem with this approach is that any "cheating" becomes permanently stabilised, and thus it's worth putting some thought into the design. This isn't to say that … Plus, monomorphisation-time errors were actually one of the big downsides to the original RFC, and I suspect that people haven't really changed their thoughts since then. Effectively, while it's okay to allow some edge-case monomorphisation-time errors like those this RFC includes (for example, asking for …). One potential solution that was proposed for unifying … And it would support all float types, forever, and there would be no invalid values for …
As stated: yes, RFCs take time to discuss and implement, and it's very reasonable to expect people to focus on the 2024 edition for now. However, that doesn't mean that we can't discuss this now, especially since there are bound to be things that were missed that would be good to point out.
> In general, operations on `u<N>` and `i<N>` should work the same as they do for existing integer types, although the compiler may need to special-case `N = 0` and `N = 1` if they're not supported by the backend.
> When stored, `u<N>` should always zero-extend to the size of the type and `i<N>` should always sign-extend. This means that any padding bits for `u<N>` can be expected to be zero, but padding bits for `i<N>` may be either all-zero or all-one depending on the sign.
Please clarify this to say what exactly happens when I transmute e.g. 255u8 to u<7> (and similar to i<N>). I assume it is UB, i.e., the validity invariant of these types says that the remaining bits are zero-extended / sign-extended, but the RFC should make that explicit.
Note that calling this "padding" might be confusing, since "padding" in structs is uninitialized, but here padding would be defined to always have very specific values. (That would, e.g., allow it to be used as a niche for enum optimizations.)
Yeah, I'm not quite sure what a better name is; it's the same as `rustc_layout_scalar_valid_range`, which is UB if the bits are invalid.
I guess that since this is the reference description, calling them niche bits would be more appropriate? Would that feel reasonable?
No. Niche bits are an implementation detail of the enum layout algorithm, and mostly not stable nor documented.
Just describe what the valid representations of values of these type are, i.e., what should go into this section about these types.
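To make the rule concrete, here's an illustrative sketch (not the RFC's wording) of what zero- and sign-extending stored values would mean, using `u8`/`i8` as stand-ins for the backing storage of a hypothetical `u<7>`/`i<7>`; the helper functions are invented for illustration:

```rust
// Sketch: how a hypothetical `u<7>` / `i<7>` stored in one byte would be
// zero- or sign-extended per the storage rule quoted above.
// `store_u7` / `store_i7` are made-up names, not a real API.

/// Zero-extend the low 7 bits into a full u8 (the top bit must end up 0).
fn store_u7(bits: u8) -> u8 {
    bits & 0x7F // a stored `u<7>` always has its extra bit cleared
}

/// Sign-extend the low 7 bits into a full i8 (the top bit copies bit 6).
fn store_i7(bits: u8) -> i8 {
    ((bits << 1) as i8) >> 1 // arithmetic shift replicates the sign bit
}

fn main() {
    assert_eq!(store_u7(0xFF), 0x7F); // 255 is not a valid `u<7>` bit pattern
    assert_eq!(store_i7(0b0111_1111), -1); // 7-bit pattern 1111111 means -1
    assert_eq!(store_i7(0b0011_1111), 63); // sign bit clear: value unchanged
}
```

Under this reading, transmuting `255u8` to `u<7>` would indeed be UB, since the stored representation 0xFF violates the "extra bits are zero" invariant.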
> The compiler should be allowed to restrict `N` even further, maybe even as low as `u16::MAX`, due to other restrictions that may apply. For example, the LLVM backend currently only allows integers with widths up to `u<23>::MAX` (not a typo; 23, not 32). On 16-bit targets, using `usize` further restricts these integers to `u16::MAX` bits.
> While `N` could be a `u32` instead of `usize`, keeping it at `usize` makes things slightly more natural when converting bits to array lengths and other length-generics, and these quite high cutoff points are seen as acceptable. In particular, this helps using `N` for an array index until [`generic_const_exprs`] is stabilized.
You mean "using N for an array length", I assume?
> As an example, someone might end up using `u<7>` for a percent since it allows fewer extraneous values (`101..=127`) than `u<8>` (`101..=255`), although this actually just overcomplicates the code for little benefit, and may even make the performance worse.
> Overall, things have changed dramatically since [the last time this RFC was submitted][#2581]. Back then, const generics weren't even implemented in the compiler yet, but now, they're used throughout the Rust ecosystem. Additionally, it's clear that LLVM definitely supports generic integers to a reasonable extent, and languages like [Zig] and even [C][`_BitInt`] have implemented them. A lot of people think it's time to start considering them for real.
I wouldn't say Zig has generic integers, it seems like they have arbitrarily-sized integers. Or is it possible to write code that is generic over the integer size?
Well, actually, you can:
```zig
const std = @import("std");

fn U(comptime bits: u16) type {
    return @Type(std.builtin.Type{
        .Int = std.builtin.Type.Int{
            .signedness = std.builtin.Signedness.unsigned,
            .bits = bits,
        },
    });
}

pub fn main() !void {
    const a: U(2) = 1;
    const b: U(2) = 3;
    // const c: U(2) = 5; // error: type 'u2' cannot represent integer value '5'
    const d = std.math.maxInt(U(147));
    std.debug.print("a={}, b={}, d={}", .{ a, b, d });
    // a=1, b=3, d=178405961588244985132285746181186892047843327
}
```
I guess that example is satisfactory enough, @RalfJung? Not really sure if it's worth the effort to clarify explicitly.
Ah, neat.
C and LLVM only have concrete-width integers though, I think?
I mean, C doesn't have generic anything, so, I guess you're right. Not 100% sure the distinction is worth it.
Clang adds `_BitInt` to C++ as an extension, and the number of bits can be generic: `template <size_t N> void example(_BitInt(N) a);` will deduce `N`, but it only works on the actual `_BitInt` types, not just any signed integer type.
I love this! One point that is touched upon here is aliases for … I think that'd be super valuable to have. Rust already has a lot of symbols, and being able to avoid the angle brackets makes the code much calmer to look upon. It's also not the first explicit syntax sugar, since an … Having the aliases also allows for this while keeping everything consistent:

```rust
fn foo<const N: usize>(my_num: u<N>) { ... }

foo(123);   // What is the bit width? u32 by default?
foo(123u7); // Fixed it
```
I agree with you, I just didn't want to require them for the initial RFC, since I wanted to keep it simple. Ideally, the language will support …
hanna-kruppe
left a comment
When the last RFC was postponed, the stated reason was waiting for pure library solutions to emerge and letting the experience with those inform the design. I don't really see much of this in the current RFC, so here's a bunch of questions about it. It would also be great if some non-obvious design aspects of the RFC (such as limits on N, whether and how post-monomorphization errors work, padding, alignment, etc.) could be justified with experience from such libraries.
> This was the main proposal last time this RFC rolled around, and as we've seen, it hasn't really worked.
> Crates like [`u`], [`bounded-integer`], and [`intx`] exist, but they come with their own host of problems:
As far as I can tell, bounded-integer and intx only provide subsets of the native types up to {i,u}128, not arbitrarily large fixed-size integers. The u crate seems to be about something else entirely, did you mean to link something different there?
So where are the libraries that even try to do what this RFC proposes: arbitrary number of bits, driven by const generics? I've searched and found ruint, which appears relevant.
That definitely seems like a good option to add to the list. I had trouble finding them, so, I appreciate it.
I'd appreciate a mention of https://crates.io/crates/arbitrary-int, which is (I think) the closest in design to this RFC.
> Crates like [`u`], [`bounded-integer`], and [`intx`] exist, but they come with their own host of problems:
> * None of these libraries can easily unify with the existing `uN` and `iN` types.
A const-generic library type can't provide this and also can't support literals. But what problems exactly does that cause in practice? Which aspects can be handled well with existing language features and which ones really need language support?
The RFC already mentions how being able to provide a small number of generic impls that cover all integer types has an extremely large benefit over being forced to use macros to implement for all of them individually. You cannot do this without language support.
So this bullet point is "only" about impls like impl<const BITS: usize> Foo for some_library::Int<BITS> { ... } not implementing anything for the primitive integer types? Could From impls and some form of delegation (#3530) also help with this?
Not really, and this is mentioned in the RFC also. That's 5 impls for unsigned, 5 impls for signed that could just be 2 impls, whether you have delegation or not. Even for simple traits, like Default, you're incentivised to use a macro just because it becomes so cumbersome.
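To illustrate the point with code that compiles today: covering every primitive integer type means a macro expanding to ten impls, where the RFC's `u<N>`/`i<N>` would need only two generic ones. This is a sketch; the `BitWidth` trait and macro names are made up for illustration:

```rust
// A made-up trait, implemented for every primitive integer type via a
// macro — the pattern described above. With language-level `u<N>`/`i<N>`,
// the ten expansions below could collapse into two generic impls.
trait BitWidth {
    const BITS: u32;
}

macro_rules! impl_bit_width {
    ($($ty:ty),*) => {$(
        impl BitWidth for $ty {
            const BITS: u32 = <$ty>::BITS;
        }
    )*};
}

// One macro invocation, ten impls.
impl_bit_width!(u8, u16, u32, u64, u128, i8, i16, i32, i64, i128);

// With this RFC (not valid Rust today), the macro would be unnecessary:
// impl<const N: usize> BitWidth for u<N> { const BITS: u32 = N as u32; }
// impl<const N: usize> BitWidth for i<N> { const BITS: u32 = N as u32; }

fn main() {
    assert_eq!(<u16 as BitWidth>::BITS, 16);
    assert_eq!(<i128 as BitWidth>::BITS, 128);
}
```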
arbitrary-int provides a degree of unification using its Number trait. It's somewhat rudimentary, but I am working on improving it.
Reading this again, the Number trait fulfills a somewhat different role though. It allows writing generic code against any Number (be it an arbitrary-int or a native int), but it does not expose the bits itself - which can be a plus or a minus, depending on what you're building.
> Crates like [`u`], [`bounded-integer`], and [`intx`] exist, but they come with their own host of problems:
> * None of these libraries can easily unify with the existing `uN` and `iN` types.
> * Generally, they require a lot of unsafe code to work.
What kind of unsafe code, and for what purposes? And is that sufficient reason to extend the language? Usually, if it's something that can be hidden behind a safe abstraction once and for all, then it seems secondary whether that unsafety lives on crates.io, in sysroot crates, or in the functional correctness of the compiler backend.
Generally, the unsafe code is stuff similar to the bounded-integer crate, where integers are represented using enums and transmuted from primitives. The casting to primitives is safe, but not the transmuting.
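A minimal sketch of that pattern (loosely modeled on the enum-plus-transmute approach described above; the `U2` type and its methods are invented for illustration):

```rust
/// A 2-bit unsigned integer, represented as an enum so the compiler knows
/// that values 4..=255 are impossible — a niche usable by layout
/// optimizations. This is the style of unsafe code being discussed.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum U2 {
    V0 = 0,
    V1 = 1,
    V2 = 2,
    V3 = 3,
}

impl U2 {
    /// Safe, checked construction from a primitive.
    fn new(x: u8) -> Option<U2> {
        if x <= 3 {
            // SAFETY: `x` is in 0..=3, matching a declared discriminant,
            // and `U2` is `repr(u8)` with the same size as `u8`.
            Some(unsafe { std::mem::transmute::<u8, U2>(x) })
        } else {
            None
        }
    }

    /// Casting back to the primitive is safe.
    fn get(self) -> u8 {
        self as u8
    }
}

fn main() {
    assert_eq!(U2::new(3).map(U2::get), Some(3));
    assert_eq!(U2::new(4), None);
    // The niche lets `Option<U2>` fit in a single byte:
    assert_eq!(std::mem::size_of::<Option<U2>>(), 1);
}
```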
Is that really all? Because that seems trivial to encapsulate without affecting the API, and likely to be solved by any future feature that makes it easier to opt into niche optimizations (e.g., pattern types).
Yeah, it's easy to encapsulate, but I think it's worth mentioning the unsafe code as a negative, because it means many code bases will be more apprehensive about using it.
You are right that it could easily be improved, though, with more compiler features. I just can't imagine it ever being on par with the performance of a compiler-supported version, both at runtime and compile time.
arbitrary-int works without unsafe code (with the exception of the optional function new_unchecked, which skips the bounds check).
> likely to be solved by any future feature that makes it easier to opt into niche optimizations (e.g., pattern types)
And this is already solved by pattern types in nightly: https://play.rust-lang.org/?version=nightly&mode=release&edition=2024&gist=583297ef353ad5e8e79907c06c3c197f
I'm aware that pattern types exist on nightly. That's why I name-dropped them! I counted them under "future feature" for the purpose of this discussion because what's in nightly right now is light years away from something that can be stabilized, and it's not set in stone that whatever ends up stabilized will have the same name and general approach. That's why your example has the #![allow(internal_features)].
> * None of these libraries can easily unify with the existing `uN` and `iN` types.
> * Generally, they require a lot of unsafe code to work.
> * These representations tend to be slower and less-optimized than compiler-generated versions.
Do we have any data on what's slower and why? Are there any lower-stakes ways to fix these performance issues by, for example, adding/stabilizing suitable helper functions (like rust-lang/rust#85532) or adding more peephole optimizations in MIR and/or LLVM?
Main source of slowdown is from using enums to take advantage of niche optimisations; having an enum with a large number of variants to represent this niche is pretty slow to compile, even though most of the resulting code ends up as no-ops after optimisations.
I definitely should mention that I meant slow to compile here, not slow to run. Any library solution can be made fast to run, but will generally suffer in compile time when these features are effectively already supported by the compiler backends, mostly for free.
Is there any compile time issue when not trying to provide niches? Out of the potential use cases the RFC lists, only a couple seem to really care about niche optimizations. In particular, I don't expect that it typically matters for integers larger than 128 bits. (But again, surveying the real ecosystem would help!) If so, the compile time problem for crates like bounded-integer could be addressed more directly by stabilizing a proper way to directly opt into niches instead of having to abuse enums. And that would help with any bounds, while this RFC (without future possibilities) would not.
Well, I would expect some negative compile-time impact from repeatedly monomorphizing code that's const-generic over bit width or bounds. But that's sort of inherent in having lots of code that is generic in this way, so it's no worse for third-party libraries than for something built-in.
That's very fair; I agree that we should have an ability to opt into niches regardless. I guess that my reasoning here is pretty lackluster because I felt that the other reasons to have this feature were strong enough that this argument wasn't worth arguing, although you're right that I should actually put a proper argument for it.
From what I've seen, the use cases for generic integers are:

1. Generalising primitives
2. Between-primitives integer types (like `u<7>` and `u<48>`)
3. Larger-than-primitives integer types

For 1, basically no library solution can work, so that's off the table. For 2, which is mostly the subject of discussion here, you're right that it could probably be improved a lot with existing support. And for 3, most people just don't find the need to make generalised code for their use cases, and just explicitly implement, say, u256 themselves with the few operations they need.
The main argument IMHO is that we can effectively knock out all three of these options easily with generic integers supported by the language, and they would be efficient and optimized by the compiler. We can definitely whittle down the issues with 2 and 3 as we add more support, but the point is that we don't need to if we add in generic integers.
Although, I really need to solidify this argument, because folks like you aren't 100% convinced, and I think that the feedback has been pretty valuable.
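As a concrete illustration of use case 2: the between-primitives case can be approximated today by masking inside a wider primitive. A hedged sketch (the `U48` type and its methods are made up for illustration):

```rust
/// A hypothetical 48-bit unsigned integer stored in the low bits of a u64,
/// illustrating the "between-primitives" use case (like `u<48>`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct U48(u64);

impl U48 {
    const MASK: u64 = (1 << 48) - 1;

    /// Checked construction: reject values that don't fit in 48 bits.
    fn new(x: u64) -> Option<U48> {
        (x <= Self::MASK).then_some(U48(x))
    }

    /// Wrapping addition modulo 2^48.
    fn wrapping_add(self, rhs: U48) -> U48 {
        U48((self.0 + rhs.0) & Self::MASK)
    }
}

fn main() {
    let max = U48::new(U48::MASK).unwrap();
    let one = U48::new(1).unwrap();
    assert_eq!(max.wrapping_add(one), U48::new(0).unwrap()); // wraps at 2^48
    assert_eq!(U48::new(1 << 48), None); // out of range
}
```

A built-in `u<48>` would make this wrapper, and its per-operation masking, unnecessary, with the backend doing the truncation.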
Yeah, I appreciate that you're trying to tackle a lot of different problems with a unifying mechanism. I focus on each problem separately because I want to tease out how much value the unifying mechanism adds for each of them, compared to smaller, more incremental additions that may be useful and/or necessary in any case. Only when that's done I feel like I can form an opinion on whether this relatively large feature seems worth it overall.
> * None of these libraries can easily unify with the existing `uN` and `iN` types.
> * Generally, they require a lot of unsafe code to work.
> * These representations tend to be slower and less-optimized than compiler-generated versions.
> * They still require you to generalise integer types with macros instead of const generics.
I'm not sure I understand the problem here. If a library provides struct Int<const BITS: usize>(...); then code using this library shouldn't need macros to interact with it (except, perhaps, as workaround for current gaps in const generics). The library itself would have a bunch of impls relating its types to the language primitives, which may be generated with macros. But that doesn't seem like such a drastic problem, if it's constrained to the innards of one library, or a few competing libraries.
I'm not sure I understand your argument. No matter what, a library solution cannot be both generic and unify with the standard library types. I don't see a path forward that would allow, for example, some library Uint<N> type to allow Uint<8> being an alias for u8 while also supporting arbitrary Uint<N>. Even with specialisation, I can't imagine a sound subset of specialisation allowing this to work.
Like, sure, a set of libraries can choose to only use these types instead of the primitives, circumventing the problem. But most people will want to implement their traits for primitives for interoperability.
This overlaps a bit with the bullet point about unification, but I do think it depends a lot on what one is doing. For example, the num-traits crate defines traits that it needs to implement for the primitive types. On the other hand, any code that's currently written against the traits from num-traits may be happy with a third party library that provides Int<N> and Uint<N> and implements the relevant traits for them. And for something like bit fields, you may not need much generalization over primitive types at all: in the MipsInstruction example, you probably want some widening and narrowing conversions, but only with respect to u32 specifically.
It's hard to form an opinion about how common these scenarios are (and whether there are other nuances) without having a corpus of "real" code to look at. Experience reports (including negative ones) with crates like num-traits and bounded-integer may be more useful than discussing it in the abstract.
Two things that came to mind: …
So, I agree that this was one of the reasons, but it's worth reiterating that, at that time, const generics weren't even stable. We had no idea what the larger ecosystem would choose to do with them, considering how many people were waiting for stabilisation to really start using them. (We had an idea of what was possible, but not what would feel most ergonomic for APIs, etc.) So, I personally felt that the library-solution idea was mostly due to the fact that we didn't really know what libraries would do with const generics. And, overwhelmingly, there hasn't been much interest in it for what I believe to be the most compelling use case: generalising APIs without using macros, which right now cannot really be done without language support.
Currently, Rust doesn't obey the property "every expression has a (possibly polymorphic) type, which can be deduced from the expression itself alone". Rust doesn't obey it because integer literals don't have any intrinsic type; the type of an integer literal appears in compiler error messages as … Such a situation is different from many other languages, for example Haskell. In Haskell, integer literals do have an intrinsic type, namely … We can fix this like so:

```rust
#![feature(adt_const_params)]
use std::marker::ConstParamTy;

#[derive(ConstParamTy, PartialEq, Eq)]
enum IntegerDescriptor {
    U(usize),
    I(usize),
    USize,
    ISize,
}
```

Now let's introduce a universal integer type …
> ## Enum variants
> For now, enum variants will still be restricted to their current set of integer types, since even [`repr(u128)`] isn't stable yet. If an RFC like [#3659] gets passed, allowing arbitrary types for enum variant tags, then `u<N>` should be included in that, although that can be added as a future extension.
repr128 is stabilized as of 1.89.0, so this paragraph should change.
I actually both knew about this and noticed it when I reread the proposal myself due to the other comments, but I hadn't actually updated it since this is relatively low priority. I'll mark this as resolved once I do update it, though.
Note that the rest of the statement still remains: we'd need some sort of decision on arbitrary-type enum reprs, since it's not entirely clear what repr(u<7>) would mean. We don't allow repr(char) either, for example, even though such a thing could be properly defined if we wanted to do so.
Basically all the proposals for a unified int type have taken some form of this, and that's what I alluded to in the RFC by mentioning that …
is there a tracking issue for this?
There aren't tracking issues for unaccepted RFCs. Presumably, this could be proposed as some kind of compiler experiment, but that would require a separate MCP.
Personally, I'd be happy to see a lang+compiler experiment towards this, if someone wants to sign up to work on it.
I've been thinking of potentially offering some help implementing this (at least partially, via things like unbounding the length of literals and refactoring Miri to use big integers instead of u128), but I haven't done any compiler work so far, so it would be a bit of a learning curve. If folks would be willing to help out with that, I'd be more than happy to write up an MCP for it!
We had a quick conversation about this in the lang meeting today, and people were all receptive to a lang experiment. There were particular discussions about hoping the experiment could show whether the …
Since you're wearing both hats: does the compiler team think it would be worth filing any MCPs over the specifics of the implementation (like refactoring things to use bigints instead of u128s/i128s), or is the lang go-ahead enough to just start writing code for this?
I'd recommend opening a compiler MCP with an implementation strategy if it involves significant changes to the compiler (which I think it does). It doesn't have to get bogged down by all the details, but the high-level plan and its implications are probably worth discussing.
I'd think carefully about what you'd hope the experiment would accomplish, and how you could do that with the least churn to the compiler. From a compiler-reviewer perspective, I'd want to see things that would be easy to undo later if it turned out that the experiment doesn't work for some reason. For example, maybe it'd be enough to have the experiment work only for … Then, if the experiment succeeds and the RFC is accepted, that'd still be a useful subset (letting blanket impls work over everything up to 128-bit, and thus covering all the things that currently exist), and expanding beyond 128-bit could be subsequent work once it's confirmed as "yes, we want to do this". (I think the answer to "could we go through and replace all the …
> Note that casting directly to `u<0>` or `i<0>` is still allowed, to avoid forcing users to special-case their code. See later notes on possible lints.
> Integer literals can be automatically coerced to `u<N>` and `i<N>`, although generic `iN` and `uN` suffixes are left out for future RFCs. When coercing literals to an explicit `u<N>` or `i<N>`, the `overflowing_literals` lint should trigger as usual, although this should not apply for generic code. See later notes on possible lint changes.
Question: Since integer literals have an assumed type of i32, would it make sense for N to have a default value of 32, making i the assumed type of an integer literal?
I don't think so, since the assumptions on integer literals are a bit more complicated than that. I believe that if the literal can't fit within i32, it gets bumped up to i64 or i128.
> I believe that if the literal can't fit within `i32`, it gets bumped up to `i64` or `i128`.
Nope, Rust just errors, it doesn't account for how big the literal's value is when deciding which type to use: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=6500fd891b41b51e4dcdd37b26b93c5c
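For what it's worth, the flat `i32` fallback is easy to observe directly; a small sketch:

```rust
fn main() {
    // With no other constraints, an integer literal falls back to i32…
    let x = 123;
    assert_eq!(std::any::type_name_of_val(&x), "i32");

    // …and a literal that doesn't fit in i32 is NOT bumped to i64:
    // `let y = 5_000_000_000;` is rejected by the deny-by-default
    // `overflowing_literals` lint rather than inferred as a wider type.
}
```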
Interesting! Since that's the case, I think I like the idea of making N = 32 a default. That makes it clear that the default integer type is i32. To my knowledge, that default isn't listed in the documentation for core/std.
That would mean `let x: i = 42;` is valid, which reads a bit weird.
> that default isn't listed in the documentation for `core`/`std`.

It is documented here: https://doc.rust-lang.org/1.93.1/reference/expressions/literal-expr.html#r-expr.literal.int.inference-default
TBH I think (rust-lang/rust#152589) that relying on the i32 default is a mis-feature anyway outside of beginner examples like wanting println!("{}", 3); to work.
If someone is writing out a type, they should say i32 if they want i32.
(And, in particular, i is a common iteration variable, so I never want to see something like let mut i: i = 0;.)
Yeah, if anything, I would expect i as a bare type to be equivalent to i<_>, where the actual width is inferred.
Which, I guess, is another big question that the lang experiment will have to resolve: how exactly is that going to work? Will it just fail catastrophically like it does now, or default to 32?
> Will it just fail catastrophically like it does now, or default to 32?
I'd assume literals will default to i32 for compatibility, if the literal is too big then that will trigger the deny-by-default overflowing literals lint. if the lint is allowed, then it will just wrap the value to fit in the literal's deduced type, just like it does now.
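That wrapping behaviour is observable today; a small sketch:

```rust
fn main() {
    // `overflowing_literals` is deny-by-default; allowing it makes the
    // literal wrap to fit the deduced type, as described above.
    #[allow(overflowing_literals)]
    let x: u8 = 300; // 300 mod 256
    assert_eq!(x, 44);
}
```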
I mean, I'm certain that integer inference is tricky and there's a reason for the current behaviour, but I do wonder if Rust's changes over the past decade plus the addition of const generic inference is enough to improve the current behaviour without too much extra work.
Co-authored-by: Zachary Harrold <zac@harrold.com.au>
On the topic of names: … With this in mind, I propose the following type names:

```rust
use std::num::Unsigned;

impl<const N: usize, const M: usize> Shl<Unsigned<M>> for Unsigned<N> {
    type Output = Unsigned<N>;

    #[inline]
    fn shl(self, rhs: Unsigned<M>) -> Unsigned<N> {
        // implementation
    }
}
```
I agree that …
Note that types and variables live in different namespaces. A variable …
In the same way that …

I'm more concerned about confusing humans than compilers.
Yeah, this was explicitly discussed before, and the type namespacing being different was the main motivating factor; it's already mentioned under the alternatives and explained there. Signed and Unsigned also obviously conflict with a lot of library traits, which are in the type namespace, so that seems like a non-starter.
I specifically suggested these be in … I personally think generic integers and/or integer traits would be very useful, but mainly for small sections of generic code where it is completely fine to use long names.
I saw mention of …
No. Those are library types that are newtypes of u32 and i32, respectively. u32 doesn't have any definition in the library, and I think the same would be true of …
Summary
Adds the builtin types `u<N>` and `i<N>`, allowing integers with an arbitrary size in bits. (Rendered)
Details
This is a follow-up to #2581, which was previously postponed. A lot has happened since then, and there has been general support for this change from a lot of different people. It's time.
There are a few key differences from the previous RFC, but I trust that you can read.
Thanks
Thank you to everyone who responded to the pre-RFC on Internals with feedback.