
Generic Integers V2: It's Time #3686

Open
clarfonthey wants to merge 14 commits into rust-lang:master from clarfonthey:generic-integers-v2

Conversation

@clarfonthey

@clarfonthey clarfonthey commented Sep 1, 2024

Summary

Adds the builtin types u<N> and i<N>, allowing integers with an arbitrary size in bits.

Rendered

Details

This is a follow-up to #2581, which was previously postponed. A lot has happened since then, and there has been general support for this change from a lot of different people. It's time.

There are a few key differences from the previous RFC, but I trust that you can read.

Thanks

Thank you to everyone who responded to the pre-RFC on Internals with feedback.

@ehuss ehuss added T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC. T-types Relevant to the types team, which will review and decide on the RFC. labels Sep 2, 2024
@jhpratt
Member

jhpratt commented Sep 5, 2024

As much as I dislike `as` casts and would prefer a better solution, until that solution exists (in the standard library), this is probably the best way to go.

👍 from me

@Alonely0

Alonely0 commented Sep 5, 2024

Even if we should probably leave them out of the initial RFC for complexity reasons, I would just cheat with floats, as they rely on system libraries and hardware instructions far more than regular integers. By that, I mean that I'd allow f<32> for consistency reasons, but only the widths that are actually supported would compile; i.e., f<3> would throw a compile-time error (either at monomorphisation time, or by disallowing const generics on that type). Overall, I think this RFC is on the right track, but I'd postpone it until we're past Rust 2024.

@matthieu-m

Overall, I think this RFC is on the right track, but I'd postpone it until we're past Rust 2024.

Are you proposing delaying the discussion or the implementation?

My understanding is that with a release in early 2025, Rust 2024 will be done by mid-November, which is only two months away, and it seems quite unlikely that this RFC would be accepted and its implementation ready to start by then, so I see no conflict with regard to starting on the implementation...

... but I could understand a focus on the edition for the next 2 months, and thus less bandwidth available for discussing RFCs.

@clarfonthey
Author

clarfonthey commented Sep 5, 2024

Even if we should probably leave them out of the initial RFC for complexity reasons, I would just cheat with floats, as they rely on system libraries and hardware instructions far more than regular integers. By that, I mean that I'd allow f<32> for consistency reasons, but only the widths that are actually supported would compile; i.e., f<3> would throw a compile-time error (either at monomorphisation time, or by disallowing const generics on that type).

The problem with this approach is that any "cheating" becomes permanently stabilised, and thus, it's worth putting in some thought for the design. This isn't to say that f<N> is a bad design (I personally don't like it, but I won't fault people for wanting to use it), but rather that u<N> and i<N> are good designs in several ways that f<N> is not.

Plus, monomorphisation-time errors were actually one of the big downsides to the original RFC, and I suspect that people haven't really changed their thoughts since then. Effectively, it's okay to allow some edge-case monomorphisation-time errors like the ones this RFC includes (for example, asking for u<0xFFFF_FFFF_FFFF> is a hard error, since it's larger than u32::MAX), but not extremely-common errors like asking for f<N> where N is anything that isn't 16, 32, 64, or 128.

One potential solution that was proposed for unifying u<N>, i<N>, usize, and isize was to have some separate ADT that encapsulates signedness and has different options for "size" and N. This kind of solution feels promising for generic floats since it means that you could have an impl like:

impl<const F: FloatKind> MyTrait for f<F> {
    // ...
}

And it would support all float types, forever, and there would be no invalid values for F since we've explicitly defined it. However, this requires const_adt_params which is currently unstable.

Overall, I think this RFC is on the right track, but I'd postpone it until we're past Rust 2024.

As stated: yes, RFCs take time to discuss and implement and it's very reasonable to expect people to focus on the 2024 edition for now. However, that doesn't mean that we can't discuss this now, especially since there are bound to be things that were missed that would be good to point out.


In general, operations on `u<N>` and `i<N>` should work the same as they do for existing integer types, although the compiler may need to special-case `N = 0` and `N = 1` if they're not supported by the backend.

When stored, `u<N>` should always zero-extend to the size of the type and `i<N>` should always sign-extend. This means that any padding bits for `u<N>` can be expected to be zero, but padding bits for `i<N>` may be either all-zero or all-one depending on the sign.
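The stored-representation rule quoted above can be simulated on stable Rust. This is only an illustrative sketch: `store_u7` and `store_i7` are hypothetical stand-ins for how a `u<7>`/`i<7>` might be laid out in a byte, not anything the RFC itself defines.

```rust
// Hypothetical stand-ins for u<7> / i<7> stored in a u8.
fn store_u7(v: u8) -> u8 {
    v & 0x7F // zero-extend: the padding bit is always 0
}

fn store_i7(v: i8) -> u8 {
    // sign-extend: the padding bit copies bit 6 (the 7-bit sign bit)
    ((v << 1) >> 1) as u8
}

fn main() {
    assert_eq!(store_u7(0b0101_0101), 0b0101_0101);
    assert_eq!(store_i7(-1), 0xFF); // all-one padding for negative values
    assert_eq!(store_i7(3), 0x03);  // all-zero padding for non-negative values
}
```

Under this reading, a byte like `0xFF` would simply not be a valid representation of `u<7>`, which is the validity question raised below.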
Member

@RalfJung RalfJung Sep 6, 2024


Please clarify this to say what exactly happens when I transmute e.g. 255u8 to u<7> (and similar to i<N>). I assume it is UB, i.e., the validity invariant of these types says that the remaining bits are zero-extended / sign-extended, but the RFC should make that explicit.

Note that calling this "padding" might be confusing since "padding" in structs is uninitialized, but here padding would be defined to always have very specific values. (That would, e.g. allow, it to be used as a niche for enum optimizations.)

Author

Yeah, I'm not quite sure what a better name is; it's the same as rustc_layout_scalar_valid_range, which is UB if the bits are invalid.

Author

I guess that since this is the reference description, calling them niche bits would be more appropriate? Would that feel reasonable?

Member

@RalfJung RalfJung Sep 9, 2024


No. Niche bits are an implementation detail of the enum layout algorithm, and mostly not stable nor documented.

Just describe what the valid representations of values of these type are, i.e., what should go into this section about these types.


The compiler should be allowed to restrict `N` even further, maybe even as low as `u16::MAX`, due to other restrictions that may apply. For example, the LLVM backend currently only allows integers with widths up to `u<23>::MAX` (not a typo; 23, not 32). On 16-bit targets, using `usize` further restricts these integers to `u16::MAX` bits.

While `N` could be a `u32` instead of `usize`, keeping it at `usize` makes things slightly more natural when converting bits to array lengths and other length-generics, and these quite high cutoff points are seen as acceptable. In particular, this helps using `N` for an array index until [`generic_const_exprs`] is stabilized.
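The "converting bits to array lengths" point can be shown on stable Rust today. The `Bits` struct below is purely illustrative (it is not the RFC's proposed representation); it just demonstrates that a `usize` parameter plugs directly into an array length:

```rust
// Illustrative stand-in: a usize const parameter composes directly with
// array lengths, which a u32 parameter would need a cast for.
struct Bits<const N: usize> {
    bytes: [u8; N], // with generic_const_exprs this could shrink to [u8; N.div_ceil(8)]
}

fn main() {
    let b = Bits::<4> { bytes: [0; 4] };
    assert_eq!(b.bytes.len(), 4);
}
```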
Member

You mean "using N for an array length", I assume?

Author

Yes.


As an example, someone might end up using `u<7>` for a percent since it allows fewer extraneous values (`101..=127`) than `u<8>` (`101..=255`), although this actually just overcomplicates the code for little benefit, and may even make the performance worse.

Overall, things have changed dramatically since [the last time this RFC was submitted][#2581]. Back then, const generics weren't even implemented in the compiler yet, but now, they're used throughout the Rust ecosystem. Additionally, it's clear that LLVM definitely supports generic integers to a reasonable extent, and languages like [Zig] and even [C][`_BitInt`] have implemented them. A lot of people think it's time to start considering them for real.
Member

I wouldn't say Zig has generic integers, it seems like they have arbitrarily-sized integers. Or is it possible to write code that is generic over the integer size?

Member

well actually you can

const std = @import("std");

fn U(comptime bits: u16) type {
    return @Type(std.builtin.Type {
        .Int = std.builtin.Type.Int {
            .signedness = std.builtin.Signedness.unsigned,
            .bits = bits,
        },
    });
}

pub fn main() !void {
    const a: U(2) = 1;
    const b: U(2) = 3;
    // const c: U(2) = 5; // error: type 'u2' cannot represent integer value '5'
    const d = std.math.maxInt(U(147));
    std.debug.print("a={}, b={}, d={}", .{ a, b, d });
    // a=1, b=3, d=178405961588244985132285746181186892047843327
}

Author

I guess that example is satisfactory enough, @RalfJung? Not really sure if it's worth the effort to clarify explicitly.

Member

Ah, neat.

C and LLVM only have concrete-width integers though, I think?

Author

I mean, C doesn't have generic anything, so, I guess you're right. Not 100% sure the distinction is worth it.


Clang adds `_BitInt` to C++ as an extension, and the number of bits can be generic: `template <size_t N> void example(_BitInt(N) a);` will deduce `N`, but it only works on the actual `_BitInt` types, not just any signed integer type.

@diondokter
Copy link

diondokter commented Sep 6, 2024

I love this!

One point that is touched upon here is aliases for uN <=> u<N>.

I think that'd be super valuable to have. Rust already has a lot of symbols, and avoiding the angle brackets keeps the code much calmer to look at. It's also not the first explicit syntax sugar, since an async fn is treated the same as fn -> impl Future in a lot of places.

Having the aliases also allows for this while keeping everything consistent:

fn foo<const N: usize>(my_num: u<N>) { ... }

foo(123); // What is the bit width? u32 by default?
foo(123u7); // Fixed it

@clarfonthey
Author

I love this!

One point that is touched upon here is aliases for uN <=> u<N>.

I think that'd be super valuable to have. Rust already has a lot of symbols, and avoiding the angle brackets keeps the code much calmer to look at. It's also not the first explicit syntax sugar, since an async fn is treated the same as fn -> impl Future in a lot of places.

I agree with you, I just didn't want to require them for the initial RFC, since I wanted to keep it simple. Ideally, the language will support uN aliases as well as uN suffixes.


@hanna-kruppe hanna-kruppe left a comment


When the last RFC was postponed, the stated reason was waiting for pure library solutions to emerge and letting the experience with those inform the design. I don't really see much of this in the current RFC, so here's a bunch of questions about it. It would also be great if some non-obvious design aspects of the RFC (such as limits on N, whether and how post-monomorphization errors work, padding, alignment, etc.) could be justified with experience from such libraries.


This was the main proposal last time this RFC rolled around, and as we've seen, it hasn't really worked.

Crates like [`u`], [`bounded-integer`], and [`intx`] exist, but they come with their own host of problems:


As far as I can tell, bounded-integer and intx only provide subsets of the native types up to {i,u}128, not arbitrarily large fixed-size integers. The u crate seems to be about something else entirely, did you mean to link something different there?

So where are the libraries that even try to do what this RFC proposes: arbitrary number of bits, driven by const generics? I've searched and found ruint, which appears relevant.

Author

That definitely seems like a good option to add to the list. I had trouble finding them, so, I appreciate it.


I'd appreciate a mention of https://crates.io/crates/arbitrary-int, which is (I think) the closest in design to this rfc


Crates like [`u`], [`bounded-integer`], and [`intx`] exist, but they come with their own host of problems:

* None of these libraries can easily unify with the existing `uN` and `iN` types.


A const-generic library type can't provide this and also can't support literals. But what problems exactly does that cause in practice? Which aspects can be handled well with existing language features and which ones really need language support?

Author

The RFC already mentions how being able to provide a small number of generic impls that cover all integer types has an extremely large benefit over being forced to use macros to implement for all of them individually. You cannot do this without language support.


So this bullet point is "only" about impls like impl<const BITS: usize> Foo for some_library::Int<BITS> { ... } not implementing anything for the primitive integer types? Could From impls and some form of delegation (#3530) also help with this?

Author

Not really, and this is mentioned in the RFC also. That's 5 impls for unsigned, 5 impls for signed that could just be 2 impls, whether you have delegation or not. Even for simple traits, like Default, you're incentivised to use a macro just because it becomes so cumbersome.
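The "macro per type" pain being described can be made concrete with a small stable-Rust sketch (the trait and macro names here are illustrative, not from any of the crates mentioned):

```rust
// Today: without generic integers, a trait must be implemented once per
// primitive type, typically via a macro.
trait BitWidth {
    const BITS: u32;
}

macro_rules! impl_bit_width {
    ($($t:ty),*) => {
        $(impl BitWidth for $t { const BITS: u32 = <$t>::BITS; })*
    };
}

// Ten separate impls that, under this RFC, could collapse into just
// `impl<const N: usize> BitWidth for u<N>` plus the same for `i<N>`.
impl_bit_width!(u8, u16, u32, u64, u128, i8, i16, i32, i64, i128);

fn main() {
    assert_eq!(<u8 as BitWidth>::BITS, 8);
    assert_eq!(<i128 as BitWidth>::BITS, 128);
}
```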


arbitrary-int provides a unification somewhat using its Number trait. It's somewhat rudimentary but I am working on improving it.


Reading this again, the Number trait fulfills a somewhat different role though. It allows writing generic code against any Number (be it an arbitrary-int or a native int), but it does not expose the bits itself - which can be a plus or a minus, depending on what you're building.

Crates like [`u`], [`bounded-integer`], and [`intx`] exist, but they come with their own host of problems:

* None of these libraries can easily unify with the existing `uN` and `iN` types.
* Generally, they require a lot of unsafe code to work.


What kind of unsafe code, and for what purposes? And is that sufficient reason to extend the language? Usually, if it's something that can be hidden behind a safe abstraction once and for all, then it seems secondary whether that unsafety lives on crates.io, in sysroot crates, or in the functional correctness of the compiler backend.

Author

Generally, the unsafe code is stuff similar to the bounded-integer crate, where integers are represented using enums and transmuted from primitives. The casting to primitives is safe, but not the transmuting.
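For readers unfamiliar with the technique being referenced, here is a minimal sketch of it (illustrative only; this is not bounded-integer's actual API): a fieldless enum covers every valid value, the transmute in is the unsafe part, and the cast out is safe. The enum also gives the compiler a niche for layout optimizations.

```rust
// A 2-bit unsigned integer as a fieldless enum (illustrative sketch).
#[repr(u8)]
#[derive(Copy, Clone, Debug, PartialEq)]
enum U2 { V0 = 0, V1 = 1, V2 = 2, V3 = 3 }

impl U2 {
    fn new(v: u8) -> Option<U2> {
        if v < 4 {
            // Sound only because every value in 0..=3 is a declared variant.
            Some(unsafe { std::mem::transmute::<u8, U2>(v) })
        } else {
            None
        }
    }
    fn get(self) -> u8 {
        self as u8 // casting back to the primitive is safe
    }
}

fn main() {
    assert_eq!(U2::new(3).map(U2::get), Some(3));
    assert_eq!(U2::new(4), None);
    // The unused values 4..=255 act as a niche, so Option<U2> stays one byte.
    assert_eq!(std::mem::size_of::<Option<U2>>(), 1);
}
```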


Is that really all? Because that seems trivial to encapsulate without affecting the API, and likely to be solved by any future feature that makes it easier to opt into niche optimizations (e.g., pattern types).

Author

Yeah, it's easy to encapsulate, but I think it's worth mentioning that unsafe code is involved as a negative because it means many code bases will be more apprehensive to use it.

You are right that it could easily be improved, though, with more compiler features. I just can't imagine it ever being on par with the performance of a compiler-supported version, both at runtime and compile time.


arbitrary-int works without unsafe code (with the exception of the optional function new_unchecked which skips the bounds check)


@clarfonthey
@hanna-kruppe

likely to be solved by any future feature that makes it easier to opt into niche optimizations (e.g., pattern types)

And this is already solved by pattern types in nightly: https://play.rust-lang.org/?version=nightly&mode=release&edition=2024&gist=583297ef353ad5e8e79907c06c3c197f


I'm aware that pattern types exist on nightly. That's why I name-dropped them! I counted them under "future feature" for the purpose of this discussion because what's in nightly right now is light years away from something that can be stabilized, and it's not set in stone that whatever ends up stabilized will have the same name and general approach. That's why your example has the #![allow(internal_features)].


* None of these libraries can easily unify with the existing `uN` and `iN` types.
* Generally, they require a lot of unsafe code to work.
* These representations tend to be slower and less-optimized than compiler-generated versions.


Do we have any data on what's slower and why? Are there any lower-stakes ways to fix these performance issues by, for example, adding/stabilizing suitable helper functions (like rust-lang/rust#85532) or adding more peephole optimizations in MIR and/or LLVM?

Author

Main source of slowdown is from using enums to take advantage of niche optimisations; having an enum with a large number of variants to represent this niche is pretty slow to compile, even though most of the resulting code ends up as no-ops after optimisations.

I definitely should mention that I meant slow to compile here, not slow to run. Any library solution can be made fast to run, but will generally suffer in compile time when these features are effectively already supported by the compiler backends, mostly for free.


Is there any compile time issue when not trying to provide niches? Out of the potential use cases the RFC lists, only a couple seem to really care about niche optimizations. In particular, I don't expect that it typically matters for integers larger than 128 bits. (But again, surveying the real ecosystem would help!) If so, the compile time problem for crates like bounded-integer could be addressed more directly by stabilizing a proper way to directly opt into niches instead of having to abuse enums. And that would help with any bounds, while this RFC (without future possibilities) would not.


Well, I would expect some negative compile-time impact from repeatedly monomorphizing code that's const-generic over bit width or bounds. But that's sort of inherent in having lots of code that is generic in this way, so it's no worse for third-party libraries than for something built-in.

Author

That's very fair; I agree that we should have an ability to opt into niches regardless. I guess that my reasoning here is pretty lackluster because I felt that the other reasons to have this feature were strong enough that this argument wasn't worth arguing, although you're right that I should actually put a proper argument for it.

From what I've seen, of the use cases for generic integers:

  1. Generalising primitives
  2. Between-primitives integer types (like u<7> and u<48>)
  3. Larger-than-primitives integer types

For 1, basically no library solution can work, so that's off the table. For 2, which is mostly the subject of discussion here, you're right that it could probably be improved a lot with existing support. And for 3, most people just don't find the need to write generalised code for their use cases, and instead explicitly implement, say, u256 themselves with the few operations they need.

The main argument IMHO is that we can effectively knock out all three of these options easily with generic integers supported by the language, and they would be efficient and optimized by the compiler. We can definitely whittle down the issues with 2 and 3 as we add more support, but the point is that we don't need to if we add in generic integers.
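To make use case 3 concrete, here is a sketch of the kind of hand-rolled larger-than-primitive type being described, with just one operation (this is illustrative; it is not taken from any particular crate):

```rust
// A hand-rolled u256 as four little-endian u64 limbs, supporting only
// wrapping addition — typical of what crates implement ad hoc today.
#[derive(Copy, Clone, Debug, PartialEq)]
struct U256 {
    limbs: [u64; 4],
}

impl U256 {
    fn wrapping_add(self, rhs: U256) -> U256 {
        let mut out = [0u64; 4];
        let mut carry = false;
        for i in 0..4 {
            let (s, c1) = self.limbs[i].overflowing_add(rhs.limbs[i]);
            let (s, c2) = s.overflowing_add(carry as u64);
            out[i] = s;
            carry = c1 || c2;
        }
        U256 { limbs: out }
    }
}

fn main() {
    let one = U256 { limbs: [1, 0, 0, 0] };
    let max_low = U256 { limbs: [u64::MAX, 0, 0, 0] };
    // The carry propagates into the second limb.
    assert_eq!(max_low.wrapping_add(one), U256 { limbs: [0, 1, 0, 0] });
}
```

A built-in `u<256>` would make this, and every other operation the crate author chose not to write, come for free from the backend.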

Although, I really need to solidify this argument, because folks like you aren't 100% convinced, and I think that the feedback has been pretty valuable.

@hanna-kruppe hanna-kruppe Sep 7, 2024

Yeah, I appreciate that you're trying to tackle a lot of different problems with a unifying mechanism. I focus on each problem separately because I want to tease out how much value the unifying mechanism adds for each of them, compared to smaller, more incremental additions that may be useful and/or necessary in any case. Only when that's done I feel like I can form an opinion on whether this relatively large feature seems worth it overall.

* None of these libraries can easily unify with the existing `uN` and `iN` types.
* Generally, they require a lot of unsafe code to work.
* These representations tend to be slower and less-optimized than compiler-generated versions.
* They still require you to generalise integer types with macros instead of const generics.


I'm not sure I understand the problem here. If a library provides struct Int<const BITS: usize>(...); then code using this library shouldn't need macros to interact with it (except, perhaps, as workaround for current gaps in const generics). The library itself would have a bunch of impls relating its types to the language primitives, which may be generated with macros. But that doesn't seem like such a drastic problem, if it's constrained to the innards of one library, or a few competing libraries.

Author

I'm not sure I understand your argument. No matter what, a library solution cannot be both generic and unify with the standard library types. I don't see a path forward that would allow, for example, some library Uint<N> type to allow Uint<8> being an alias for u8 while also supporting arbitrary Uint<N>. Even with specialisation, I can't imagine a sound subset of specialisation allowing this to work.

Like, sure, a set of libraries can choose to only use these types instead of the primitives, circumventing the problem. But most people will want to implement their traits for primitives for interoperability.

@hanna-kruppe hanna-kruppe Sep 7, 2024

This overlaps a bit with the bullet point about unification, but I do think it depends a lot on what one is doing. For example, the num-traits crate defines traits that it needs to implement for the primitive types. On the other hand, any code that's currently written against the traits from num-traits may be happy with a third party library that provides Int<N> and Uint<N> and implements the relevant traits for them. And for something like bit fields, you may not need much generalization over primitive types at all: in the MipsInstruction example, you probably want some widening and narrowing conversions, but only with respect to u32 specifically.

It's hard to form an opinion about how common these scenarios are (and whether there are other nuances) without having a corpus of "real" code to look at. Experience reports (including negative ones) with crates like num-traits and bounded-integer may be more useful than discussing it in the abstract.

@Diggsey
Contributor

Diggsey commented Sep 7, 2024

Two things that came to mind:

  1. Are there any issues with the self-referentiality of these types? Although usize is a distinct type, one could easily imagine wanting to make it a "new-type wrapper" around the appropriate integer type, which would make a circular dependency between the two implementations. We could say that usize is not implemented that way, but then it's surprising to me that usize would be the "foundation" rather than the other way around.
  2. Even though LLVM can express integers of arbitrary size, it seems unlikely that these types have seen extensive use with unusual sizes. Maybe these integer types should be lowered to common integer types within rustc, so that backends can be simplified.

@clarfonthey
Author

When the last RFC was postponed, the stated reason was waiting for pure library solutions to emerge and letting the experience with those inform the design. I don't really see much of this in the current RFC, so here's a bunch of questions about it. It would also be great if some non-obvious design aspects of the RFC (such as limits on N, whether and how post-monomorphization errors work, padding, alignment, etc.) could be justified with experience from such libraries.

So, I agree that this was one of the reasons, but it's worth reiterating that also, at that time, const generics weren't even stable. We had no idea what the larger ecosystem would choose to do with them, considering how many people were waiting for stabilisation to really start using them. (We had an idea of what was possible, but not what would feel most ergonomic for APIs, etc.)

So, I personally felt that the library solution idea was mostly due to the fact that we didn't really know what libraries would do with const generics. And, overwhelmingly, there hasn't been much interest in it for what I believe to be the most compelling use case: generalising APIs without using macros, which right now cannot really be done without language support.

@safinaskar

Currently, Rust doesn't obey the property "every expression has a (possibly polymorphic) type, which can be deduced from the expression itself alone". Rust doesn't obey it because integer literals don't have any intrinsic type. The type of integer literals shows up in compiler error messages as {integer}.

This situation is different from many other languages, for example Haskell. In Haskell, integer literals do have an intrinsic type, namely (Num a) => a.

We can fix this like so:

#![feature(adt_const_params)]

use std::marker::ConstParamTy;

#[derive(ConstParamTy, PartialEq, Eq)]
enum IntegerDescriptor {
    U(usize),
    I(usize),
    USize,
    ISize,
}

Now let's introduce a universal integer type int<const desc: IntegerDescriptor>. For example, u8 will be equivalent to int<U(8)>. Then integer literals will have an intrinsic type: int<_>


## Enum variants

For now, enum variants will still be restricted to their current set of integer types, since even [`repr(u128)`] isn't stable yet. If an RFC like [#3659] gets passed, allowing arbitrary types for enum variant tags, then `u<N>` should be included in that, although that can be added as a future extension.
Contributor

repr128 is stabilized as of 1.89.0, so this paragraph should change.

Author

I actually both knew about this and noticed it when I reread the proposal myself due to the other comments, but I hadn't actually updated it since this is relatively low priority. I'll mark this as resolved once I do update it, though.

Note that the rest of the statement still remains: we'd need some sort of decision on arbitrary-type enum reprs, since it's not entirely clear what repr(u<7>) would mean. We don't allow repr(char) either, for example, even though such a thing could be properly defined if we wanted to do so.


@clarfonthey
Author

Currently, Rust doesn't obey the property "every expression has a (possibly polymorphic) type, which can be deduced from the expression itself alone". Rust doesn't obey it because integer literals don't have any intrinsic type. The type of integer literals shows up in compiler error messages as {integer}.

This situation is different from many other languages, for example Haskell. In Haskell, integer literals do have an intrinsic type, namely (Num a) => a.

We can fix this like so:

#![feature(adt_const_params)]

use std::marker::ConstParamTy;

#[derive(ConstParamTy, PartialEq, Eq)]
enum IntegerDescriptor {
    U(usize),
    I(usize),
    USize,
    ISize,
}

Now let's introduce a universal integer type int<const desc: IntegerDescriptor>. For example, u8 will be equivalent to int<U(8)>. Then integer literals will have an intrinsic type: int<_>

Basically all the proposals for a unified int type have taken some form of this, and that's what I alluded to in the RFC by mentioning that adt_const_params is required to allow this. Basically, my point is that we can make this change in a backwards-compatible way, we'll probably still want the i<N> and u<N> aliases anyway, and we don't fully know what adt_const_params will look like since it's not stable yet, so, we can push that off as a future extension.

@Apersoma

is there a tracking issue for this?

@clarfonthey
Author

There aren't tracking issues for unaccepted RFCs. This could be proposed as some kind of compiler experiment, but that would require a separate MCP.

@scottmcm
Member

Personally I'd be happy to see a lang+compiler experiment towards this, if someone wants to sign up to work on it.

@clarfonthey
Author

I've been thinking of potentially offering some help implementing this (at least, partially via things like unbounding the length of literals and refactoring miri to use big-integers instead of u128), but haven't done any compiler work so far so it would be a bit of a learning curve.

If folks would be willing to help out with that, I'd be more than happy to write up an MCP for it!

@scottmcm scottmcm added the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Feb 16, 2026
@traviscross traviscross added the P-lang-drag-1 Lang team prioritization drag level 1. label Feb 18, 2026
@scottmcm
Member

We had a quick conversation about this in the lang meeting today, and people were all receptive to a lang experiment.

There were particular discussions about hoping the experiment could show whether the type u32 = u<32>; can work out, for example. (We also talked a bunch about various design and implementation implications, then managed to stop ourselves and go back to "yes, this is why an experiment would be great".)

@scottmcm scottmcm added I-lang-radar Items that are on lang's radar and will need eventual work or consideration. and removed I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. P-lang-drag-1 Lang team prioritization drag level 1. labels Feb 18, 2026
@clarfonthey
Author

Since you're wearing both hats, does the compiler team think it would be worth filing any MCPs over the specifics of the implementation (like refactoring things to use bigints instead of u128s/i128s) or is the lang go-ahead enough to just start writing code for this?

@Noratrieb
Member

I'd recommend to open a compiler MCP with an implementation strategy if it involves significant changes to the compiler (which I think it does). It doesn't have to get bogged down by all the details but the high level plan and its implications are probably worth discussing.

@scottmcm
Member

I'd think carefully about what you'd hope the experiment would want to accomplish, and how you could do that with the least churn to the compiler. From a compiler reviewer perspective, I'd want to see things that would be easy to undo later if it turned out that the experiment doesn't work for some reason.

For example, maybe it'd be enough to have the experiment work only for u<N> where N is 0..=128 and i<N> where N is 1..=128. That would let you avoid changing all the types all over the place (from WrappingRange to CTFE Scalar to TagEncoding::Niche::niche_start to SwitchTargets) that are currently u128, but might be enough to demonstrate that the multi-width approach works well. It could show the blanket stuff, the coherence rules, that type u32 = u<32>; will work, etc.

Then if the experiment succeeds and the RFC is accepted, that'd still be a useful subset (letting blanket impls work over everything up to 128-bit and thus covering all the things that currently exist) and the next phase of work of expanding beyond 128-bit could be subsequent work once it's confirmed as "yes, we want to do this".

(I think the answer to "could we go through and replace all the u128s with ocaml-style boxed-when-big integers" is "yes, we have confidence we could do that, it'd just be lots of work", so to me the experiment would be better off concentrating on other things. Like what does ty::Uint look like and how does that impact everywhere consuming it? Is there a way that we can make it actually more convenient to consume because there's a bits: u16 field in the type, say, instead of needing to match the exact variants everywhere? How would things work out for the standard library to define things on these new types? How is the type solver going to treat having so many more integer types? Etc.)


Note that casting directly to `u<0>` or `i<0>` is still allowed, to avoid forcing users to special-case their code. See later notes on possible lints.

Integer literals can be automatically coerced to `u<N>` and `i<N>`, although generic `iN` and `uN` suffixes are left out for future RFCs. When coercing literals to an explicit `u<N>` or `i<N>`, the `overflowing_literals` lint should trigger as usual, although this should not apply for generic code. See later notes on possible lint changes.


Question: Since integer literals have an assumed type of i32, would it make sense for N to have a default value of 32, making i the assumed type of an integer literal?

Author

I don't think so, since the assumptions on integer literals are a bit more complicated than that. I believe that if the literal can't fit within i32, it gets bumped up to i64 or i128.

@programmerjake
Member

programmerjake commented Feb 19, 2026
I believe that if the literal can't fit within i32, it gets bumped up to i64 or i128.

Nope, Rust just errors, it doesn't account for how big the literal's value is when deciding which type to use: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=6500fd891b41b51e4dcdd37b26b93c5c


Interesting! Since that's the case, I think I like the idea of making N = 32 a default. That makes it clear that the default integer type is i32. To my knowledge, that default isn't listed in the documentation for core/std.


That would mean

let x: i = 42;

is valid, which reads a bit weird.

Member

that default isn't listed in the documentation for core/std.

it is documented here: https://doc.rust-lang.org/1.93.1/reference/expressions/literal-expr.html#r-expr.literal.int.inference-default

Member

TBH I think (rust-lang/rust#152589) that relying on the i32 default is a mis-feature anyway outside of beginner examples like wanting println!("{}", 3); to work.

If someone is writing out a type, they should say i32 if they want i32.

(And, in particular, i is a common iteration variable, so I never want to see something like let mut i: i = 0;.)

Author

Yeah, if anything, I would expect i as a bare type to be equivalent to i<_>, where the actual width is inferred.

Which, I guess, is another big question that the lang experiment will have to resolve: how exactly is that going to work? Will it just fail catastrophically like it does now, or default to 32?

Member

Will it just fail catastrophically like it does now, or default to 32?

I'd assume literals will default to i32 for compatibility, if the literal is too big then that will trigger the deny-by-default overflowing literals lint. if the lint is allowed, then it will just wrap the value to fit in the literal's deduced type, just like it does now.

Author

I mean, I'm certain that integer inference is tricky and there's a reason for the current behaviour, but I do wonder if Rust's changes over the past decade plus the addition of const generic inference is enough to improve the current behaviour without too much extra work.

Co-authored-by: Zachary Harrold <zac@harrold.com.au>
@dhardy
Contributor

dhardy commented Feb 20, 2026

On the topic of names: i32, u64, etc. are conventional type names that form a special exception to the normal naming rules for types (start with a capital letter, use a descriptive word). While there is decent justification for this special exception, I do not think it extends to using i and u as generic type names, since (1) they are too ambiguous, (2) i is already a common variable name, and (3) generic integers likely won't be used nearly as frequently as the existing integer types.

With this in mind, I propose the following type names: std::num::Signed, std::num::Unsigned.

use std::num::Unsigned;
use std::ops::Shl;

impl<const N: usize, const M: usize> Shl<Unsigned<M>> for Unsigned<N> {
    type Output = Unsigned<N>;
    #[inline]
    fn shl(self, rhs: Unsigned<M>) -> Unsigned<N> {
        // implementation
    }
}

@tmccombs

I agree that i and u are ambiguous and conflict with common variable names (especially u), but the convention is that special built-in/primitive types have lowercase names: not just the integer types, but also bool and str. And these types would still be primitives, so I think it makes sense for them to be lowercase. But maybe with longer names like int<N> and uint<N>?

@jhpratt
Member

jhpratt commented Feb 20, 2026

Note that types and variables live in different namespaces. A variable i in no way conflicts with a type i, built-in or otherwise.

@dhardy
Contributor

dhardy commented Feb 20, 2026

And these types would still be primitives

In the same way that std::num::NonZeroU32 and std::num::Wrapping<i32> are primitives?

Note that types and variables live in different namespaces.

I'm more concerned about confusing humans than compilers.

@clarfonthey
Author

Yeah, this was explicitly discussed before and the type namespacing being different was the main motivating factor. It's already mentioned under the alternatives and explained there.

Signed and Unsigned also obviously conflict with a lot of library traits, which are in the type namespace, so that seems like a non-starter.

@dhardy
Contributor

dhardy commented Feb 20, 2026

Signed and Unsigned also obviously conflict with a lot of library traits

I specifically suggested these be in std::int or std::num, not in the prelude.

I personally think generic integers and/or integer traits would be very useful, but mainly for small sections of generic code where it is completely fine to use long names.

It's already mentioned under the alternatives

I saw mention of int and uint; personally I still dislike those names. Perhaps Int and Uint would work. More awkward to type, but I don't see why that's a problem.

@tmccombs

In the same way that std::num::NonZeroU32 and std::num::Wrapping are primitives?

No. Those are library types that are newtypes of u32 and i32 respectively. u32 doesn't have any definition in the library, and I think the same would be true of u<N>.
