Skip to content

Replace Broken Base32 Encoder/Decoder#54

Open
Bombe wants to merge 1 commit intohyphanet:masterfrom
Bombe:fix/base32-encoder
Open

Replace Broken Base32 Encoder/Decoder#54
Bombe wants to merge 1 commit intohyphanet:masterfrom
Bombe:fix/base32-encoder

Conversation

@Bombe
Copy link
Copy Markdown
Contributor

@Bombe Bombe commented Mar 30, 2026

The header of Base32.java states:

Base32 - encodes and decodes RFC3548 Base32

(Let’s ignore for a minute that RFC 3548 has been obsoleted by RFC 4648, but between those two nothing has substantially changed.)

It ignores section 2.2 of said RFC which states:

Implementations MUST include appropriate pad characters at the end of encoded data unless the specification referring to this document explicitly states otherwise.

Section 5 is again very explicit:

A full encoding quantum is always completed at the end of a body. When fewer than 40 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 5-bit groups. Padding at the end of the data is performed using the "=" character.

As there is no “specification referring to this document” (like e.g. the MIME specification) (unless there is actually a Freemail specification?), the encoder implementation is not conforming to the RFC.

(Freemail uses Base32-encoded values in a lot of places, some of them exposed to users (like an account’s Freemail address), so a special “encodeWithoutPadding()” method was added to keep its behaviour intact.)

However, this is not why it’s broken. 😄

Encoding a single 0xff byte should result in the string 74======, but results in 74. This is non-conforming to the RFC, as mentioned above, but, depending on the decoder and its leniency, may work. Decoding the string 74====== should result in a single byte being returned, but the implementation actually returns 5 bytes, because it doesn’t handle padding correctly, and that is broken.

Then, section 2.3 also states:

Implementations MUST reject the encoding if it contains characters outside the base alphabet when interpreting base encoded data, unless the specification referring to this document explicitly states otherwise.

While section 5 mentions that Base32 arose out of a need for a case-insensitive encoding, it does not specify that any further, but we may probably assume that the lower-case letter characters are valid encoding characters as well, and in this case I would argue that while accepting a wider range of inputs and ignoring non-alphabet characters is quite sensible (you know, the whole “be liberal in what you accept” thing), it technically violates the RFC. Anyway, our implementation is doing something wild: invalid characters will count towards the length of the returned array, so the more characters are being ignored, the larger the returned array—which is clearly incorrect!

Also, the Base32 class has been completely rewritten to be easier to read, and to conform to RFC 4648 as closely as possible. It has also been moved out of its original package, because it has nothing in common with the original implementation anymore and can simply live in our own packages now.

There were several problems with the Base32 encoder and decoder:

* It did not add padding, which is not RFC-compliant (unless referred to from
  another document allowing it).
* The decoder did not correctly ignore padding and/or non-alphabet characters,
  returning erroneous data.

Now, the first issue is only really a problem if you expose the values for
decoding by other implementations. However, all instances where Base32-encoded
values are used, use the encoded string verbatim and so is probably not an
issue for Freemail. However, Base32-encoded values without padding are used in
a lot of places, e.g. in the actual Freemail address, or the slot management,
and so, an “encodeWithoutPadding” method has been provided to allow the
creation of a non-padded value.

The decoding issue might be more serious, as Freemail seems to be decoding
Base32-encoded values as part of the slot management, but I haven’t delved into
this deeply enough to be sure that this was not a problem. Fixing it seemed
like a good, conservative approach. 😄

The Base32 class has been completely rewritten to be easier to read, and to
conform to RFC 4648 as closely as possible. For example, the RFC states that
Base32 is case-insensitive, but explicitely defines its alphabet as containing
only the upper-case characters, not making an allowance for them to be
lower-cased. As this is probably an oversight, this implementation can decode
lower-case letters as well.

The Base32 class has also been moved out of its original package, because it
has nothing in common with the original implementation anymore and can simply
live in our own packages now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant