Replace Broken Base32 Encoder/Decoder by Bombe · Pull Request #54 · hyphanet/plugin-Freemail

Bombe · 2026-03-30T06:18:18Z

The header of Base32.java states:

Base32 - encodes and decodes RFC3548 Base32

(Let’s ignore for a minute that RFC 3548 has been obsoleted by RFC 4648, but between those two nothing has substantially changed.)

It ignores section 2.2 of said RFC which states:

Implementations MUST include appropriate pad characters at the end of encoded data unless the specification referring to this document explicitly states otherwise.

Section 5 is again very explicit:

A full encoding quantum is always completed at the end of a body. When fewer than 40 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 5-bit groups. Padding at the end of the data is performed using the "=" character.

As there is no “specification referring to this document” (like e.g. the MIME specification) (unless there is actually a Freemail specification?), the encoder implementation is not conforming to the RFC.

(Freemail uses Base32-encoded values in a lot of places, some of them exposed to users (like an account’s Freemail address), so a special “encodeWithoutPadding()” method was added to keep its behaviour intact.)

However, this is not why it’s broken. 😄

Encoding a single 0xff byte should result in the string 74======, but results in 74. This is non-conforming to the RFC, as mentioned above, but, depending on the decoder and its leniency, may work. Decoding the string 74====== should result in a single byte being returned, but the implementation actually returns 5 bytes, because it doesn’t handle padding correctly, and that is broken.

Then, section 2.3 also states:

Implementations MUST reject the encoding if it contains characters outside the base alphabet when interpreting base encoded data, unless the specification referring to this document explicitly states otherwise.

While section 5 mentions that Base32 arose out of a need for a case-insensitive encoding, it does not specify that any further, but we may probably assume that the lower-case letter characters are valid encoding characters as well, and in this case I would argue that while accepting a wider range of inputs and ignoring non-alphabet characters is quite sensible (you know, the whole “be liberal in what you accept” thing), it technically violates the RFC. Anyway, our implementation is doing something wild: invalid characters will count towards the length of the returned array, so the more characters are being ignored, the larger the returned array—which is clearly incorrect!

Also, the Base32 class has been completely rewritten to be easier to read, and to conform to RFC 4648 as closely as possible. It has also been moved out of its original package, because it has nothing in common with the original implementation anymore and can simply live in our own packages now.

There were several problems with the Base32 encoder and decoder: * It did not add padding, which is not RFC-compliant (unless referred to from another document allowing it). * The decoder did not correctly ignore padding and/or non-alphabet characters, returning erroneous data. Now, the first issue is only really a problem if you expose the values for decoding by other implementations. However, all instances where Base32-encoded values are used, use the encoded string verbatim and so is probably not an issue for Freemail. However, Base32-encoded values without padding are used in a lot of places, e.g. in the actual Freemail address, or the slot management, and so, an “encodeWithoutPadding” method has been provided to allow the creation of a non-padded value. The decoding issue might be more serious, as Freemail seems to be decoding Base32-encoded values as part of the slot management, but I haven’t delved into this deeply enough to be sure that this was not a problem. Fixing it seemed like a good, conservative approach. 😄 The Base32 class has been completely rewritten to be easier to read, and to conform to RFC 4648 as closely as possible. For example, the RFC states that Base32 is case-insensitive, but explicitely defines its alphabet as containing only the upper-case characters, not making an allowance for them to be lower-cased. As this is probably an oversight, this implementation can decode lower-case letters as well. The Base32 class has also been moved out of its original package, because it has nothing in common with the original implementation anymore and can simply live in our own packages now.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace Broken Base32 Encoder/Decoder#54

Replace Broken Base32 Encoder/Decoder#54
Bombe wants to merge 1 commit intohyphanet:masterfrom
Bombe:fix/base32-encoder

Bombe commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Bombe commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant