Open
Conversation
There were several problems with the Base32 encoder and decoder: * It did not add padding, which is not RFC-compliant (unless referred to from another document allowing it). * The decoder did not correctly ignore padding and/or non-alphabet characters, returning erroneous data. Now, the first issue is only really a problem if you expose the values for decoding by other implementations. However, all instances where Base32-encoded values are used, use the encoded string verbatim and so is probably not an issue for Freemail. However, Base32-encoded values without padding are used in a lot of places, e.g. in the actual Freemail address, or the slot management, and so, an “encodeWithoutPadding” method has been provided to allow the creation of a non-padded value. The decoding issue might be more serious, as Freemail seems to be decoding Base32-encoded values as part of the slot management, but I haven’t delved into this deeply enough to be sure that this was not a problem. Fixing it seemed like a good, conservative approach. 😄 The Base32 class has been completely rewritten to be easier to read, and to conform to RFC 4648 as closely as possible. For example, the RFC states that Base32 is case-insensitive, but explicitely defines its alphabet as containing only the upper-case characters, not making an allowance for them to be lower-cased. As this is probably an oversight, this implementation can decode lower-case letters as well. The Base32 class has also been moved out of its original package, because it has nothing in common with the original implementation anymore and can simply live in our own packages now.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The header of
Base32.javastates:(Let’s ignore for a minute that RFC 3548 has been obsoleted by RFC 4648, but between those two nothing has substantially changed.)
It ignores section 2.2 of said RFC which states:
Section 5 is again very explicit:
As there is no “specification referring to this document” (like e.g. the MIME specification) (unless there is actually a Freemail specification?), the encoder implementation is not conforming to the RFC.
(Freemail uses Base32-encoded values in a lot of places, some of them exposed to users (like an account’s Freemail address), so a special “encodeWithoutPadding()” method was added to keep its behaviour intact.)
However, this is not why it’s broken. 😄
Encoding a single
0xffbyte should result in the string74======, but results in74. This is non-conforming to the RFC, as mentioned above, but, depending on the decoder and its leniency, may work. Decoding the string74======should result in a single byte being returned, but the implementation actually returns 5 bytes, because it doesn’t handle padding correctly, and that is broken.Then, section 2.3 also states:
While section 5 mentions that Base32 arose out of a need for a case-insensitive encoding, it does not specify that any further, but we may probably assume that the lower-case letter characters are valid encoding characters as well, and in this case I would argue that while accepting a wider range of inputs and ignoring non-alphabet characters is quite sensible (you know, the whole “be liberal in what you accept” thing), it technically violates the RFC. Anyway, our implementation is doing something wild: invalid characters will count towards the length of the returned array, so the more characters are being ignored, the larger the returned array—which is clearly incorrect!
Also, the Base32 class has been completely rewritten to be easier to read, and to conform to RFC 4648 as closely as possible. It has also been moved out of its original package, because it has nothing in common with the original implementation anymore and can simply live in our own packages now.