Commit 467ace0

content: update base64 post for clarity, and include janky solve script
1 parent 40301fa commit 467ace0

1 file changed: content/posts/ctf/hkcert22/2022-11-14-hkcert-2022-base64-encryption.md
Lines changed: 23 additions & 8 deletions
@@ -43,9 +43,9 @@ We’re provided with an encryption script `chall.py` (written in Python), along
 
 So how do we go about cracking this? Brute force would be hopelessly inefficient, as we have $64! \approx 1.27 \times 10^{89}$ mapping combinations to try. It would take *years* before we made any progress! We would also need to inspect each result to judge whether the English looks right (or automate the check against a word list), which takes even more time. We need to find some other way.
 
-## Let’s Get Cracking
+## First Steps: Elimination by ASCII Range
 
-Here’s one idea: since the plaintext is an English article, this means that most (if not all) characters are in the printable ASCII range (32-127). This means that the most significant bit (MSB) of each byte *cannot* be 1. We can use this to create a **blacklist** of mappings. For example, originally we have 64 mappings for the letter `A`. After blacklisting, we may be left with, say, 16 mappings. This drastically reduces the search space.[^extended-ascii]
+Here’s one idea: since the plaintext is an English article, most (if not all) characters lie in the printable ASCII range (32–126), so the most significant bit (MSB) of each byte *cannot* be 1. We can use this to build a **blacklist** of mappings. For example, the letter `A` originally has 64 possible mappings; after blacklisting, we may be left with, say, 16 possible mappings. This drastically reduces the search space.[^extended-ascii]
 
 Since Base64 re-chunks data from 8-bit bytes into 6-bit groups, 3 characters of ASCII translate into 4 characters of Base64.
 
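As a quick sanity check on the numbers above, a minimal sketch using only the standard library confirms both the size of the key space and the 3-byte-to-4-character relation:

```python
import base64
import math

# Key space of a Base64 substitution cipher: one permutation of 64 chars.
keyspace = math.factorial(64)
assert 1.26e89 < keyspace < 1.27e89  # ~1.27 * 10^89, as stated

# Base64 re-chunks 3 bytes (24 bits) into 4 six-bit characters.
assert len(base64.b64encode(b'abc')) == 4
assert len(base64.b64encode(b'abcdef')) == 8
```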
@@ -62,23 +62,31 @@ def get_chars_with_mask(m):
     """Get Base64 chars which are masked with m."""
     return {c for i, c in enumerate(charset) if (i & m) == m}
 
+# List the 4 Base64 positions. We'll cycle through these positions (i.e. i % 4).
 msbs = [0b100000, 0b001000, 0b000010, 0b000000]
+
+# Get impossible characters for each position.
 subchars = [get_chars_with_mask(m) for m in msbs]
 
+# Create a blacklist for each Base64 char.
+# e.g. blacklist['A'] returns the set of chars which 'A' can NOT map to.
 blacklist = {c: set() for c in charset}
 
+# Loop through each char in the shuffled Base64 text.
 for i, c in enumerate(txt):
-    # Ignore char mappings which have 1 in corresponding msb.
+    # Ignore char mappings which have '1' in corresponding msb.
     # These can't map to a printable ASCII char.
     blacklist[c] |= subchars[i % 4]
 
+# Invert the blacklist to get a dictionary of possible mappings.
+# e.g. whitelist['A'] returns the set of chars which 'A' CAN map to.
 whitelist = {k: set(charset) - v for k, v in blacklist.items()}
 ```
 
 We can check the mappings we’ve eliminated:
 
 ```python
-print(''.join(sorted(blacklist['J']))
+print(''.join(sorted(blacklist['J'])))
 # '+/0123456789CDGHKLOPSTWXabefghijklmnopqrstuvwxyz'
 ```
 
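The snippet in this hunk relies on `charset` and `txt`, which are defined earlier in the post. Here is a self-contained sketch of the same blacklist idea with an illustrative plaintext and a random key; note that position 3 of each Base64 group holds only the low 6 bits of byte 2, so it contributes no MSB constraint (we model that as an empty set rather than a zero mask, which would match every char):

```python
import base64
import random
import string

# Standard Base64 alphabet (assumed to match the post's `charset`).
charset = string.ascii_uppercase + string.ascii_lowercase + string.digits + '+/'

def get_chars_with_mask(m):
    """Base64 chars whose 6-bit index has all bits of mask m set."""
    return {c for i, c in enumerate(charset) if (i & m) == m}

# Where the plaintext MSB lands within each of the first 3 Base64 positions;
# position 3 carries no MSB, so it gets an empty (no-constraint) set.
msbs = [0b100000, 0b001000, 0b000010]
subchars = [get_chars_with_mask(m) for m in msbs] + [set()]

# Toy setup: Base64-encode ASCII text, then apply a random substitution key.
plain = b"We're no strangers to love. You know the rules and so do I."
encoded = base64.b64encode(plain).decode().rstrip('=')
key = dict(zip(charset, random.sample(charset, 64)))
txt = ''.join(key[c] for c in encoded)

# Build the blacklist as in the post: a char seen at position i cannot map
# to any plain char whose MSB bit (for that position) is set.
blacklist = {c: set() for c in charset}
for i, c in enumerate(txt):
    blacklist[c] |= subchars[i % 4]

# Sanity check: the true mapping is never eliminated.
inverse_key = {v: k for k, v in key.items()}
assert all(inverse_key[c] not in blacklist[c] for c in set(txt))
```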
@@ -97,9 +105,12 @@ We can do a similar thing on the low end. Again, since the smallest printable AS
 def get_inverted_chars_with_mask(m):
     return {c for i, c in enumerate(charset) if ((2**6 - 1 - i) & m) == m}
 
-subchars_not_in_ascii = [get_inverted_chars_with_mask(m) for m in in_ascii] # chars that don't have bits set in ascii.
+# chars that don't have bits set in ascii.
+subchars_not_in_ascii = [get_inverted_chars_with_mask(m) for m in in_ascii]
 ```
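The `in_ascii` mask list is defined earlier in the post, outside this hunk. To illustrate what the inverted mask computes, here is a sketch with a hypothetical mask `0b111000` for the first Base64 position: a byte below 32 has its top three bits clear, and for position 0 those bits are group bits 5–3, so chars with those bits all clear cannot appear there.

```python
import string

charset = string.ascii_uppercase + string.ascii_lowercase + string.digits + '+/'

def get_inverted_chars_with_mask(m):
    # Chars whose 6-bit index has every bit of m CLEAR
    # (i.e. the complement of the index contains the mask).
    return {c for i, c in enumerate(charset) if ((2**6 - 1 - i) & m) == m}

# Hypothetical mask: top 3 bits of plaintext byte 0. A printable byte
# (>= 32) cannot have all three clear, so these Base64 chars are
# impossible at positions i % 4 == 0.
impossible = get_inverted_chars_with_mask(0b111000)
print(''.join(sorted(impossible)))  # -> 'ABCDEFGH'
```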
 
+## Frequency Analysis with Known Text
+
 Another idea comes to mind. Remember the plaintext is in English? Well, with English text, some letters appear more frequently than others. The same applies to words and sequences.
 
 {% image "assets/base64-letter-frequencies.jpg", "w-65", "Frequency of English letters. But we need to be careful with letter cases." %}
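The post uses dcode.fr for the actual analysis; the same single-character counts can be reproduced offline with a short sketch (the sample text here is illustrative):

```python
import base64
from collections import Counter

sample = b"We're no strangers to love. You know the rules and so do I."
encoded = base64.b64encode(sample).decode().rstrip('=')

# Single-character frequencies of the Base64 text, most common first.
freq = Counter(encoded)
print(freq.most_common(5))
```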
@@ -120,7 +131,7 @@ V2UncmUgbm8gc3RyYW5nZXJzIHRvIGxvdmUKWW91IGtub3cgdGhlIHJ1bGVzIGFuZCBzbyBkbyBJIChk
 {% image "assets/b64-crypt-1gram.jpg", "", "dcode.fr frequency analysis for encrypted Base64." %}
 {% endimages %}
 
-<sup>Frequency analysis of plain vs. encrypted Base64.</sup>
+<sup>Frequency analysis of plain vs. encrypted Base64. Left: CNN Lite articles. Right: Encrypted challenge text.</sup>
 {.caption}
 
 From this, we can deduce that 'w' was mapped from 'G' in the original encoding (due to the gap in frequency).
@@ -132,18 +143,22 @@ One useful option is the **bigrams/n-grams** option. We can tell dcode to analys
 {% image "assets/b64-crypt-4gram.jpg", "", "dcode.fr 4-gram for encrypted Base64." %}
 {% endimages %}
 
-<sup>Frequency analysis of 4-grams in plain vs. encrypted Base64.</sup>
+<sup>Frequency analysis of 4-grams in plain vs. encrypted Base64. Left: CNN Lite articles. Right: Encrypted challenge text.</sup>
 {.caption}
 
 Observe how "YoJP0H" occurs (relatively) frequently. This corresponds to "IHRoZS", which happens to be the Base64 encoding for " the".
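The " the" observation is easy to verify: whenever " the" starts at a 3-byte boundary, its encoding begins with `IHRoZS`. A sketch, with an aligned n-gram counter similar in spirit to dcode.fr's analysis (the sample sentences are illustrative):

```python
import base64
from collections import Counter

# " the" aligned to a 3-byte boundary encodes with the prefix "IHRoZS".
assert base64.b64encode(b' the quick brown fox').decode().startswith('IHRoZS')

# Count aligned 4-character Base64 groups of some repetitive text.
encoded = base64.b64encode(b'and the cat and the dog and the fox').decode()
grams = Counter(encoded[i:i + 4] for i in range(0, len(encoded), 4))
```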
 
+## More Heuristics
+
 Frequency analysis is useful for grouping letters into buckets. But frequency analysis alone is painful; some guesswork is needed. Here's the complete process I went through:
 
 - Frequency Analysis: use dcode.fr to associate frequent characters.
 - We can make use of our earlier constraints to eliminate wrong guesses.[^byebye-constraints]
 
 ```python
-guesses = { # Dictionary of guessed mappings.
+# Dictionary of guessed mappings.
+# key: shuffled Base64; value: plain Base64
+guesses = {
     'w': 'G', 'Y': 'I',
     'o': 'H', 'c': 'B',
 
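Once a few mappings are pinned down, they can be applied to the shuffled text to reveal partial plaintext structure. A sketch using only the four pairs visible above (unguessed characters are shown as `_`):

```python
# Guessed mappings: shuffled Base64 char -> plain Base64 char.
guesses = {
    'w': 'G', 'Y': 'I',
    'o': 'H', 'c': 'B',
}

def apply_guesses(txt, guesses):
    """Substitute guessed chars, marking unguessed positions with '_'."""
    return ''.join(guesses.get(c, '_') for c in txt)

# The frequent n-gram from earlier starts to resolve towards "IHRoZS".
print(apply_guesses('YoJP0H', guesses))  # -> 'IH____'
```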