Google CTF: David and the Tree
We’re given a ZIP file
challenge.zip
containing the first 26 chapters of “Gadsby: A Story of Over 50,000 Words Without Using the Letter
“E”” by Ernest Vincent Wright.
The text of the book in the ZIP file is entirely in uppercase, but it is identical to the text on Project
Gutenberg if converted to uppercase. This led us to
believe that the challenge is about the ZIP file itself, not the contents. We examined the file using Kaitai
Struct’s WebIDE and tools like zipinfo
. After some time we realized that we got two
hints from the challenge’s name and description:
- “what’s more important is not what you have, but what you’re missing”: we’re looking for something missing, the ZIP’ed book omits the letter ‘E’.
- “David and the Tree”: the ZIP file is compressed using the DEFLATE algorithm which uses (David) Huffman trees to compress the data.
This led us to believe that the letter E
, together with its byte code representing its compressed version,
might be present in the Huffman tree for each of the deflated files. Each byte code for the compressed letter
E
could then encode a character of the flag. Note that normally there would be no reason to include the
letter E
in the Huffman tree as the uncompressed text does not contain the letter E
.
After some really ugly hacking, we indeed got the flag this way: CTF{!-!OLE-E-COM!7RESSION}
.
In case you’re interested in the quick and dirty way we solved this challenge: we wrote a simple script that
gets the byte code for the letter E
from the Huffman tree in two steps. First, we extract the DEFLATE bytes
from the ZIP file using Kaitai Struct’s ZIP parser. Next, we
parse the DEFLATE bytes using a deflatedecompress.py
script found on
Github to get
to its Huffman tree. We hacked our way into the deflatedecompress.py
script to access the Huffman tree
by adding a global variable hacky = litlencode
right after the line
litlencode = CanonicalCode(codelens[ : numlitlencodes])
.
While the byte codes associated with the letter E
are 9 bits long and do not directly encode ASCII
characters, we knew we were on the right track as the letter E
should normally not have appeared in the
Huffman tree. It turns out that we have to drop the most significant bit and reverse order of the remaining
bits to get to the character of the flag.
#!/usr/bin/env python3
import io
import deflatedecompress # modified from https://github.com/nayuki/Simple-DEFLATE-decompressor/blob/master/python/deflatedecompress.py
import zip # from https://formats.kaitai.io/zip/python.html
parsed_zip = zip.Zip.from_file('challenge.zip')
flag = []
for section_number, section in enumerate(parsed_zip.sections):
if isinstance(section.body, zip.Zip.LocalFile):
deflated_file = section.body.body
else:
continue
deflated_buf = deflatedecompress.BitInputStream(io.BytesIO(deflated_file))
inflated_buf = deflatedecompress.Decompressor.decompress_to_bytes(deflated_buf)
table = deflatedecompress.hacky._code_bits_to_symbol
for k, v in table.items():
if v == ord('E'):
k = k & 0xFF
# reverse bit order
rk = 0
for _ in range(8):
rk = rk << 1
if k & 1 == 1:
rk = rk | 1
k = k >> 1
flag.append(rk)
print(bytes(flag).decode())