Thursday, November 14, 2013

Huffman Encoding - Pt 3

Quite a bit of progress:

After looking at the ASM again, there appear to be multiple formats.

The first format is the type that starts with the offset, size values,
but the 'gotcha' is that nothing in the file says how many entries
there are (the count is most likely in the executable). The only way to find
the end of the table is to keep reading until the read position reaches the
offset of the first file.
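
A rough sketch of that "read until you hit the first file" loop, in Python. The entry layout is a guess: the width and byte order of the offset and size fields aren't pinned down above, so `entry_format` (default: two 32-bit little-endian ints) is purely an assumption, as is the idea that offsets are counted from the start of the archive and that the first file sits right after the table. `read_offset_table` is a made-up name for illustration.

```python
import struct

def read_offset_table(data, entry_format="<II"):
    """Read (offset, size) pairs until the cursor reaches the first file's offset.

    entry_format is an ASSUMPTION (two 32-bit little-endian ints); adjust it
    to whatever the real entries turn out to be.
    """
    entry_size = struct.calcsize(entry_format)
    entries = []
    pos = 0
    table_end = None  # unknown until the first entry has been read
    while table_end is None or pos < table_end:
        if pos + entry_size > len(data):
            raise ValueError("ran off the end of the data before the table ended")
        offset, size = struct.unpack_from(entry_format, data, pos)
        if table_end is None:
            # No file count is stored, so the first file's offset
            # is the only marker for where the table stops.
            table_end = offset
        entries.append((offset, size))
        pos += entry_size
    return entries
```

If the table parses cleanly, the number of entries falls out for free as `len(read_offset_table(data))`.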

The second format is our compression data, prefixed with 0x08. There are actually
TWO execution paths here, depending on whether the prefix is 0x0C or 0x08:

0x0C appears to be our standard 64K sliding window (LH7) whereas
anything else (0x08 included) appears to be an 8K sliding window (LH5).

The next two bytes are the uncompressed size, but our compiler does something
odd; it takes the first byte, shifts it left by 8, then ORs the second byte
onto it, like this:

0x08 0x20 0x40

flag = 0x08 (LH5)

uncompressed_size = 0x20 << 0x8 (0x2000)
uncompressed_size |= 0x40 (0x2000 | 0x40 = 0x2040)

Basically, it's reading the 16-bit size as big endian on a little-endian system.
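
The same three-byte header read, sketched in Python. `parse_header` is a hypothetical helper, not anything lifted from the ASM; the flag-to-window mapping just mirrors what the two execution paths appear to do (0x0C gets the 64K window, everything else the 8K one).

```python
def parse_header(data):
    """Split the 3-byte prefix into (flag, window_size, uncompressed_size)."""
    if len(data) < 3:
        raise ValueError("need at least 3 bytes: flag + 16-bit size")
    flag = data[0]
    # 0x0C takes the 64K (LH7) path; anything else, 0x08 included,
    # falls through to the 8K (LH5) path.
    window_size = 64 * 1024 if flag == 0x0C else 8 * 1024
    # First byte shifted left by 8, second byte ORed on: a big-endian read.
    uncompressed_size = (data[1] << 8) | data[2]
    return flag, window_size, uncompressed_size

flag, window, size = parse_header(bytes([0x08, 0x20, 0x40]))
print(hex(flag), window, hex(size))  # 0x8 8192 0x2040
```

The shift-and-OR is equivalent to `int.from_bytes(data[1:3], "big")`, which is the shorter way to write it if you don't care about matching the disassembly step for step.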

Our third and final type (so far) is rather odd - basically, it's a collection of
composited files (not like a directory) split up into compression chunks.
These files generally give themselves away by not starting with 0x08 + size but
rather with a strange int that varies (probably a checksum), followed by
0x00 + 0x08 + UncSize

Generally, these files are in 8192-byte chunks; 8K sliding window, remember?
Extracting individual pieces gets you something like this:

Dear god - all the backgrounds are in 64x64 parts ><
