Thursday, August 7, 2014

Hacking Clocktower - The First Fear

This one... this one should be fun - let's take a great Japanese survival-horror game, figure out how it works, make it English, and do some other stuff to it :)

If you've never heard of ClockTower (or Clocktower - The First Fear as it's known in Japan), it was a game originally for the Super Nintendo by Human Entertainment.

It's one of those point and click adventure games (if you're old like me and remember those), but it's also one of the earlier games that would help create the survival horror genre. Basically, these 4 orphans end up at this mansion after the orphanage closes and they start disappearing; you take over the role of Jennifer in the group and have to find a way out, hopefully find your friends, and figure out what's going on in the mansion.

The atmosphere is gritty, things happen in the house for no apparent reason at random (it's pretty haunted; let's just say), sometimes things just come out and kill you, and all the while, this psychopath with a pair of gardening shears is chasing you around the house - stalking you (the nemesis thing from Resident Evil 3 was not some cute idea Capcom came up with):

The game is filled with gritty situations that involve thinking quickly on your feet to survive. Running and hiding are the only options, but saving what sanity Jennifer has left is also important, as she can trip over her own feet and die of a heart attack if things get too intense (a la Illbleed).

Anyway, this game was released in 1994 for the SNES, re-released with more content on the PSX in 1997, released on the Wonderswan (but nobody gives a damn about the Wonderswan), and finally released on the PC (and then later re-released on the PC in 1999 as a value edition).

Yes - there's an English translation of the SNES rom by Aeon Genesis (I think it was him, anyway), but the SNES version is kinda shitty compared to the PC version which has much better sounds, an FMV intro, etc. Probably the only version with a bit more content is the PSX version, but we'll get into that later.

First, let's focus on getting the game running without the CD!

Part 1 - The Landscape

Basically, we have a game directory that looks like this:

We have a BG directory containing what looks to be the room backgrounds in a carved up format along with an accompanying .MAP file for each one which I can only assume maps the various slices to the screen:

We have a DATA directory containing various PTN and PYX files... not sure what these are at the moment... probably mapping the level screens to logic or something.

We have a FACE directory containing bmps of the faces when people speak:

We have an item directory containing all the items Jennifer can use:

We have a PATTERN directory with ABM files. I believe these are used for sprite animations, but I could be wrong.

We have a SCE directory with two files: CT.ADO and CT.ADT... we'll get to these - basically, it's a logical script for the actions that happen in the entire game using a custom binary scripting language... this is going to be where we spend most of our time.

We have a SNDDATA directory with MIDI and WAV files for BGM and SFX

We have a VISUAL directory with all the HQ renders shown, menu options, etc in BMP format. This is also where the intro video is held.

Finally, we have a DATA.BIN file (I have no idea what it does at the moment) and ct32.exe, which is our game executable.

Running the game without the CD present will give us this:

Running it in a debugger to find this string, you'll find that it's basically saying it doesn't know where the game data is (i.e. the CD is not inserted).

Part 2 - Let's NOCD this bitch
Looking in IDA, we already see RegQueryValueEx used - basically pulling from a registry key... I wonder what it's getting?

Well then... I guess that's the answer.

Let's check first to see what calls this... maybe we can hardcode it:

It seems a bit more digging has paid off! The game looks for an arg called NOREG. If it doesn't find it, it checks the registry value; if it does find it, it skips the check and starts the game assuming the data is in the parent directory. A simple patch should do the trick here: change the jz to 0xEB (an unconditional JMP) so the game thinks it's always being executed with NOREG as an arg.
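Here's a minimal sketch of that kind of one-byte patch. It assumes the jz is the short form (opcode 0x74), which is what makes a straight swap to 0xEB work; the offset in the usage comment is made up, so pull the real one from your own disassembly.

```python
JZ_SHORT = 0x74   # x86 opcode for "jz rel8"
JMP_SHORT = 0xEB  # x86 opcode for "jmp rel8"


def patch_jz_to_jmp(image: bytearray, offset: int) -> None:
    """Turn the short jz at `offset` into an unconditional short jmp."""
    if image[offset] != JZ_SHORT:
        # Refuse to touch anything that isn't the instruction we expect.
        raise ValueError(
            f"expected jz (0x74) at {offset:#x}, found {image[offset]:#04x}"
        )
    image[offset] = JMP_SHORT


# Usage with a HYPOTHETICAL offset (not the real one for ct32.exe):
# data = bytearray(open("ct32.exe", "rb").read())
# patch_jz_to_jmp(data, 0x1234)
# open("ct32_nocd.exe", "wb").write(data)
```

Checking the byte before overwriting it is cheap insurance against patching the wrong build of the executable.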

Running it now produces:


Part 3 - Finding the in-game text 
Stepping back from a logical standpoint, should we expect all the text to be in the executable? Well, it's possible - many games do it. There are even some strings in the binaries (like the NOCD error we found) that are static, but I have a feeling they're going to be elsewhere (maybe in the DATA files?) - let's see:

Japanese text that uses the SHIFT-JIS format (extremely popular) is mostly two-byte characters whose first (lead) byte falls in 0x81-0x9F or 0xE0-0xEF... simply looking for runs of bytes like this in the executable is a good place to start.
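If you'd rather not eyeball a hex editor, a rough scanner is easy to sketch. This one treats any run of four or more lead/trail byte pairs (lead 0x81-0x9F or 0xE0-0xEF, trail 0x40-0x7E or 0x80-0xFC) as a candidate string. The function name and the threshold of four are my own choices, and it will happily report machine code that just happens to look like text.

```python
import re

# Lead byte 0x81-0x9F or 0xE0-0xEF, then trail byte 0x40-0x7E or 0x80-0xFC.
SJIS_PAIR = rb"(?:[\x81-\x9f\xe0-\xef][\x40-\x7e\x80-\xfc])"
SJIS_RUN = re.compile(SJIS_PAIR + rb"{4,}")  # at least 4 characters in a row


def find_sjis_runs(data: bytes):
    """Yield (offset, decoded_text) for each plausible Shift-JIS run."""
    for m in SJIS_RUN.finditer(data):
        try:
            yield m.start(), m.group().decode("shift_jis")
        except UnicodeDecodeError:
            # The byte pattern matched but isn't a mapped character.
            continue
```

Something like `for off, text in find_sjis_runs(open("ct32.exe", "rb").read()): print(hex(off), text)` gives you a list you can paste straight into a translator.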

We then copy these strings out to a new file, open it in Notepad++, and paste it into Google Translate:
Well, these look like some errors, but no real game dialogue... time to look in other files!!!

Hrmm - well the PTN and PYX files don't really have anything text-related... let's check the SCE folder :)

Well, the CT.ADT file looks to have 4 byte offsets (they count up from 0x100 all the way to the end of the file, 4 bytes at a time).

The CT.ADO file, however...

Now we're getting somewhere... not only are there ASCII strings that look like file paths, but also SHIFT-JIS text strings. The data in this file is kind of odd... let's see if we can understand it better. We're gonna have to if we need to parse it.

Part 4 - Diving into ADO/ADT
We've already looked a bit into how ADT works, let's take a look at what ADO looks like:

Basically, we have a magic (ADC Object File) across the top, then a bunch of 0xFFs to get the offset up to 256, then it looks like data begins. Time to think like a Computer Scientist!!!

We have all these strings and a binary format with no apparent lookup besides potentially that ADT file, so there has to be some kind of control code mixed in with the data to let the game know what it's looking at as it's being parsed... everything has a structure, somehow.

Looking at the strings with .BMP, we notice that they follow a similar pattern:

39FF followed by a null 2-byte value (most of the time, sometimes it's 0x0100 which makes me think this is a WORD value), then a null-terminated ASCII string and some padding... In fact, EVERY time a BMP is loaded for FACE, 0xFF39 as a value is at the front! 

Let's check the executable:

Nice! Not only do we have this value, but we have other values as well - let's check a few Shift-JIS strings to see if we can find a pattern:

Awesome! All strings start with 0xFF33, have two 16-bit values (0x0F00), and then a Shift-JIS string.

Note: One thing you'll notice about the SHIFT-JIS text here is that it is NOT null terminated; it can't be. You see, certain programs can operate on a mix of both single and multi-byte character values, but older ones had a more difficult time with this. As a result, you'll notice that all linebreaks (0x0a) have an 0x00 after them. In fact, ANY ASCII character in this has an 0x00 after it (numbers and English letters included), so every character takes up exactly two bytes. This is a way that the text rendering can support multibyte interpretation with ASCII characters (0-127) and not accidentally fuck up and read a data byte as a control byte and vice-versa.
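The padding rule is simple enough to sketch in a few lines. These two helpers are hypothetical names of mine, not anything from the game: one pads every single-byte character with a 0x00 so each character is two bytes wide, and the other undoes it.

```python
def encode_script_text(text: str) -> bytes:
    """Encode text so that every character occupies exactly two bytes."""
    out = bytearray()
    for ch in text:
        raw = ch.encode("shift_jis")
        if len(raw) == 1:
            raw += b"\x00"  # single-byte character: pad with a null
        out += raw
    return bytes(out)


def decode_script_text(raw: bytes) -> str:
    """Inverse of encode_script_text: drop the padding nulls and decode."""
    out = bytearray()
    for i in range(0, len(raw) - 1, 2):
        pair = raw[i:i + 2]
        # A 0x00 second byte can only be padding; real trail bytes are >= 0x40.
        out += pair[:1] if pair[1] == 0 else pair
    return out.decode("shift_jis")
```

Decoding is safe because a genuine Shift-JIS trail byte is never 0x00, so a null in the second position can only be padding.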

As a result, we can conclude that some logic in the game must parse the string until it finds the end somehow (probably by looking for a new opcode, normally 0xFF28, which shows up as the bytes 28 FF in the file).

So now we know that an ADO file is basically a shit-ton of scripting 'opcodes' and subsequent data. We can theorize that the game reads this and knows what data to expect based on the preceding opcode. Now, we can look at the switch statement in the executable that we found earlier (with all the cases) to mark down every opcode the game supports (to better understand the format).

We do this by noting the values (0xFF20, 0xFF87, and so on), finding them in the ADO file, checking whether they always have the same number of data bytes before the next opcode, and then working out whether those bytes are 2-byte values, strings, etc.
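That survey can be automated, roughly. The sketch below makes some big assumptions of mine: the script is 2-byte aligned from offset 0x100, every opcode is a 16-bit little-endian word with 0xFF as its high byte, and operand data never contains a word that looks like one. The last assumption will be wrong now and then, so treat the output as a hint and not gospel.

```python
from collections import Counter


def operand_sizes(ado: bytes, opcode: int, start: int = 0x100) -> Counter:
    """Count how many bytes follow each `opcode` before the next 0xFFxx word."""
    sizes = Counter()
    pos = start
    end = len(ado) - 1
    while pos < end:
        word = ado[pos] | (ado[pos + 1] << 8)  # little-endian 16-bit read
        if word == opcode:
            nxt = pos + 2
            # Walk word by word until the high byte says "opcode".
            while nxt < end and ado[nxt + 1] != 0xFF:
                nxt += 2
            sizes[nxt - pos - 2] += 1
        pos += 2
    return sizes
```

If an opcode reports a single size every time, it probably takes fixed-width operands; a spread of sizes suggests a string or some other variable-length payload.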

In addition, you'll notice that the executable has some interesting text:

In fact, this is a list and looks suspiciously like opcode names - lucky for us, they ARE! We now know what the opcode names are :)
From this point, we can run the game with a debugger, breaking at the various switch statements to see operations of various opcodes - one that might be interesting to us is JMP...

In fact, the first JMP (0xFF22) has a 2 byte value of 0x17 after it.

If you watch this in the game and set the ADO_offset as a watched variable in IDA, you'll see the game jump from this value to 0x1B32 - how did it know to do that? It's not a multiple... maybe the ADT has something to do with it:


0x17 * 4 is 0x5C - the ADT is a jump table for various scenes... you'll notice that the CALL function (0xFF23) works in a similar way, but returns to this offset after a while... The first few ADT offsets all point to 0xFF00... it seems to be pretty prominent in the game... the jumps actually skip over them though (they add +2 to the offset after jumping)... are we seeing some kind of RETN opcode???? I think so ;)
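In code, the lookup described above comes out something like this. It's a sketch under assumptions: the ADT entries are read as 32-bit little-endian values, and the function names are mine. It also ignores the larger values near the end of the table, which need extra handling.

```python
import struct


def adt_entry(adt: bytes, index: int) -> int:
    """Return the raw 32-bit ADT value for a JMP/CALL operand."""
    (value,) = struct.unpack_from("<I", adt, index * 4)
    return value


def resume_offset(adt: bytes, index: int) -> int:
    """Offset where the script continues: the entry plus the 2 bytes skipped."""
    return adt_entry(adt, index) + 2
```

So a JMP with operand 0x17 reads the table at byte 0x5C, and the interpreter carries on two bytes past whatever offset it finds there.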

You'll notice, however, that the ADT file has various values at the end that are far beyond the size of the ADO file... what gives?! We'll get into that, but it will take a bit more introspection into the game running to determine how these jumps work (specifically watching the jumps).
Dumping the memory, one thing you'll notice is that the ADC object file in memory (CT.ADO) has an int16 value of 0x8000 written every 0x8000 bytes (32 KB). Besides that, the ADO is unchanged. You can also see in the executable that a function parses the values and skips 2 bytes ahead if it sees this value (akin to a NOP).

As this game splits the data into 32k chunks (most likely to reference memory in a more segmented fashion; we're dealing with 2 byte values a ton, and this is important), there has to be some sort of "address translation" for the ADT (as the ADT uses 4 bytes to reference an offset).

So there's a bunch of math here that, if you watched the debugger, you'd see something akin to this:

I didn't find the function originally. I actually went to the end of the ADT file, assuming its last entry was a pointer to the end of the ADO file (the last RETN being at offset 0x253F4). The ADT had this listed as 0x453F4... after looking at various others, I noticed that the translation took the two most significant bytes, halved them, and stuck them back on (0x0004 halves to 0x0002, which gives 0x253F4) :)
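Written out, the translation is tiny. This is my reading of the pattern from the one example above (0x453F4 in the ADT, 0x253F4 in the ADO), not a decompile of the game's own routine, and the function names are made up.

```python
CHUNK = 0x8000  # the game handles the ADO in 32 KB pieces


def adt_to_file_offset(value: int) -> int:
    """Halve the upper 16 bits and add the lower 16 bits back on."""
    hi, lo = value >> 16, value & 0xFFFF
    return hi * CHUNK + lo  # hi * 0x8000 is (hi / 2) << 16


def file_offset_to_adt(offset: int) -> int:
    """Inverse: chunk index in the upper 16 bits, remainder in the lower."""
    hi, lo = divmod(offset, CHUNK)
    return (hi << 16) | lo
```

Put another way, the upper half of an ADT value is just a count of how many 32 KB chunks to skip, and the lower half is the offset inside that chunk.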

Ok, so far, we can read and generate ADT files (generating is the opposite: the most significant bytes are set from how many intervals of 0x8000 we've crossed), we have a general breakdown of the opcodes, and we know where the strings are. Before we dive into disassembly, let's keep our eye on the prize and go for broke (translation first).

Part 4 - Enter CTTDTI Suite
We know what format the strings are in and how to read them out of the ADO file... of course, injecting them back in will involve modifying the ADT offsets, as the translated strings are going to be larger or smaller than the originals. Firstly, let's focus on ripping the ADO strings out into a text file... something easily editable that can be read back into another program to mess with them. Better still, let's make a format where we can easily inject the strings into a new ADO file and update the offsets... what do we need???

Well, the offset where the string starts is important, so is the size of the original string in bytes, then the string itself... sort of like

0xE92 25 blahblahblah

Basically, I renamed CT.ADO to CT_J.ADO for when I generate a new one.
This program reads in the ADO file, finds 0xFF33, skips 6 bytes ahead (to skip the opcode and the two 2-byte values), and writes the starting offset of the string, the length of the string, and the string itself in a tab delimited format ending with a newline to a text file - simple :)

You'll notice that I replace any 0x0a value (newline) with [NEWLINE] - this is because I want the whole string to be processed on one line and be able to specify newlines where I want them without having to modify the format of the text file.
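This isn't the original ripper, just a minimal sketch of the same idea. It assumes the script is 2-byte aligned from offset 0x100 and that a string runs until the next word whose high byte is 0xFF; the function names are mine.

```python
def dump_strings(ado: bytes, start: int = 0x100):
    """Yield (offset, length, text) for each 0xFF33 string record."""
    pos, end = start, len(ado) - 1
    while pos < end:
        if ado[pos] == 0x33 and ado[pos + 1] == 0xFF:
            begin = stop = pos + 6  # skip opcode + two 16-bit values
            while stop < end and ado[stop + 1] != 0xFF:
                stop += 2  # walk word by word to the next opcode
            raw = ado[begin:stop]
            # Drop the 0x00 padding after ASCII characters, then decode.
            text = raw.replace(b"\x00", b"").decode("shift_jis", errors="replace")
            yield begin, len(raw), text.replace("\n", "[NEWLINE]")
            pos = stop
        else:
            pos += 2


def format_line(offset: int, length: int, text: str) -> str:
    """One tab-delimited line: offset, byte length, string."""
    return f"0x{offset:X}\t{length}\t{text}\n"
```

The length written out is the byte length in the file (padding included), since that's the number you need later to work out how much each replacement string grows or shrinks the script.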

For fun, let's do something kinda silly - we're gonna parse this text file with translator, a Python module that sends data out to Google Translate, autodetects the language, translates, and returns it in your desired language:

Let's try a couple of strings with an injector now - the last program in this suite parses the text file, adds null-padding to any ASCII character in the strings, and reads the lines into a dict so we know what offsets are affected. It also rebuilds an ADO from scratch (it reads the ADT, loads all the "scenes" into an array with their offsets, copies all the data between strings and afterward), and then regenerates an ADT based upon the sizes of the ADO "scenes" while constructing:
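The real injector rebuilds the file scene by scene, but the offset bookkeeping underneath boils down to something like the sketch below. It's my own simplified take with hypothetical names: every scene moves by the total growth of all the strings that sit before it.

```python
from bisect import bisect_right
from itertools import accumulate


def shift_offsets(scene_offsets, edits):
    """Recompute scene offsets after strings change size.

    scene_offsets: original file offsets of each scene.
    edits: (string_offset, old_len, new_len) tuples.
    """
    edits = sorted(edits)
    starts = [e[0] for e in edits]
    # Running total of size changes, in file order.
    deltas = [0] + list(accumulate(new - old for _, old, new in edits))
    # A scene moves by the combined growth of every string before it.
    return [off + deltas[bisect_right(starts, off - 1)] for off in scene_offsets]
```

The results are plain file offsets, so they still need converting to the ADT's chunked form before being written out.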

And we test to see if it works:

The Engrish in this is gonna be terrible though - thankfully, I found an rtf on some Clocktower Fan Forum ( ;) ) and can manually edit the strings based upon the rough translations.

It's all translated and ready to go! We run it:


Ok - something's wrong; let's throw it into IDA and see what's up:

So it looks like it's trying to read the ADO file into memory, but it tries to resolve a pointer and can't because nothing exists at that spot!

Seeing the struct a1 and all the malloc'ing it does, this must be the issue - digging back further, you'll find that these pointers are made here:

So the game (based on the cmp against 5) will only make pointers for 5 * 0x8000 chunks of the ADO but will read the ADO data in until EOF (definitely a bug). As a result, we can only load an ADO file of max size 0x28000. Does that stop us!? FUCK NO! Let's dig into this SCE struct in memory a bit more...
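A quick sanity check along these lines saves a crash-and-debug cycle when building a new ADO. The constant names are mine; the 5 comes from the comparison described above.

```python
CHUNK = 0x8000
STOCK_POINTERS = 5  # what the unpatched executable allocates, per the disassembly


def chunks_needed(ado_size: int) -> int:
    """Number of 32 KB pieces required to hold an ADO of this size."""
    return -(-ado_size // CHUNK)  # ceiling division


def fits_stock_loader(ado_size: int) -> bool:
    """True if an ADO of this size stays within the stock pointer count."""
    return chunks_needed(ado_size) <= STOCK_POINTERS
```

Anything that needs a sixth chunk is past what the stock executable can address.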

We can change all instances of loading the ADO pointers in from 5 to 6 in order to add another pointer, but what's after that last pointer? Why, the start of the ADT offsets, of course :)

We see that ADT offset 0x00 is at struct_head + 0x2A and goes for 0x7D0... seriously? 0x7D0 pointers??? that's like 0x8000 wait a minute:D:D:D:D:D

Since our ADT file only uses 0x4800 bytes, bumping the ADT start index up to, say, 0x2E would give us 4 more bytes to write another ADO pointer there, and we'd still have a ton of reserved room to spare at the end!

Finding struct references to 0x2A and changing them to 0x2E as well and:

#AWWWWWYEAAAAAAA - Gotta love Object Oriented Reverse-Engineering :)

Ok - so now we have the game completely translated, now what?!

In fact - here are the ADO/ADT files I created (drop them in your SCE directory to play clocktower in English :)))
You'd also need to make binary changes with a hex editor to CT32.exe:

Future Work - Part 5 - SCEDASM - the SCE disassembler
The next logical step would be to disassemble all the other opcodes as well to build a text file that could eventually be read into a game/editor :)

Sort of like, well - this: (WIP and extremely hacky... an exercise for the reader):

Then of course, we'd need to push this all back into an ADO/ADT pair:

Future Work - Part 6 - The PSX Version
The PSX version of the game uses ADO/ADT as well!!! We could convert the assets and add the PSX exclusive content to the PC version, it would seem.

Until Next Time :)