Tetracorp: Intro to Amiga reverse-engineering with Ghidra

Ghidra is a powerful reverse-engineering tool created by the NSA and released to the public in 2019. It’s capable of handling 68000 code, which means it can be used to analyze Amiga software and games. However, it can be tricky to get the hang of at first. This introductory guide should give you a start.

See also Reverse-engineering Amiga games for Amiga-specific advice other than using Ghidra.

Installation
Start a new project
Basics of analysis
User interface
Controls
Advanced features
Custom datatypes
Other advice and methods

Installation

Ghidra requires Java. The version currently used is JDK 11. If you don’t have this installed, install it first.

To use Ghidra with Amiga, you will need ghidra_amiga_ldr, a plugin for loading the Amiga Hunk executable format. Download the latest release, but don’t unzip it. Note which Ghidra version it is for (e.g. “GHIDRA 10.1.1 PUBLIC”). There is also a newer fork of ghidra_amiga_ldr by BartmanAbyss, although I haven’t tested it yet.

Go to the Ghidra release page and download the version which matches the ghidra_amiga_ldr you downloaded. This isn’t always the latest version. Unzip it somewhere (it does not come with a proper installer, unfortunately). Run Ghidra using either ghidraRun.bat (Windows) or ./ghidraRun (Linux or macOS).

From the main menu, select File - Install Extensions. Click the green plus in the top-right corner and select the ghidra_amiga_ldr zip file. You do not need to unzip that file. This should add “Amiga Executable hunks loader” to the list. If it’s ticked, it’s enabled.

Start a new project

Create a new project and pick somewhere to store it. From the main menu select File - Import File and import the Amiga executable you want to analyze. Double-click that file to open the CodeBrowser window for that file. This is the main program window you will be using.

When you first open the file, you will be asked if you want to auto-analyze it. Select “Yes”, but be careful with your options:

Amiga Library Calls: Tick whichever libraries the program uses. A good hint is to look in the libraries directory (usually libs) on the program disk. Another way to check is to open the executable in a hex editor and search for the strings “library” or “device”, which reveal library and device names hardcoded into the program. The Exec library is also commonly used; it is built into the Kickstart ROM, so it won’t appear in the libraries directory.
Call-Fixup Installer: Untick this! This feature looks for C functions which never return. However, if your program was written in Assembly (as most commercial Amiga games were), its subroutines don’t necessarily return in the normal style, so this feature would just confuse the program flow (e.g. marking chunks of code after a JSR or BSR as unreachable and refusing to disassemble them).
Data Reference: Untick “Unicode String References”. If you’re analyzing an Amiga program it’s probably pre-Unicode.
Non-Returning Functions: Untick this too, for the same reason as Call-Fixup Installer.
Reference: Untick “Unicode String References”.
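If you’d rather not eyeball a hex dump for the library check described above, a small C sketch can do the search. It assumes library and device names appear as plain printable ASCII ending in “.library” or “.device” (e.g. dos.library), which is typical but not guaranteed; packed or encrypted executables won’t reveal anything this way. The function name find_library_names is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if the n bytes starting at s end with suffix. */
static int ends_with(const char *s, size_t n, const char *suffix)
{
    size_t k = strlen(suffix);
    return n >= k && memcmp(s + n - k, suffix, k) == 0;
}

/* Hypothetical helper: scan a raw executable image for runs of printable
 * ASCII that end in ".library" or ".device". report (may be NULL) is called
 * with each match and its file offset. Returns the number of matches. */
size_t find_library_names(const unsigned char *buf, size_t len,
                          void (*report)(const char *name, size_t n, size_t offset))
{
    size_t count = 0, start = 0;
    for (size_t i = 0; i <= len; i++) {
        /* The end of the buffer also terminates a run. */
        int printable = i < len && buf[i] >= 0x20 && buf[i] < 0x7f;
        if (printable)
            continue;
        size_t n = i - start;
        const char *s = (const char *)buf + start;
        if (ends_with(s, n, ".library") || ends_with(s, n, ".device")) {
            if (report)
                report(s, n, start);
            count++;
        }
        start = i + 1;
    }
    return count;
}
```

A report callback could print each match with printf("%.*s at 0x%zx\n", (int)n, name, offset). Any library it finds is a candidate to tick in Amiga Library Calls.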

If you chose not to run auto-analysis, don’t worry, as you can re-run it at any time from the CodeBrowser window menu Analysis - Auto Analyze ‘programname’… option. You can also individually run features in Analysis - One Shot, such as Aggressive Instruction Finder.

Save (Ctrl-S).

Basics of analysis

Disassembly converts a program’s executable back into assembly language. It was standard practice for Amiga game developers to strip the list of variable and function names (called the symbol table) from the finished program. Making sense of such a disassembly usually involves a lot of manually following the code to give names back to things so it’s clearer what they mean.

In Ghidra, if you rename a variable or change its type, it’s automatically updated everywhere in the program. You can set a variable name, then click the references to see what code reads or writes it, and that will in turn give you a clue what to name that function, and so on. It also helps to add comments.

See also Reverse-engineering Amiga games.

User interface

The main window is the CodeBrowser. The main area you will work with is the Listing panel.

The black number on the left is the theoretical address at which the program is located in Amiga RAM. This is not the offset within the executable file itself; it is usually offset by hexadecimal 0x0021f000 (you can change this in the options when initially importing the executable). You may find one of these alternative offsets helpful in some specific cases:

Set the offset to a large round number like 00100000, so that Ghidra’s addresses always differ by a round number from those generated by another disassembler such as IRA, if you were previously working with that. (Alternatively, you can do the reverse and configure IRA to disassemble with an offset of 21f000.) This probably isn’t necessary if you’re only using Ghidra.
Configure an emulator (FS-UAE or WinUAE) to create uncompressed save states, and to use the minimum RAM necessary to avoid making enormous save states. By searching for a known string or byte sequence, you can determine the offset at which the game is located in your save state. You can then compare the disassembly to the game running in memory, for example to see what value a given variable holds. To keep the offset constant, always playtest the game from the same save state.
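The save-state comparison boils down to one subtraction: find the same known byte sequence in the Ghidra listing and in the save state, and the difference between the two positions is the offset for everything else. The C sketch below assumes the game’s code sits in one contiguous block of emulated RAM (if the executable’s hunks were loaded to separate places, each hunk needs its own offset) and that the save state stores RAM uncompressed. All names in it are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: return the first offset of needle in hay, or -1. */
long find_bytes(const unsigned char *hay, size_t hay_len,
                const unsigned char *needle, size_t n)
{
    if (n == 0 || n > hay_len)
        return -1;
    for (size_t i = 0; i + n <= hay_len; i++)
        if (memcmp(hay + i, needle, n) == 0)
            return (long)i;
    return -1;
}

/* Constant difference between save-state offsets and Ghidra addresses,
 * valid only within one contiguously loaded block of code/data. */
typedef struct {
    int64_t delta; /* state offset minus Ghidra address */
} AddrMap;

AddrMap addr_map_from_anchor(uint32_t ghidra_addr, uint32_t state_off)
{
    AddrMap m = { (int64_t)state_off - (int64_t)ghidra_addr };
    return m;
}

int64_t ghidra_to_state(AddrMap m, uint32_t ghidra_addr)
{
    return (int64_t)ghidra_addr + m.delta;
}

uint32_t state_to_ghidra(AddrMap m, int64_t state_off)
{
    return (uint32_t)(state_off - m.delta);
}
```

For example, if a string sits at 0x0021f100 in Ghidra and at offset 0x41234 in the save state, addr_map_from_anchor(0x0021f100, 0x41234) maps Ghidra’s 0x0021f200 to save-state offset 0x41334.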

The blue numbers to the right of the address are the raw bytes of the instruction or data located at this memory address. To the right of those is the instruction mnemonic, i.e. the assembly language instruction. To the right of that are the operands used by that instruction. For example:

020078c0 0c 80 00        cmpi.l     #0x989680,D0
         98 96 80

020078c0 is the memory address in hexadecimal.
0c 80 00 98 96 80 are the raw bytes which represent this instruction.
cmpi.l #0x989680,D0 is the assembly language which would generate that machine code.
- CMPI (compare immediate) compares a literal value with the destination operand, here a data register, and sets the condition flags accordingly.
- .L means a “long”, or four bytes. In 68k, you commonly see .B (byte), .W (word, two bytes) or .L (long, four bytes).
- # signifies that the following value is a literal number.
- 0x means the following number is written in hexadecimal, or base 16, which is useful as it allows one byte to be written in exactly two characters (0 to 255 = 00 to FF). Some disassemblers will instead use the prefix $ to denote hexadecimal.
- 989680 is the number, equal to 10 million in decimal.
- D0 is the destination, data register D0.
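To see how the six bytes map onto that breakdown, here is a C sketch that decodes only this one form of CMPI (an immediate compared against a data register). The bit layout it uses (0x0C in the top byte, size in bits 7-6, effective-address mode and register in bits 5-0) follows the standard 68000 encoding; the struct and function names are made up for illustration, and it is nowhere near a general disassembler.

```c
#include <stdint.h>

/* Read big-endian values, as the 68000 stores them. */
static uint16_t be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

typedef struct {
    char size;          /* 'b', 'w' or 'l' */
    unsigned mode;      /* effective-address mode (0 = data register direct) */
    unsigned reg;       /* effective-address register number */
    uint32_t immediate; /* the #value being compared */
    unsigned length;    /* total instruction length in bytes */
} Cmpi;

/* Decode CMPI with a data-register destination. Returns 0 on success,
 * -1 if the bytes are not that form. */
int decode_cmpi_dn(const uint8_t *code, Cmpi *out)
{
    uint16_t op = be16(code);
    if ((op & 0xFF00) != 0x0C00)
        return -1;                       /* not CMPI */
    unsigned sz = (op >> 6) & 3;         /* 00 = .b, 01 = .w, 10 = .l */
    out->mode = (op >> 3) & 7;
    out->reg = op & 7;
    if (sz == 3 || out->mode != 0)
        return -1;                       /* only Dn destinations handled */
    if (sz == 2) {
        out->size = 'l';
        out->immediate = be32(code + 2); /* two extension words */
        out->length = 6;
    } else {
        /* .b and .w use one extension word; .b keeps only the low byte */
        uint16_t ext = be16(code + 2);
        out->size = sz == 0 ? 'b' : 'w';
        out->immediate = sz == 0 ? (uint32_t)(ext & 0xFF) : ext;
        out->length = 4;
    }
    return 0;
}
```

Decoding 0c 80 00 98 96 80 gives size 'l', register D0 and an immediate of 0x00989680, i.e. 10,000,000.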

At this point, it becomes very useful to learn 68k Assembly language, if you aren’t already familiar with it. Learning 68k is too big a topic to cover in this article, but there are numerous tutorials and guides. I recommend MarkyJester’s 68k Tutorial. You can use a program like Easy68k (Windows) to play around with 68k to get an intuitive sense of how 68k instructions work.

After you make any change in the Listing panel, a black marker is placed in the scrollbar to denote an unsaved change. The markers go away when you save. This is useful when you run a script or something which makes many changes, so you can click on each marker to manually check each change before committing the save.

To the right of the Listing panel is the Decompile panel, which attempts to translate the current function into C. C is a higher-level language, so it may be easier for you to interpret, especially if you’re unfamiliar with Assembly. However, since most Amiga games were written in Assembly rather than C, the decompiler won’t always produce meaningful or correct output, so you can’t rely on it all the time.

Controls

Ghidra has a huge number of controls and menus, but you will find the following basic controls useful.

Double click on a function or variable name to go there in the code. Hit the back button at the left of the top toolbar to return to where you came from (Alt-Shift-Left will also do this).
Right-click on a piece of data or undefined code, select Data and then choose a data type. Useful types include pointer (a 4-byte address) and string (ASCII text).

Some keyboard controls are particularly useful:

D: Disassemble from the current line, or currently selected lines. Often, Ghidra will fail to correctly identify instructions as such, e.g. if it calculates that the code will not be reached from any other part of the program. This forces Ghidra to recognize it as code and disassemble it.
C: Clear. The opposite of disassemble, essentially. This turns a recognized code or data segment into unrecognized bytes. Useful if something has been misidentified as code, or its type has been misinterpreted (e.g. random bytes misinterpreted as part of a string).
L: Label. Rename the current function or variable. Once you’ve worked out what it does, give it a name that explains its function. It is automatically renamed throughout the code, which gives you another clue when it appears elsewhere. If you only have a partial understanding of the function, you can still name it to explain what you know so far.
;: Hit semicolon to add a comment to a line. You can place comments on the line before, the line after, at the end of the line, etc. Ctrl-Enter submits the comment. Commenting the code is really useful for reminding yourself what something does.
T: Type. Set the data type on some data (i.e. not an instruction). You can also right-click and select the Data menu. Common Amiga types include byte, word (two bytes), long (four bytes), pointer (four bytes referencing a memory location), char (a one-byte ASCII character), and string (a zero-terminated series of ASCII bytes of any length). You can also enter your own custom structs or enums created in the Data Type Manager, arrays (e.g. char[10]), or pointers to specific types (e.g. for a pointer to a long, instead of just pointer, you can write long*).
Y: Set data type to the last selected data type. Useful if you need to set the type on a lot of strings, say. Right-click on the first string and select Data - string. For each other string, simply click on them and hit Y.
[: Create an array from a selected area. Set the type first, so that you create an array of that type.
E: Equate. Set the type of a parameter in an instruction. Useful for setting your own custom enum entries; e.g. instead of tst.b (0x4,A0) you can equate the 0x4 to your ShipID type and have it read tst.b (scoutship,A0). For standard types (e.g. unsigned decimal), you can also right-click and select Convert.
F: Define the current line as the start of a Function. If used on an existing Function, it opens a useful function definition editor. If the decompile window is greyed out, you may want to define a function.
B: Set the type of a variable to byte. Hit it again to set it to word (two bytes).
P: Set type to pointer.
Ctrl-Down / Ctrl-Up: Go to the next and previous function in the code.
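Whichever type you apply, remember that the 68000 is big-endian: the most significant byte comes first. This matters whenever you read raw bytes from the Listing panel, a hex editor, or a save state by hand. Below is a minimal C sketch of how the common types above are read from raw bytes; the function names are illustrative, not part of Ghidra.

```c
#include <stddef.h>
#include <stdint.h>

/* byte: a single byte */
uint8_t read_byte(const uint8_t *p) { return p[0]; }

/* word: two bytes, most significant first */
uint16_t read_word(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* long: four bytes, most significant first */
uint32_t read_long(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* pointer: on the Amiga, just a long holding an address */
uint32_t read_pointer(const uint8_t *p) { return read_long(p); }

/* string: ASCII bytes up to a zero byte. Returns the length, or max if
 * no terminator was found within max bytes. */
size_t string_length(const uint8_t *p, size_t max)
{
    size_t n = 0;
    while (n < max && p[n] != 0)
        n++;
    return n;
}
```

So the bytes 12 34 56 78 read as the word 0x1234 or the long 0x12345678, never 0x78563412 as on a little-endian PC.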

Various other shortcuts are defined in the Ghidra Cheat Sheet. You can also define your own shortcuts in Edit - Tool Options - Key Bindings. For example, you might bind D to equate unsigned decimal. Even though D is already Disassemble, you can still bind it and Ghidra will work out from context which one you meant.

Advanced features

Search - For address tables is a useful feature to find tables of pointers. Often, software will have jump tables, which are a list of pointers to functions.
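Roughly speaking, a pointer table is a run of consecutive longwords that all point inside the program. The C sketch below is a simplified heuristic in that spirit, not Ghidra’s actual implementation of Search - For address tables. The function name and parameters are hypothetical, and it only checks word-aligned positions because the 68000 requires word alignment for word and long accesses.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Simplified heuristic, not Ghidra's algorithm: find runs of at least
 * min_run consecutive big-endian longs whose values all lie in [lo, hi).
 * image[0] is assumed to be loaded at image_base. report (may be NULL)
 * receives each table's address and entry count. Returns the run count. */
size_t find_address_tables(const uint8_t *image, size_t len, uint32_t image_base,
                           uint32_t lo, uint32_t hi, size_t min_run,
                           void (*report)(uint32_t table_addr, size_t entries))
{
    size_t runs = 0, i = 0;
    if (min_run == 0)
        min_run = 1; /* a zero-length "table" is meaningless */
    while (i + 4 <= len) {
        size_t n = 0;
        /* Count how many consecutive longs from here look like addresses. */
        while (i + 4 * (n + 1) <= len) {
            uint32_t v = be32(image + i + 4 * n);
            if (v < lo || v >= hi)
                break;
            n++;
        }
        if (n >= min_run) {
            if (report)
                report(image_base + (uint32_t)i, n);
            runs++;
            i += 4 * n; /* skip past the whole table */
        } else {
            i += 2;     /* 68000 words and longs are word-aligned */
        }
    }
    return runs;
}
```

With lo set to the program’s load address and hi to the end of the program, every run of min_run or more in-range longs is reported. Expect false positives: ordinary data can happen to look like addresses.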

Right click on a number and select “convert” to display it in a different format, such as decimal rather than hexadecimal.

Ghidra currently appears unable to import the symbol table (the list of variable and function names) from an Amiga executable. This isn’t a huge deal, since most Amiga games did not come with a symbol table intact. However, you can import symbol names from a properly-formatted list (e.g. if you performed a previous disassembly using other software). From the menu in the CodeBrowser select Window - Script Manager, then pick Data and double-click on ImportSymbolsScript.py.
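If your previous disassembly used a different base address, the names need shifting to match Ghidra’s addresses before import. The C sketch below writes one line per symbol, assuming the format described in the script’s header comment: the symbol name, a hex address, and f for a function or l for a label. Check that comment in your Ghidra version before relying on it; the helper format_symbol_line is hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: format one input line for ImportSymbolsScript.py,
 * assuming its "symbolName 0xADDRESS f|l" format (f = function, l = label).
 * offset is the symbol's address in the other disassembly (e.g. one made
 * with a base address of zero); load_base is the image base chosen when
 * importing into Ghidra. Returns the snprintf result (line length if it fit). */
int format_symbol_line(char *out, size_t out_size, const char *name,
                       uint32_t offset, uint32_t load_base, int is_function)
{
    uint32_t addr = offset + load_base;
    return snprintf(out, out_size, "%s 0x%08" PRIx32 " %c\n",
                    name, addr, is_function ? 'f' : 'l');
}
```

For example, format_symbol_line(buf, sizeof buf, "MainLoop", 0x1234, 0x21f000, 1) produces the line MainLoop 0x00220234 f.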

Tools - Program Differences will compare two programs, possibly letting you compare two different versions of the same program. I have yet to test this feature.

In the Script Manager, the script LabelDataScript.java will apply meaningful labels based on data type. I haven’t tested it, but it may be useful.

You can hit F on a function to edit its signature, i.e. to define what its input and output values are. Ghidra assumes functions take their input values from the stack, but I’ve noticed functions which take their input values in registers (i.e. A0-A7 and D0-D7). You need to tick “Use Custom Storage”, then click the plus to add new lines referencing the data this function takes. The main benefit is that the decompiled C code makes more sense.

Custom datatypes

Another useful thing is to establish an enum, which is essentially a custom Data Type which converts numbers into names. Suppose your game has a variable for the currently equipped weapon, something like 0=Unarmed, 1=Dagger, 2=Sword, 3=Axe. Right-click your program in the Data Type Manager panel in the bottom-left of the CodeBrowser window and create a new enum. Click the green plus icon to add each row with its name and automatically incrementing value. Give your enum a name and click the blue disk icon to save. You now effectively have a custom data type.

The quickest way I’ve found to add a number of enums is to click the green plus icon several times to create the desired number of entries, then go through the list and hit F2 on each field to name it. This saves you switching between mouse and keyboard between each entry. However, for especially large enums, you might create a C typedef file and import it with File - Parse C Source. For example, create and import a file named weapon.h like below:

typedef enum Weapon {
  Unarmed=0,
  Dagger=1,
  Sword=2,
  Axe=3
} Weapon ;

You can similarly define composite data types called structs. Suppose each ship in a game is stored in 24 bytes, with the first byte always storing the ship’s Armour value, the second byte its Speed, and so on. You can define a struct to represent that pattern. As with enums, right-click the program name in the Data Type Manager and select New - Structure. In the Structure Editor, enter the Size of your struct in bytes and a name, and press F2 on the DataType and Name fields to edit the type and name of known byte offsets. You can, of course, use your custom enums and arrays as types. If you need to erase a row, press C or click the erase icon; don’t hit Delete, as it will remove the row and shift all subsequent fields to different offsets, which you probably don’t want. Click the blue disk icon to save.
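The 24-byte ship record can also be written down as a C struct, which is handy if you prefer the File - Parse C Source route used for enums. In this sketch only Armour and Speed come from the example; the remaining 22 bytes are a placeholder array, and the name Ship and the helper ship_at are hypothetical. Using unsigned char for every field avoids padding, so the C layout matches the byte offsets in the Structure Editor. If you import it, you’d only import the typedef itself; the rest is ordinary C for checking offsets.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 24-byte ship record: only the first two fields are known. */
typedef struct Ship {
    unsigned char armour;      /* offset 0 */
    unsigned char speed;       /* offset 1 */
    unsigned char unknown[22]; /* offsets 2-23, not yet identified */
} Ship;

/* Compile-time check that the C layout matches the game's 24 bytes. */
_Static_assert(sizeof(Ship) == 24, "Ship must be 24 bytes");

/* Copy ship number index out of a raw table of consecutive records. */
Ship ship_at(const unsigned char *table, size_t index)
{
    Ship s;
    memcpy(&s, table + index * sizeof s, sizeof s);
    return s;
}
```

Ship number 1 therefore starts at byte 24 of the table, its Speed at byte 25, and so on.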

Amiga software often used a data structure known as a bitfield, which stored a flag in each bit of a byte. The struct editor can also add bitfields, although the UI is a little clunky. Right-click on an empty row and click Add Bitfield. Set the Base Datatype to byte and the Allocation Bytes to 1. Enter your field name, select the Bit Offset for this bit, and click OK. To add another bit, right-click the same row, click Add Bitfield, and enter the field name and offset for that new bit. The quickest way I’ve found to add eight bits is to bind a key (e.g. T) to Add Bitfield, set it to type Byte and 1 byte, and hit OK; then, seven more times, hit T, click the next bit in Component Bits, and click OK. You can then go back and edit the names of those bits by pressing F2 as normal. The bits are ordered from the largest first, i.e. bit 7 will be at the top and bit 0 at the bottom.
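In code, bitfields usually show up as single-bit tests and updates (the 68k has BTST, BSET and BCLR for this). The C sketch below shows the same idea with masks. The flag names are hypothetical, and bit 7 is the most significant bit, matching the top row of Ghidra’s bit display. Masks are used rather than C bitfield syntax because the order of C bitfields within a byte is implementation-defined.

```c
#include <stdint.h>

/* Hypothetical flags byte; the names are illustrative only. */
enum {
    FLAG_VISIBLE = 1u << 0,
    FLAG_DAMAGED = 1u << 1,
    FLAG_CLOAKED = 1u << 2,
    FLAG_PLAYER  = 1u << 7  /* most significant bit */
};

/* Tests the same bit that btst #n,<byte> would: 1 if set, else 0. */
int test_bit(uint8_t flags, unsigned n) { return (flags >> n) & 1u; }

/* Like bset: return flags with bit n set. */
uint8_t set_bit(uint8_t flags, unsigned n) { return (uint8_t)(flags | (1u << n)); }

/* Like bclr: return flags with bit n cleared. */
uint8_t clear_bit(uint8_t flags, unsigned n) { return (uint8_t)(flags & ~(1u << n)); }
```

A byte holding 0x81 therefore has bits 7 and 0 set, which here would read as FLAG_PLAYER and FLAG_VISIBLE.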

Where enums and structs come in handy is that you can use them anywhere you’d normally select a type, such as variable types or function parameters. Press T to set the type of a variable and just type the name of the enum or struct you defined. Press F on a function to edit its signature and add your custom type as one of its parameters (you will need to select Use Custom Storage). Now, the Decompile panel will show your type automatically. Press E to equate an instruction parameter to an enum. You can also name pointers this way; e.g. if a variable always contains a pointer to a Ship, just hit T and type Ship*.

You can also set an array as a type, or even a two-dimensional array (i.e. a table of data). For example, the game K240 stores a table of up to four blueprints given for free when fighting each of six aliens, with each blueprint represented by a one-byte number. After naming all the blueprints in an enum, I simply hit T on the starting blueprint table variable and set it to Blueprint[6][4]. If you don’t need an enum in this case, you could use byte[6][4].
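A type like Blueprint[6][4] is laid out row by row: all four entries for the first alien, then all four for the second, and so on. So the entry for alien a, slot s sits a * 4 + s bytes from the start of the table, and in the 68k code you may see the alien index shifted left by two before being added. A C sketch of that arithmetic, with hypothetical names and plain bytes standing in for the Blueprint enum:

```c
#include <stddef.h>
#include <stdint.h>

enum { ALIENS = 6, SLOTS = 4 };

/* Byte offset of table[alien][slot] in a row-major [ALIENS][SLOTS] byte table. */
size_t blueprint_offset(unsigned alien, unsigned slot)
{
    return (size_t)alien * SLOTS + slot;
}

/* The same lookup on the raw bytes, as 68k code would do it:
 * start of table + alien * 4 + slot. */
uint8_t blueprint_at(const uint8_t *table, unsigned alien, unsigned slot)
{
    return table[blueprint_offset(alien, slot)];
}
```

The last alien’s last blueprint, for instance, is byte 5 * 4 + 3 = 23, the final byte of the 24-byte table.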

If you accidentally give two things the same name, Ghidra will continue to track them correctly and won’t confuse the two.

Other advice and methods

You will sometimes see references to fixed memory addresses in the $DFF000 range, like these:

0201a6c8 31 79 02        move.w     (data315E8).l,(offset DAT_00dff0d6,A0)
         03 15 e8 
         00 d6
0201a6d0 31 79 02        move.w     (data315EA).l,(offset DAT_00dff0d8,A0)
         03 15 ea 
         00 d8

These are Amiga hardware addresses. You can import them using the ImportSymbolsScript.py from this file. These hardware addresses are often a really good way to identify which parts of code are dealing with key system functions like graphics, sound, or input. For more detail on their meaning, see Mapping the Amiga or m68k-instructions-documentation.
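As a worked example, here is a C sketch of the kind of lookup an imported symbol file gives you. It covers only the registers mentioned in this article, plus AUD3PER ($DFF0D6) and AUD3VOL ($DFF0D8), the audio channel 3 period and volume registers; assuming A0 holds $DFF000, those are what the two instructions in the listing above write to. The table and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CUSTOM_BASE 0xDFF000u

/* A handful of custom chip registers, as offsets from $DFF000. */
static const struct { uint16_t offset; const char *name; } custom_regs[] = {
    { 0x006, "VHPOSR"  }, /* vertical/horizontal beam position (read) */
    { 0x00A, "JOY0DAT" }, /* game port 0 counters (usually the mouse) */
    { 0x00C, "JOY1DAT" }, /* game port 1 counters (usually the joystick) */
    { 0x0D6, "AUD3PER" }, /* audio channel 3 period */
    { 0x0D8, "AUD3VOL" }, /* audio channel 3 volume */
};

/* Accepts an absolute address like 0xDFF006 or a bare offset like 0xD6
 * (as in "(0xd6,A0)" when A0 holds $DFF000). Returns NULL if unknown. */
const char *custom_reg_name(uint32_t addr)
{
    uint32_t off = addr >= CUSTOM_BASE ? addr - CUSTOM_BASE : addr;
    for (size_t i = 0; i < sizeof custom_regs / sizeof custom_regs[0]; i++)
        if (custom_regs[i].offset == off)
            return custom_regs[i].name;
    return NULL;
}
```

A full table covers hundreds of registers, which is why importing a ready-made symbol file is the practical route.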

Of particular note is $DFF006, VHPOSR, “Vertical/Horizontal Beam Position Read”, which is often used to seed the random number generator. Random numbers tend to be used in RPGs and strategy games for various game mechanics, so this can help you zero in on functions like weapon damage. Also notable are $DFF00A, JOY0DAT, and $DFF00C, JOY1DAT, which are used for mouse and joystick input.
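Joystick directions in JOY0DAT/JOY1DAT aren’t stored as four plain flags, which makes the decoding code easy to miss. The C sketch below follows the standard scheme for a digital joystick described in Amiga hardware documentation: right = bit 1, left = bit 9, down = bit 1 XOR bit 0, up = bit 9 XOR bit 8. The struct and function names are hypothetical. In 68k code this often appears as the value being copied, shifted right by one and EORed with the original, though games vary.

```c
#include <stdint.h>

typedef struct { int up, down, left, right; } JoyDir;

/* Decode a JOY0DAT/JOY1DAT value for a digital joystick. */
JoyDir decode_joy(uint16_t joydat)
{
    JoyDir d;
    unsigned b0 = joydat & 1u, b1 = (joydat >> 1) & 1u;
    unsigned b8 = (joydat >> 8) & 1u, b9 = (joydat >> 9) & 1u;
    d.right = (int)b1;
    d.left  = (int)b9;
    d.down  = (int)(b1 ^ b0); /* down needs bits 1 and 0 XORed */
    d.up    = (int)(b9 ^ b8); /* up needs bits 9 and 8 XORed */
    return d;
}
```

Note that a value of 0x0003 means right only, not right plus down, because of the XOR.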

I have also added ImportCommentsScript.py, if you need to import comments from another source.

Auto-analysis might create functions from elements that are really just subsections of an existing function. Click on the function name and hit Delete to turn it from a function into a label. Generally speaking, if it is arrived at by a JSR or BSR instruction, it’s equivalent to a C function; if it’s arrived at by something else like BEQ, it’s more equivalent to an “if” statement within a function. Functions generally end in an RTS. You can hit F to make the label back into a function.
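To make the JSR/BSR versus branch distinction concrete, here is a hypothetical routine shown both as 68k (in the comment) and as the C it corresponds to. The labels and names are invented for illustration; real game code will rarely be this tidy.

```c
/* Hypothetical 68k routine and its C equivalent:
 *
 *   update_sound:  tst.w  d0            ; is a sound requested?
 *                  beq.s  .done         ; branch: an "if" inside this function
 *                  bsr    play_sample   ; call: a separate function
 *   .done:         rts                  ; return from update_sound
 */
static int samples_played = 0;

/* Reached by BSR/JSR, so it is its own function. */
void play_sample(void) { samples_played++; }

void update_sound(short d0)
{
    if (d0 != 0)       /* tst.w d0 / beq.s .done, condition inverted */
        play_sample(); /* bsr play_sample */
}                      /* rts */
```

Here .done should stay a label inside update_sound; if auto-analysis had made it a function, that would be a case for hitting Delete.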

However, Amiga games written in 68k assembly don’t always adhere to what would nowadays be considered “correct” definitions of code structure.

Disassembling games written in AMOS Basic is a pain. AMOS generates complicated and idiosyncratic code unlike what a programmer would normally write, so it’s hard to understand. You can use amostools to extract the Basic source code from a .AMOS file, but a lot of AMOS games were distributed compiled for efficiency, in which case there is no .AMOS source file to extract.

Select a piece of code in the Listing panel to highlight the equivalent code in the Decompile panel. This is useful if you don’t know 68k Assembly language well but find it easier to read a C-like high-level language such as Java or PHP.
