Lecture Notes
Analyzing Binary Executable files. Focus on static analysis, calling conventions and decompilation.
Download here
Practical Tasks
NoReturn Functions
The noReturn.zip archive contains an example program from the Ghidra documentation, demonstrating the impact of functions that do not return.
The function loopForever is non-returning. From the Code Browser, Analysis → Auto Analyze ... lets you configure how much evidence the Non-Returning Functions - Discovered analyzer requires before deciding that a function is non-returning. If you lower the evidence threshold, this analyzer will mark loopForever as non-returning.
Open the file, analyze it with different settings, and compare the results.
Custom Data Types
When reversing a compiled binary, you will often encounter raw memory offsets and pointer arithmetic (e.g., *(int *)(param_1 + 0x14)). To make the decompiled output readable, you must teach the tool about the program’s original data structures.
Consider the file cdt.zip. You will map out the HardwareDevice struct, the DeviceState enum, and the NetworkIdentity union. You can accomplish this using one of two methods:
Method 1: Manual Definition (Data Type Manager)
This method is essential when you don’t have the original source code and are piecing together the structure through reverse engineering.
- Open the Data Type Manager: Look for the “Data Type Manager” window in your Ghidra workspace (usually in the bottom left).
- Create the Base Types:
  - Right-click on your program’s name in the manager.
  - Select New -> Enum... to create DeviceState. Add the states and their integer values.
  - Select New -> Union... to create NetworkIdentity. Add the ipv4_addr (uint), mac_addr (byte array), and uuid (ulonglong).
- Create the Main Struct:
  - Right-click your program’s name and select New -> Structure...
  - Name it HardwareDevice.
- Add fields at the correct offsets (0x00, 0x04, 0x08, etc.) matching the sizes you observe in the assembly/decompiler. Use the Enum and Union you just created as data types for the corresponding offsets.
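To make the relationship between raw offsets and named fields concrete, here is a minimal sketch using Python's ctypes module. The union members come from the task description; the DeviceState values and the HardwareDevice field names and order are hypothetical and only illustrate the idea. The real layout must be recovered from the binary.

```python
import ctypes
import struct
from enum import IntEnum


class DeviceState(IntEnum):
    # Hypothetical states; take the real names and values from the binary.
    OFFLINE = 0
    IDLE = 1
    ACTIVE = 2


class NetworkIdentity(ctypes.Union):
    # All members share offset 0; the union is as large as its biggest member.
    _fields_ = [
        ("ipv4_addr", ctypes.c_uint32),
        ("mac_addr", ctypes.c_ubyte * 6),
        ("uuid", ctypes.c_uint64),
    ]


class HardwareDevice(ctypes.Structure):
    # Hypothetical field list and order, chosen only to illustrate offsets.
    _fields_ = [
        ("device_id", ctypes.c_uint32),   # 0x00
        ("state", ctypes.c_int32),        # 0x04, holds a DeviceState value
        ("identity", NetworkIdentity),    # 0x08
        ("flags", ctypes.c_uint32),       # 0x10
        ("error_count", ctypes.c_int32),  # 0x14
    ]


def field_offsets(struct_type):
    """Return {field name: byte offset} for a ctypes structure."""
    return {name: getattr(struct_type, name).offset
            for name, *_ in struct_type._fields_}


# Build a raw memory image and write values at fixed offsets,
# the way compiled code addresses the object.
raw = bytearray(ctypes.sizeof(HardwareDevice))
struct.pack_into("=i", raw, 0x04, DeviceState.ACTIVE)
struct.pack_into("=i", raw, 0x14, 7)

# Untyped view: the equivalent of *(int *)(param_1 + 0x14).
raw_value = struct.unpack_from("=i", raw, 0x14)[0]

# Typed view: the same bytes read through named fields.
dev = HardwareDevice.from_buffer_copy(raw)

for name, offset in field_offsets(HardwareDevice).items():
    print(f"{offset:#04x}  {name}")
print(raw_value, dev.error_count, DeviceState(dev.state).name)
```

Both views read the same four bytes. Defining the structure in Ghidra does the same for the decompiler: the offset arithmetic is replaced by a field name.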
Method 2: Importing a C Header (.h) File
If you have successfully reconstructed the C definitions (or have access to the original headers), you can import them directly to save time. You can also use this method with headers you write yourself, if you prefer writing definitions in a .h file instead of using the Ghidra interface.
- Create or obtain a plain text file named structures.h containing your typedef struct, enum, and union definitions.
- Open the C Parser: In Ghidra, go to File -> Parse C Source…
- Configure and Parse:
  - Add your structures.h file to the Source Files to Parse list.
  - Click the Parse to Program button. Ghidra will read the file and automatically populate the Data Type Manager with your custom types.
Applying the Data Type
Once Ghidra knows about your types (via Method 1 or 2), you must apply them to the decompiled code:
- Navigate to the main function or initialize_device in the Decompiler window.
- Find the generic pointer variable representing the device (e.g., pvVar1 or param_1).
- Right-click the variable, select Retype Variable, and type HardwareDevice * (don’t forget the asterisk for the pointer!).
- Watch the decompiled code instantly transform from raw offsets to readable field names!
Alternative Method
In the scope of a decompiled function, you can also let Ghidra automatically create structures according to the offsets.
For this, right-click the function argument and select Auto Create Structure.
Use the different methods and compare the result.
Classes and Virtual Function Tables
Analyzing compiled C++ code is fundamentally different from analyzing C due to object-oriented features like inheritance and polymorphism. With this exercise we will analyze a compiled binary to understand how classes are laid out in memory and how virtual function calls are resolved dynamically using Virtual Function Tables (vtables).
Download this file cpp.zip.
The goal is to fully reverse-engineer the Vehicle base class and the SportsCar derived class, map their internal structures in Ghidra, and trace the dynamic dispatch mechanism in the performDiagnostics function.
Start by locating the main function in the binary. Look for calls to the memory allocator (usually operator new or malloc).
Note the sizes: You should see two distinct allocation sizes. The first corresponds to the Vehicle class, and the second (larger) one corresponds to the SportsCar class.
plVar1 = operator.new(0x30); // Allocation
FUN_00400620(plVar1,"Work Van"); // Constructor
plVar2 = operator.new(0x38);
FUN_0040078e(plVar2,"Ferrari 488",0x294);
Immediately following the allocations, look for the constructor calls.
Under the hood, a C++ class is treated very similarly to a C struct. However, because this code uses virtual functions, the compiler silently adds a hidden member variable to the start of the object.
Open the Data Type Manager and create a new Structure for Vehicle.
- The Vtable Pointer: At offset 0x0, add a pointer type (e.g., void *). This is the vtable pointer (vptr), injected by the compiler.
- Map the Members: Based on the constructor’s initialization logic, add the remaining fields: current_speed (int) and model_name (char array).
- Create the SportsCar structure: Create another structure for the derived class. Remember that in inheritance, the derived class’s memory layout starts with the base class’s layout. Offset 0x0 to the end of the base class will look exactly like Vehicle. Following the base class data, add the new fields: turbo_enabled (bool/byte) and horsepower (int).
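The layout described above can be sketched with ctypes to check that the structures add up to the allocation sizes seen in main (0x30 and 0x38). The field order and the 32-byte length of model_name are assumptions chosen so that the totals match; confirm the real offsets in the constructors. In particular, a compiler may place derived members inside the base class's tail padding, which this sketch does not model.

```python
import ctypes


class Vehicle(ctypes.Structure):
    _fields_ = [
        # Hidden vtable pointer, 8 bytes in a 64-bit binary.
        ("vptr", ctypes.c_uint64),           # 0x00
        # Assumed order and sizes for the visible members.
        ("current_speed", ctypes.c_int32),   # 0x08
        ("model_name", ctypes.c_char * 32),  # 0x0c
    ]


class SportsCar(Vehicle):
    # ctypes appends these after the inherited Vehicle fields,
    # so the object still starts with the Vehicle layout.
    _fields_ = [
        ("turbo_enabled", ctypes.c_bool),
        ("horsepower", ctypes.c_int32),
    ]


def offsets(struct_type, names):
    """Return {field name: byte offset} for the given field names."""
    return {name: getattr(struct_type, name).offset for name in names}


base_fields = ["vptr", "current_speed", "model_name"]
derived_fields = base_fields + ["turbo_enabled", "horsepower"]

print("Vehicle  ", hex(ctypes.sizeof(Vehicle)), offsets(Vehicle, base_fields))
print("SportsCar", hex(ctypes.sizeof(SportsCar)), offsets(SportsCar, derived_fields))
```

Every Vehicle field keeps its offset inside SportsCar, which is why a function that receives a Vehicle * can operate on a SportsCar object unchanged.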
The vtable is simply an array of function pointers that tell the object which implementation of a virtual method to execute.
Double-click on the constructor for Vehicle to enter its implementation. Look for an instruction that writes a constant memory address into the object’s offset 0x0 (the vtable pointer you mapped earlier). Double-click this constant address. You will be taken to the .rdata or .rodata section. You should see a list of function pointers clustered together. This is the vtable!
Create a Vtable Struct: In the Data Type Manager, create a struct named Vehicle_Vtable consisting of function pointers: Destructor, startEngine, printStats.
Repeat this process for the SportsCar constructor. Notice how its vtable points to different implementations for startEngine and printStats!
Now, navigate to the performDiagnostics function (it will be called twice from main, passing the two different objects).
Apply your Vehicle * data type to the function’s parameter. Observe the decompiled C-like code. Instead of calling a function directly by name, the code will:
- Dereference the object pointer to get the vtable.
- Read a function pointer from a specific offset within that vtable.
- Execute an indirect call (e.g., call [rax + 8]).
Explain how the exact same assembly instructions in performDiagnostics are able to execute entirely different code depending on whether a Vehicle or SportsCar was passed as the argument.
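As a starting point for the explanation, the mechanism can be modelled in a few lines of Python. This is a simplified sketch, not a reconstruction of the binary: the slot order follows the Vehicle_Vtable struct described above, the returned strings are invented, and it is an assumption that performDiagnostics calls both startEngine and printStats and that 0x294 is the horsepower argument. Real compilers may also emit more than one destructor entry in the vtable.

```python
# Slot indices, following the Vehicle_Vtable struct described above.
SLOT_DESTRUCTOR, SLOT_START_ENGINE, SLOT_PRINT_STATS = 0, 1, 2


def destroy(obj):
    return f"{obj['model_name']}: destroyed"


def vehicle_start_engine(obj):
    return f"{obj['model_name']}: engine started"


def vehicle_print_stats(obj):
    return f"{obj['model_name']}: speed {obj['current_speed']}"


def sportscar_start_engine(obj):
    return f"{obj['model_name']}: engine started with {obj['horsepower']} hp"


def sportscar_print_stats(obj):
    return f"{obj['model_name']}: speed {obj['current_speed']}, turbo {obj['turbo_enabled']}"


# One table per class, shared by every object of that class.
VEHICLE_VTABLE = (destroy, vehicle_start_engine, vehicle_print_stats)
SPORTSCAR_VTABLE = (destroy, sportscar_start_engine, sportscar_print_stats)


def make_vehicle(name):
    # The constructor stores the class vtable in the hidden first member.
    return {"vptr": VEHICLE_VTABLE, "model_name": name, "current_speed": 0}


def make_sportscar(name, horsepower):
    obj = make_vehicle(name)         # base constructor runs first
    obj["vptr"] = SPORTSCAR_VTABLE   # derived constructor overwrites the vptr
    obj["turbo_enabled"] = False
    obj["horsepower"] = horsepower
    return obj


def perform_diagnostics(obj):
    # Identical code for every object: load the vptr, index a slot, call it.
    vtable = obj["vptr"]
    return [vtable[SLOT_START_ENGINE](obj), vtable[SLOT_PRINT_STATS](obj)]


van = make_vehicle("Work Van")
car = make_sportscar("Ferrari 488", 0x294)
for line in perform_diagnostics(van) + perform_diagnostics(car):
    print(line)
```

perform_diagnostics never names a concrete implementation. The only thing that differs between the two calls is the table stored in the object, which is exactly what the indirect call in the assembly relies on.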
Using Plugins
Consider this binary file (from the HTB Cyber Apocalypse 2021 CTF), and use it to explore Ghidra. Rename functions and variables, and set proper types. It helps to check the manual pages of the C library functions that appear in the code, since they document the actual parameter and return types (although Ghidra should already apply them).
This is also a good example to demonstrate the use of plugins. One of the plugins available in Ghidra can be used to XOR memory. Use it to decrypt the password and obtain the flag. You just need to select the memory to XOR and run the plugin.
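To cross-check the plugin's result outside Ghidra, the same operation can be reproduced in a few lines of Python. This is a minimal sketch: the key and the sample data below are placeholders, not values from the challenge binary, and a repeating key is assumed.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Placeholder values; replace them with the bytes copied from the listing
# and the key found in the decompiled code.
key = b"\x13"
encrypted = xor_bytes(b"example_password", key)

# XOR is its own inverse, so applying the same key again decrypts.
print(xor_bytes(encrypted, key).decode())
```

Because XOR with a fixed key is symmetric, the same function serves for both directions.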
Calling Conventions
Use the provided samples to examine the calling conventions and their impact on the code produced. For each sample, compile it and analyze the resulting assembly. You can compile the samples with the included Makefile, which will automatically produce both ELF and assembly files for each convention.
Cracking Challenges
There are several crackmes available for you to improve your reverse engineering skills. For each one, write a short text with your conclusions about its behavior. If you can, create an equivalent program in another programming language such as C or Python.
The objective for all these crackmes is to find a code or text that makes the program print the Correct string to the standard output.
Follow a structured approach to these programs:
- Identify the main structure of the program (functions)
- Go through the code and rename variables and functions to match their purpose
- Identify building blocks inside functions.
- Create representations of parts of the program in another programming language (e.g., Python).
- Document everything for future discussion.
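As an illustration of the re-implementation step, the sketch below shows what such a representation might look like. The check routine, the key, and the table are entirely made up for this example and do not correspond to any of the provided crackmes. The point is the workflow: express the recovered logic as a function, then invert it.

```python
# Hypothetical constants, standing in for values recovered from the binary.
KEY = 0x2A
EXPECTED = bytes([0x59, 0x1A, 0x4B, 0x5B, 0x1D, 0x63])


def check(candidate: str) -> bool:
    """Made-up check: each byte is XORed with KEY, then its index is added."""
    data = candidate.encode()
    if len(data) != len(EXPECTED):
        return False
    return all((((b ^ KEY) + i) & 0xFF) == e
               for i, (b, e) in enumerate(zip(data, EXPECTED)))


def recover() -> str:
    """Invert the transformation: subtract the index, then XOR with KEY."""
    return bytes((((e - i) & 0xFF) ^ KEY) for i, e in enumerate(EXPECTED)).decode()


solution = recover()
print(solution, check(solution))
```

Writing the check and its inverse side by side also documents your assumptions: if the recovered input is rejected by the real program, one of the reconstructed steps is wrong.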
Throughout your analysis, take notes of your assumptions. Keep in mind that a reverse engineering task does not recover the original design, but a limited and potentially flawed interpretation of it.
An Unknown file
This unknown file has no structure and we have no information about its architecture or purpose. What information can you extract from it?
Analyze it and discuss your ideas with the class.
With the contents already studied during these classes (static analysis), it is not possible to fully access the file and obtain its content. Still, it is a valid file for some format (file, binwalk, qemu, PDF reader), and shows how important it is to have some basic information about the architecture and structure. We will get back to it in a future lecture.
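A first pass over an unknown file can be automated. The sketch below searches for a handful of well-known magic numbers at any offset and computes the byte entropy, which hints at compressed or encrypted regions. It is a much simplified version of what tools such as file and binwalk do, and the sample buffer is synthetic, not the file from this exercise.

```python
import math
from collections import Counter

# A small selection of well-known magic numbers.
SIGNATURES = {
    b"\x7fELF": "ELF executable",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
    b"\x1f\x8b\x08": "gzip stream",
    b"\x89PNG\r\n\x1a\n": "PNG image",
}


def find_signatures(data: bytes):
    """Return sorted (offset, description) pairs for every signature found."""
    hits = []
    for magic, label in SIGNATURES.items():
        start = 0
        while (pos := data.find(magic, start)) != -1:
            hits.append((pos, label))
            start = pos + 1
    return sorted(hits)


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0 for constant data, 8 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


# Synthetic sample with two formats in one buffer; for a real file,
# read its bytes from disk instead.
sample = b"\x7fELF" + bytes(12) + b"%PDF-1.7\n"
for offset, label in find_signatures(sample):
    print(f"{offset:#06x}  {label}")
print(f"entropy: {shannon_entropy(sample):.2f} bits/byte")
```

More than one hit in the same file is a sign that it may be valid under several formats at once, which is worth raising in the class discussion.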
Tools and links
- ghidra: https://ghidra-sre.org/
- objdump: https://man7.org/linux/man-pages/man1/objdump.1.html
- readelf: https://man7.org/linux/man-pages/man1/readelf.1.html
- LIEF: https://lief-project.github.io/
- gnutls: https://gnutls.org/documentation.html
- HxD: https://mh-nexus.de/en/hxd/
- bvi: http://bvi.sourceforge.net/
- ImHex: https://github.com/WerWolv/ImHex
- HexWorkshop: http://www.hexworkshop.com/
- ghex: https://wiki.gnome.org/Apps/Ghex
- HexEdit: https://hexed.it/
- FileInsight: https://github.com/nmantani/FileInsight-plugins