Static Analysis of Applications

Lecture Notes

Analyzing binary executable files. Focus on static analysis, calling conventions, and decompilation.

Download here

Practical Tasks

NoReturn Functions

The noReturn.zip archive contains an example program from the Ghidra documentation that demonstrates the impact of non-returning functions on the analysis.

The function loopForever is non-returning. You can configure how much evidence the Non-Returning Functions - Discovered analyzer requires before deciding that a function is non-returning via Analysis → Auto Analyze ... from the Code Browser. If you lower the evidence threshold, this analyzer will mark loopForever as non-returning.

Open the file, analyze it with different settings, and compare the results.
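To see why this setting changes the result, the toy model below follows control flow through a tiny instruction list. It is only a sketch and not Ghidra's algorithm: the instruction list and the reachable function are invented for illustration, and only the name loopForever comes from the example program. Once the callee is known to be non-returning, the instructions after the call are no longer treated as part of the flow.

```python
# Toy model: each instruction is (address, kind, target).
# "op" falls through, "call" calls target, "ret" ends the flow.
PROGRAM = [
    (0x00, "op", None),
    (0x01, "call", "loopForever"),
    (0x02, "op", None),   # only reached if loopForever returns
    (0x03, "ret", None),
]


def reachable(program, noreturn):
    """Return the addresses reached by fall-through flow from the first instruction."""
    seen = []
    for address, kind, target in program:
        seen.append(address)
        if kind == "ret":
            break
        if kind == "call" and target in noreturn:
            break  # flow never comes back from a non-returning callee
    return seen


print(reachable(PROGRAM, noreturn=set()))            # [0, 1, 2, 3]
print(reachable(PROGRAM, noreturn={"loopForever"}))  # [0, 1]
```

In a real binary, the bytes after such a call may belong to another function or to data, which is why the decompiled output differs between the two settings.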

Custom Data Types

When reversing a compiled binary, you will often encounter raw memory offsets and pointer arithmetic (e.g., *(int *)(param_1 + 0x14)). To make the decompiled output readable, you must teach the tool about the program’s original data structures.
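As a minimal sketch of what such an expression means, the Python snippet below reads a 32-bit integer at byte offset 0x14 of a raw buffer. It assumes param_1 is an untyped byte address and a little-endian target. The buffer contents and the helper read_int are made up for illustration.

```python
import struct

# Hypothetical 32-byte object; only the 4 bytes at offset 0x14 are set.
raw = bytearray(32)
struct.pack_into("<i", raw, 0x14, 1234)


def read_int(buffer, offset):
    """Rough equivalent of *(int *)(base + offset) for a little-endian 32-bit int."""
    return struct.unpack_from("<i", buffer, offset)[0]


print(read_int(raw, 0x14))  # 1234
```

The offset carries no name or meaning on its own. A structure definition is what turns 0x14 into a readable field.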

Using the file cdt.zip, you will map out the HardwareDevice struct, the DeviceState enum, and the NetworkIdentity union. You can accomplish this using one of two methods:

Method 1: Manual Definition (Data Type Manager)

This method is essential when you don’t have the original source code and are piecing together the structure through reverse engineering.

  • Open the Data Type Manager: Look for the “Data Type Manager” window in your Ghidra workspace (usually in the bottom left).

  • Create the Base Types:

    • Right-click on your program’s name in the manager.
    • Select New -> Enum... to create DeviceState. Add the states and their integer values.
    • Select New -> Union... to create NetworkIdentity. Add the ipv4_addr (uint), mac_addr (byte array), and uuid (ulonglong).
  • Create the Main Struct:

    • Right-click your program’s name and select New -> Structure...
    • Name it HardwareDevice.
    • Add fields at the correct offsets (0x00, 0x04, 0x08, etc.) matching the sizes you observe in the assembly/decompiler. Use the Enum and Union you just created as data types for the corresponding offsets.
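Before entering the types in Ghidra, it can help to check the expected offsets and sizes with a quick model. The sketch below uses Python's ctypes for that. The union members come from the text above, but the 6-byte MAC length, the enum values, and the HardwareDevice field names and order are assumptions that must be replaced with what you observe in the binary.

```python
import ctypes


class NetworkIdentity(ctypes.Union):
    # Members named in the text; the 6-byte MAC length is an assumption.
    _fields_ = [
        ("ipv4_addr", ctypes.c_uint32),
        ("mac_addr", ctypes.c_uint8 * 6),
        ("uuid", ctypes.c_uint64),
    ]


# Hypothetical enum values; read the real ones from the binary.
DEVICE_STATE = {0: "STATE_OFF", 1: "STATE_IDLE", 2: "STATE_ACTIVE"}


class HardwareDevice(ctypes.Structure):
    # Hypothetical field names and order.
    _fields_ = [
        ("device_id", ctypes.c_uint32),
        ("state", ctypes.c_uint32),      # DeviceState stored as a 4-byte enum
        ("identity", NetworkIdentity),
    ]


# Print the layout so it can be compared with the offsets seen in the decompiler.
for name, _ in HardwareDevice._fields_:
    field = getattr(HardwareDevice, name)
    print(f"{name}: offset 0x{field.offset:02x}, size {field.size}")
print("union size:", ctypes.sizeof(NetworkIdentity))
```

A union is as large as its largest member, so all three NetworkIdentity members start at the same offset inside the structure.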

Method 2: Importing a C Header (.h) File

If you have successfully reconstructed the C definitions (or have access to the original headers), you can import them directly to save time. You can also use this method with headers you wrote yourself, if you prefer writing definitions in a .h file instead of using the Ghidra interface.

  • Create or obtain a plain text file named structures.h containing your typedef struct, enum, and union definitions.

  • Open the C Parser: In Ghidra, go to File -> Parse C Source...

  • Configure and Parse:

    • Add your structures.h file to the Source Files to Parse list.
    • Click the Parse to Program button. Ghidra will read the file and automatically populate the Data Type Manager with your custom types.

Applying the Data Type

Once Ghidra knows about your types (via Method 1 or 2), you must apply them to the decompiled code:

  • Navigate to the main function or initialize_device in the Decompiler window.
  • Find the generic pointer variable representing the device (e.g., pvVar1 or param_1).
  • Right-click the variable, select Retype Variable, and type HardwareDevice * (don’t forget the asterisk for the pointer!).
  • Watch the decompiled code instantly transform from raw offsets to readable field names!

Alternative Method

Within a decompiled function, you can also let Ghidra create a structure automatically from the offsets it sees being accessed. To do this, right-click the function argument and select Auto Create Structure.
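The idea behind this kind of automatic creation can be sketched in a few lines: collect the (offset, size) pairs that the code accesses and fill the untouched bytes with undefined gaps. This is a simplified illustration and not Ghidra's actual implementation. The function infer_layout, the field naming, and the observed accesses are all hypothetical.

```python
def infer_layout(accesses):
    """Build a field list of (offset, size, name) from observed (offset, size) pairs."""
    fields = []
    cursor = 0
    for offset, size in sorted(set(accesses)):
        if offset > cursor:
            # Bytes that no instruction touched: keep them as an undefined gap.
            fields.append((cursor, offset - cursor, f"undefined_0x{cursor:x}"))
        fields.append((offset, size, f"field_0x{offset:x}"))
        cursor = max(cursor, offset + size)
    return fields


# Hypothetical accesses; *(int *)(param_1 + 0x14) would be recorded as (0x14, 4).
observed = [(0x0, 4), (0x4, 4), (0x14, 4), (0x8, 8)]
for offset, size, name in infer_layout(observed):
    print(f"0x{offset:02x}  {size:2d}  {name}")
```

A structure built this way only knows about fields that the analyzed function touches, which is one reason its result can differ from a manually defined or imported type.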

Use the different methods and compare the result.

Classes and Virtual Function Tables

Analyzing compiled C++ code is fundamentally different from analyzing C due to object-oriented features like inheritance and polymorphism. With this exercise we will analyze a compiled binary to understand how classes are laid out in memory and how virtual function calls are resolved dynamically using Virtual Function Tables (vtables).

Download this file cpp.zip. The goal is to fully reverse-engineer the Vehicle base class and the SportsCar derived class, map their internal structures in Ghidra, and trace the dynamic dispatch mechanism in the performDiagnostics function.

Start by locating the main function in the binary. Look for calls to the memory allocator (usually operator new or malloc). Note the sizes: You should see two distinct allocation sizes. The first corresponds to the Vehicle class, and the second (larger) one corresponds to the SportsCar class.

  plVar1 = operator.new(0x30); // Allocation
  FUN_00400620(plVar1,"Work Van"); // Constructor
  plVar2 = operator.new(0x38);
  FUN_0040078e(plVar2,"Ferrari 488",0x294);

Immediately following the allocations, look for the constructor calls.

Under the hood, a C++ class is treated very similarly to a C struct. However, because this code uses virtual functions, the compiler silently adds a hidden member variable to the start of the object.

Open the Data Type Manager and create a new Structure for Vehicle.

  • The Vtable Pointer: At offset 0x0, add a pointer type (e.g., void *). This is the vtable pointer (vptr), injected by the compiler.
  • Map the Members: Based on the constructor’s initialization logic, add the remaining fields: current_speed (int) and model_name (char array).
  • Create the SportsCar structure: Create another structure for the derived class. Remember that in inheritance, the derived class’s memory layout starts with the base class’s layout.

Offset 0x0 to the end of the base class will look exactly like Vehicle. Following the base class data, add the new fields: turbo_enabled (bool/byte) and horsepower (int).
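A quick way to sanity-check a proposed layout against the allocation sizes is to model it, for example with Python's ctypes on a 64-bit host. In this sketch the field order and the 32-byte model_name length are assumptions, chosen so that the totals match the 0x30 and 0x38 passed to operator new. A real compiler may order or pack the members differently, so confirm every offset in the constructors.

```python
import ctypes


class Vehicle(ctypes.Structure):
    _fields_ = [
        ("vptr", ctypes.c_uint64),            # hidden vtable pointer at offset 0x0
        ("current_speed", ctypes.c_int32),
        ("model_name", ctypes.c_char * 32),   # length is an assumption
    ]


class SportsCar(ctypes.Structure):
    _fields_ = [
        ("base", Vehicle),                    # base-class layout comes first
        ("turbo_enabled", ctypes.c_bool),
        ("horsepower", ctypes.c_int32),
    ]


print(hex(ctypes.sizeof(Vehicle)))            # 0x30 on a 64-bit host
print(hex(ctypes.sizeof(SportsCar)))          # 0x38 on a 64-bit host
print(hex(SportsCar.horsepower.offset))
```

If the modeled size does not match the allocation size, a field is missing, has the wrong size, or sits at a different offset than assumed.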

The vtable is simply an array of function pointers that tell the object which implementation of a virtual method to execute.

Double-click on the constructor for Vehicle to enter its implementation. Look for an instruction that writes a constant memory address into the object’s offset 0x0 (the vtable pointer you mapped earlier). Double-click this constant address. You will be taken to the .rdata or .rodata section. You should see a list of function pointers clustered together. This is the vtable!

Create a Vtable Struct: In the Data Type Manager, create a struct named Vehicle_Vtable consisting of function pointers: Destructor, startEngine, printStats.

Repeat this process for the SportsCar constructor. Notice how its vtable points to different implementations for startEngine and printStats!

Now, navigate to the performDiagnostics function (it will be called twice from main, passing the two different objects). Apply your Vehicle * data type to the function’s parameter. Observe the decompiled C-like code. Instead of calling a function directly by name, the code will:

  • Dereference the object pointer to get the vtable.
  • Read a function pointer from a specific offset within that vtable.
  • Execute an indirect call (e.g., call [rax + 8]).
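The three steps above can be modeled in Python, with a list standing in for each vtable and a dictionary for each object. This is an illustrative sketch: the slot order follows the Vehicle_Vtable description, the message strings are invented, and a 64-bit target is assumed. In the real binary the slot count and offsets depend on the compiler (for example, compilers following the Itanium C++ ABI emit two entries for a virtual destructor), so read them from the vtable itself.

```python
POINTER_SIZE = 8  # assumes a 64-bit target


def destroy(obj):
    return "destroyed"


def vehicle_start_engine(obj):
    return f"{obj['model_name']}: engine started"


def vehicle_print_stats(obj):
    return f"{obj['model_name']}: vehicle stats"


def sportscar_start_engine(obj):
    return f"{obj['model_name']}: turbo engine started"


def sportscar_print_stats(obj):
    return f"{obj['model_name']}: sports car stats"


# One table per class; the same slot index holds different implementations.
VEHICLE_VTABLE = [destroy, vehicle_start_engine, vehicle_print_stats]
SPORTSCAR_VTABLE = [destroy, sportscar_start_engine, sportscar_print_stats]


def perform_diagnostics(obj):
    vtable = obj["vptr"]                    # 1. load the vtable pointer from offset 0x0
    function = vtable[8 // POINTER_SIZE]    # 2. read the slot at byte offset 8
    return function(obj)                    # 3. indirect call, like call [rax + 8]


# The constructors are what store the class-specific vtable pointer.
van = {"vptr": VEHICLE_VTABLE, "model_name": "Work Van"}
car = {"vptr": SPORTSCAR_VTABLE, "model_name": "Ferrari 488"}

print(perform_diagnostics(van))  # Work Van: engine started
print(perform_diagnostics(car))  # Ferrari 488: turbo engine started
```

Note that perform_diagnostics contains no reference to either class. The only thing that differs between the two calls is the table stored in the object.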

Explain how the exact same assembly instructions in performDiagnostics are able to execute entirely different code depending on whether a Vehicle or SportsCar was passed as the argument.

Using Plugins

Consider this binary file (from the HTB Cyber Apocalypse 2021 CTF) and use it to explore Ghidra. Rename functions and variables, and set proper types. It helps to check the manual pages of the C functions that appear there, as they show the actual types (although Ghidra should apply them already).

This is also a good example to demonstrate the use of plugins. Ghidra includes one that can XOR a selected range of memory. Use it to decrypt the password and obtain the flag: select the memory to XOR and run the plugin.
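To verify the result outside Ghidra, the same operation can be reproduced in a few lines of Python. This is a minimal sketch: the key and data below are made-up placeholders, not values from the challenge binary, and xor_bytes is a helper written for this example.

```python
from itertools import cycle


def xor_bytes(data, key):
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Hypothetical values; copy the real bytes and key from the binary.
key = bytes([0x2A])
encrypted = xor_bytes(b"example", key)

print(encrypted.hex())
print(xor_bytes(encrypted, key))  # b'example'
```

XOR is its own inverse, so the same function both encrypts and decrypts.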

Calling Conventions

Use the provided samples to examine the calling conventions and their impact on the code produced. For each sample, compile it and analyze the resulting assembly. You can compile the samples with the included Makefile, which will automatically produce both ELF and assembly files for each convention.
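As a reference while reading the assembly, the sketch below lists where integer and pointer arguments are passed under a few common conventions. Which conventions the provided samples actually use is not stated here, so treat this selection as an assumption. The model ignores floating-point arguments, structures passed by value, and stack layout details such as alignment or shadow space.

```python
# Integer/pointer argument registers in order; remaining arguments go on the stack.
CONVENTIONS = {
    "sysv_amd64": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
    "ms_x64": ["rcx", "rdx", "r8", "r9"],
    "cdecl_x86": [],                  # everything on the stack
    "fastcall_x86": ["ecx", "edx"],   # Microsoft __fastcall
}


def place_arguments(convention, count):
    """Return the location of each of `count` integer arguments, first to last."""
    registers = CONVENTIONS[convention]
    return [
        registers[i] if i < len(registers) else f"stack[{i - len(registers)}]"
        for i in range(count)
    ]


for name in CONVENTIONS:
    print(f"{name:13s}", place_arguments(name, 5))
```

Comparing this table with the prologue of each compiled sample shows which registers or stack slots the callee reads its parameters from.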

Cracking Challenges

There are several crackmes available for you to improve your reverse engineering skills. For each one, write a short text with your conclusions about its behavior. If you can, create an equivalent program in another programming language such as C or Python.

The objective for all these crackmes is to find a code or text that gets the Correct string printed to the standard output.

Follow a structured approach to these programs:

  • Identify the main structure of the program (functions)
  • Go through the code and rename variables and functions to match their purpose
  • Identify building blocks inside functions.
  • Create representations of parts of the program in another programming language (e.g., Python).
  • Document everything for future discussion.
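As an illustration of the last two steps, the sketch below re-creates a check routine in Python and then inverts it. The transformation, the key, the sample input, and the Wrong message are entirely hypothetical and do not correspond to any of the provided crackmes. Only the Correct string comes from the task description.

```python
# Hypothetical check: (byte + index) XOR 0x5A must match a stored table.
KEY = 0x5A


def transform(text):
    """Model of the per-character transformation recovered from the binary."""
    return [((ord(ch) + i) & 0xFF) ^ KEY for i, ch in enumerate(text)]


# Stands in for a table of expected values dumped from the binary.
EXPECTED = transform("s3cret")


def check(candidate):
    return "Correct" if transform(candidate) == EXPECTED else "Wrong"


def recover(expected):
    """Invert the transformation to obtain the accepted input."""
    return "".join(chr(((value ^ KEY) - i) & 0xFF) for i, value in enumerate(expected))


print(recover(EXPECTED))         # s3cret
print(check(recover(EXPECTED)))  # Correct
```

Writing the model first and the inverse second keeps the two concerns separate: if the model disagrees with the binary on a known input, the assumption is wrong, not the solver.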

Throughout your analysis, take notes of your assumptions. Keep in mind that reverse engineering does not recover the original design, only a limited and potentially flawed interpretation of it.

An Unknown file

This unknown file has no structure and we have no information about its architecture or purpose. What information can you extract from it?

Analyze it and discuss your ideas with the class.

With the static analysis techniques studied so far, it is not possible to fully interpret the file and recover its content. Still, it is a valid file for some format (try file, binwalk, qemu, or a PDF reader), and it shows how important it is to have some basic information about the architecture and structure. We will get back to it in a future lecture.
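A first triage of an unknown file can also be scripted. The sketch below checks a few well-known magic numbers at the start of the data and computes the byte entropy, in the spirit of what file and binwalk report. The signature table is deliberately tiny and the helper names are made up for this example. Real tools know many more formats and also look for signatures at other offsets.

```python
import math
from collections import Counter

# A few well-known signatures found at the very start of a file.
MAGIC = {
    b"\x7fELF": "ELF executable",
    b"MZ": "DOS/PE executable",
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP archive",
    b"\x1f\x8b": "gzip stream",
    b"\x89PNG": "PNG image",
}


def identify(data):
    """Return the names of all signatures that match the start of the data."""
    return [name for magic, name in MAGIC.items() if data.startswith(magic)]


def entropy(data):
    """Shannon entropy in bits per byte (0 = constant, 8 = random-looking)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in Counter(data).values())


sample = b"%PDF-1.7 hypothetical header bytes"
print(identify(sample))           # ['PDF document']
print(round(entropy(sample), 2))
```

To run it on the real file, read the file in binary mode and pass the bytes to both functions. High entropy across the whole file suggests compressed or encrypted content, while an empty match list only means that none of these few signatures sits at offset 0.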
