Binary Executable Files

Lecture Notes

Analysis of binary executable files such as ELF and PE, focusing on their structure, symbols, and the differences between languages and compilers.

Download here

Practical Tasks

Exercise 1

Compile a small C program with one or two functions and, using objdump, analyze both the object file produced by the compiler and the final binary.

As an example, you may consider the following C program:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

char* buffer = "Hello World\n";

void bar(void) {
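        /* O_CREAT requires a third (mode) argument: the new file's permissions */
        int fd = open("hello.txt", O_CREAT | O_WRONLY, 0644);
        if (fd < 0)
                return;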
        write(fd, buffer, strlen(buffer));
        close(fd);
}

void foo(void){
        printf("%s", buffer);
}

int main(int argc, char **argv){
        foo();
        bar();
        return 0;
}

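Compile the object file with gcc -c -o prog.o prog.c and the final binary with gcc -o prog prog.c, then inspect both with objdump (for example, objdump -d prog.o disassembles the object file). To list the sections of the binary, use objdump -h prog. The output should look similar to:

$ objdump -h prog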

prog:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .interp       0000001c  0000000000000318  0000000000000318  00000318  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.gnu.property 00000020  0000000000000338  0000000000000338  00000338  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .note.gnu.build-id 00000024  0000000000000358  0000000000000358  00000358  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .note.ABI-tag 00000020  000000000000037c  000000000000037c  0000037c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .gnu.hash     00000024  00000000000003a0  00000000000003a0  000003a0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .dynsym       00000108  00000000000003c8  00000000000003c8  000003c8  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .dynstr       000000a7  00000000000004d0  00000000000004d0  000004d0  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .gnu.version  00000016  0000000000000578  0000000000000578  00000578  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .gnu.version_r 00000030  0000000000000590  0000000000000590  00000590  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 .rela.dyn     000000d8  00000000000005c0  00000000000005c0  000005c0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 10 .rela.plt     00000078  0000000000000698  0000000000000698  00000698  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 11 .init         00000017  0000000000001000  0000000000001000  00001000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .plt          00000060  0000000000001020  0000000000001020  00001020  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .plt.got      00000008  0000000000001080  0000000000001080  00001080  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 14 .text         0000017b  0000000000001090  0000000000001090  00001090  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 15 .fini         00000009  000000000000120c  000000000000120c  0000120c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 16 .rodata       0000001b  0000000000002000  0000000000002000  00002000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 17 .eh_frame_hdr 0000003c  000000000000201c  000000000000201c  0000201c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 18 .eh_frame     000000ec  0000000000002058  0000000000002058  00002058  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 19 .init_array   00000008  0000000000003dd0  0000000000003dd0  00002dd0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 20 .fini_array   00000008  0000000000003dd8  0000000000003dd8  00002dd8  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 21 .dynamic      000001e0  0000000000003de0  0000000000003de0  00002de0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 22 .got          00000028  0000000000003fc0  0000000000003fc0  00002fc0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 23 .got.plt      00000040  0000000000003fe8  0000000000003fe8  00002fe8  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 24 .data         00000018  0000000000004028  0000000000004028  00003028  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 25 .bss          00000008  0000000000004040  0000000000004040  00003040  2**0
                  ALLOC
 26 .comment      0000001e  0000000000000000  0000000000000000  00003040  2**0
                  CONTENTS, READONLY

You can also list the symbols in the binary with objdump -tT prog, which shows both the symbol table (-t) and the dynamic symbol table (-T).

$ objdump -tT prog

prog:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*          0000000000000000              Scrt1.o
000000000000037c l     O .note.ABI-tag  0000000000000020              __abi_tag
0000000000000000 l    df *ABS*          0000000000000000              crtstuff.c
00000000000010c0 l     F .text          0000000000000000              deregister_tm_clones
00000000000010f0 l     F .text          0000000000000000              register_tm_clones
0000000000001130 l     F .text          0000000000000000              __do_global_dtors_aux
0000000000004040 l     O .bss           0000000000000001              completed.0
0000000000003dd8 l     O .fini_array    0000000000000000              __do_global_dtors_aux_fini_array_entry
0000000000001170 l     F .text          0000000000000000              frame_dummy
0000000000003dd0 l     O .init_array    0000000000000000              __frame_dummy_init_array_entry
0000000000000000 l    df *ABS*          0000000000000000              prog.c
0000000000000000 l    df *ABS*          0000000000000000              crtstuff.c
0000000000002140 l     O .eh_frame      0000000000000000              __FRAME_END__
0000000000000000 l    df *ABS*          0000000000000000
0000000000003de0 l     O .dynamic       0000000000000000              _DYNAMIC
000000000000201c l       .eh_frame_hdr  0000000000000000              __GNU_EH_FRAME_HDR
0000000000003fe8 l     O .got.plt       0000000000000000              _GLOBAL_OFFSET_TABLE_
0000000000000000       F *UND*          0000000000000000              __libc_start_main@GLIBC_2.34
0000000000000000  w      *UND*          0000000000000000              _ITM_deregisterTMCloneTable
0000000000004028  w      .data          0000000000000000              data_start
0000000000000000       F *UND*          0000000000000000              write@GLIBC_2.2.5
0000000000004040 g       .data          0000000000000000              _edata
0000000000001179 g     F .text          0000000000000057              bar
000000000000120c g     F .fini          0000000000000000              .hidden _fini
0000000000000000       F *UND*          0000000000000000              strlen@GLIBC_2.2.5
0000000000000000       F *UND*          0000000000000000              printf@GLIBC_2.2.5
0000000000000000       F *UND*          0000000000000000              close@GLIBC_2.2.5
0000000000004038 g     O .data          0000000000000008              buffer
0000000000004028 g       .data          0000000000000000              __data_start
0000000000000000  w      *UND*          0000000000000000              __gmon_start__
0000000000004030 g     O .data          0000000000000000              .hidden __dso_handle
0000000000002000 g     O .rodata        0000000000000004              _IO_stdin_used
00000000000011d0 g     F .text          000000000000001b              foo
0000000000004048 g       .bss           0000000000000000              _end
0000000000001090 g     F .text          0000000000000022              _start
0000000000004040 g       .bss           0000000000000000              __bss_start
00000000000011eb g     F .text          0000000000000020              main
0000000000000000       F *UND*          0000000000000000              open@GLIBC_2.2.5
0000000000004040 g     O .data          0000000000000000              .hidden __TMC_END__
0000000000000000  w      *UND*          0000000000000000              _ITM_registerTMCloneTable
0000000000000000  w    F *UND*          0000000000000000              __cxa_finalize@GLIBC_2.2.5
0000000000001000 g     F .init          0000000000000000              .hidden _init


DYNAMIC SYMBOL TABLE:
0000000000000000      DF *UND*          0000000000000000 (GLIBC_2.34) __libc_start_main
0000000000000000  w   D  *UND*          0000000000000000  Base        _ITM_deregisterTMCloneTable
0000000000000000      DF *UND*          0000000000000000 (GLIBC_2.2.5) write
0000000000000000      DF *UND*          0000000000000000 (GLIBC_2.2.5) strlen
0000000000000000      DF *UND*          0000000000000000 (GLIBC_2.2.5) printf
0000000000000000      DF *UND*          0000000000000000 (GLIBC_2.2.5) close
0000000000000000  w   D  *UND*          0000000000000000  Base        __gmon_start__
0000000000000000      DF *UND*          0000000000000000 (GLIBC_2.2.5) open
0000000000000000  w   D  *UND*          0000000000000000  Base        _ITM_registerTMCloneTable
0000000000000000  w   DF *UND*          0000000000000000 (GLIBC_2.2.5) __cxa_finalize

Run these tools on both the object file and the final binary, and determine:

  • How many sections are present?
  • How many symbols are present?

Strip the binary (for example, with strip prog) and repeat the same analysis with objdump. Then compare both results.

In particular, answer:

  • What happened to symbols?
  • What happened to the function names, and to the functions themselves?
  • What happened to the file size?
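To check these answers programmatically, you can read the section header table yourself. The sketch below uses only Python's standard struct module, not objdump or LIEF. It assumes a little-endian ELF64 file and ignores the extended-numbering corner case (SHN_XINDEX). The helper names elf64_section_names and is_stripped are hypothetical. Treating a missing .symtab section as "stripped" is a common heuristic, not a formal definition.

```python
import struct


def elf64_section_names(data: bytes) -> list:
    """List section names of an ELF image (assumption: little-endian ELF64)."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # e_shoff is at offset 0x28; e_shentsize, e_shnum, e_shstrndx start at 0x3A
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)
    if e_shoff == 0 or e_shnum == 0:
        return []  # no section header table at all

    def section_header(index):
        # Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize
        return struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + index * e_shentsize)

    # Section names live in another section (.shstrtab), referenced by e_shstrndx
    shstrtab_offset = section_header(e_shstrndx)[4]
    names = []
    for index in range(e_shnum):
        start = shstrtab_offset + section_header(index)[0]
        end = data.index(b"\x00", start)  # names are NUL-terminated
        names.append(data[start:end].decode("ascii", errors="replace"))
    return names


def is_stripped(data: bytes) -> bool:
    """Heuristic: without a .symtab section there is no static symbol table."""
    return ".symtab" not in elf64_section_names(data)
```

For example, call is_stripped(open("prog", "rb").read()) before and after running strip, and compare the lists returned by elf64_section_names.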

Exercise 2

Compile a small C program with one or two functions that uses external libraries. As an example, you can consider a program that creates a thread (libpthread) or compresses a file (libz). Any other functions are adequate, as long as they come from external libraries.

One example is available in the zlib repository (https://raw.githubusercontent.com/madler/zlib/master/test/example.c). You can compile it with gcc -o example example.c -lz.

Using a hex editor, identify the ELF magic bytes and the values of the ELF header fields. Use readelf -h to guide you, because it decodes the same values that you will find in the hex editor.

$ readelf -h example

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x1270
  Start of program headers:          64 (bytes into file)
  Start of section headers:          20368 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         13
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 30
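To connect the readelf output with the bytes in the hex editor, it can help to decode the header by hand. The following sketch reads the same fields that readelf -h prints. It uses plain Python struct, not LIEF, and handles only little-endian ELF64 files. The function name parse_elf64_header and the small lookup tables are illustrative. In the hex editor, the magic occupies offsets 0x00-0x03, e_entry starts at offset 0x18, e_phoff at 0x20, and e_shoff at 0x28.

```python
import struct

# Small illustrative subsets of the ELF constants; readelf knows many more
ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
MACHINES = {0x03: "Intel 80386", 0x28: "ARM", 0x3E: "Advanced Micro Devices X86-64",
            0xB7: "AArch64"}


def parse_elf64_header(data: bytes) -> dict:
    """Decode the 64-byte ELF header (assumption: little-endian ELF64)."""
    if data[:4] != b"\x7fELF":  # magic: 0x7f 'E' 'L' 'F'
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == little endian
        raise ValueError("this sketch only handles little-endian ELF64")
    # All remaining fields follow the 16-byte e_ident array
    (e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags, e_ehsize,
     e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx) = struct.unpack_from(
        "<HHIQQQIHHHHHH", data, 16)
    return {
        "magic": data[:16].hex(" "),
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": MACHINES.get(e_machine, hex(e_machine)),
        "version": e_version,
        "entry": e_entry,
        "phoff": e_phoff,
        "shoff": e_shoff,
        "flags": e_flags,
        "ehsize": e_ehsize,
        "phentsize": e_phentsize,
        "phnum": e_phnum,
        "shentsize": e_shentsize,
        "shnum": e_shnum,
        "shstrndx": e_shstrndx,
    }
```

With the header above, the entry point 0x1270 appears in the hex editor at offset 0x18 as the little-endian bytes 70 12 00 00 00 00 00 00.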

Using readelf, identify the main sections (readelf -S) and their contents. The following snippet shows the contents of section 25, which is the .data section in this build (the index may differ in yours). This section holds initialized global variables and tables.

$ readelf -x 25 example

Hex dump of section '.data':
  0x00005130 00000000 00000000 38510000 00000000 ........8Q......
  0x00005140 68656c6c 6f2c2068 656c6c6f 21000000 hello, hello!...
  0x00005150 5a310000 00000000                   Z1......

Create a diagram (drawing) of the binary file that represents its structure from three perspectives: the ELF file layout, the segment view, and the section view. It is important to understand which parts of the ELF file are actually loaded into segments, and where they will be placed in memory. The section-to-segment mapping printed by readelf -l helps you see how the bytes in the file map to segments and sections.
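The segment view of the diagram comes from the program header table. The minimal sketch below reads that table and shows how each PT_LOAD segment ties a range of file bytes to a range of virtual addresses. It handles only little-endian ELF64 files, and program_headers and vaddr_to_offset are hypothetical helper names.

```python
import struct

PT_NAMES = {0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR",
            7: "TLS", 0x6474E550: "GNU_EH_FRAME", 0x6474E551: "GNU_STACK",
            0x6474E552: "GNU_RELRO", 0x6474E553: "GNU_PROPERTY"}


def program_headers(data: bytes) -> list:
    """Return the segments of an ELF file (assumption: little-endian ELF64)."""
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    segments = []
    for i in range(e_phnum):
        # Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz, memsz, align
        p_type, p_flags, p_offset, p_vaddr, _paddr, p_filesz, p_memsz, _align = \
            struct.unpack_from("<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        perms = (("R" if p_flags & 4 else "-") + ("W" if p_flags & 2 else "-")
                 + ("X" if p_flags & 1 else "-"))
        segments.append({"type": PT_NAMES.get(p_type, hex(p_type)), "perms": perms,
                         "offset": p_offset, "vaddr": p_vaddr,
                         "filesz": p_filesz, "memsz": p_memsz})
    return segments


def vaddr_to_offset(segments: list, vaddr: int):
    """Map a virtual address to a file offset through the PT_LOAD segments."""
    for seg in segments:
        if seg["type"] == "LOAD" and seg["vaddr"] <= vaddr < seg["vaddr"] + seg["filesz"]:
            return seg["offset"] + (vaddr - seg["vaddr"])
    return None  # unmapped, or present only in memory (memsz > filesz, e.g. .bss)
```

In the objdump -h listing of Exercise 1, .data has VMA 0x4028 and file offset 0x3028, so the buffer symbol at 0x4038 corresponds to file offset 0x3038. A PT_LOAD segment that covers .data gives the same result through vaddr_to_offset. Addresses in .bss return None because they exist only in memory.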

Exercise 3

The LIEF library allows extensive parsing and manipulation of binary files, including ELF objects. Using LIEF, write a small Python script that prints information about an ELF file that may be relevant for your future reverse engineering tasks.

In particular, determine:

  • The type of file and architecture
  • The list of libraries loaded
  • The compiler used
  • The list of symbols from external libraries
  • The address of the program entry point
  • Information whether the program is using RELRO, PIE, and Canaries
  • Information whether the program is stripped
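LIEF exposes these properties directly, but seeing the underlying checks helps when you interpret its output. The following sketch is stdlib-only and is not LIEF code. It covers PIE, RELRO, and canaries, plus NX, and assumes a little-endian ELF64 file. The heuristics are deliberately rough: the canary check is a plain byte search for __stack_chk_fail, and it does not distinguish full from partial RELRO (full RELRO also depends on BIND_NOW in the dynamic section). The name quick_checksec is hypothetical.

```python
import struct

ET_DYN = 3
PT_INTERP, PT_GNU_STACK, PT_GNU_RELRO = 3, 0x6474E551, 0x6474E552
PF_X = 1


def quick_checksec(data: bytes) -> dict:
    """Rough hardening checks (assumption: little-endian ELF64; heuristics only)."""
    (e_type,) = struct.unpack_from("<H", data, 0x10)
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    flags_by_type = {}
    for i in range(e_phnum):
        p_type, p_flags = struct.unpack_from("<II", data, e_phoff + i * e_phentsize)
        flags_by_type[p_type] = p_flags
    return {
        # ET_DYN plus an interpreter suggests a PIE executable (libraries usually lack PT_INTERP)
        "pie": e_type == ET_DYN and PT_INTERP in flags_by_type,
        # PT_GNU_RELRO means at least partial RELRO; full RELRO also needs BIND_NOW
        "relro": PT_GNU_RELRO in flags_by_type,
        # NX: a GNU_STACK segment whose flags do not include execute
        "nx": PT_GNU_STACK in flags_by_type and not (flags_by_type[PT_GNU_STACK] & PF_X),
        # Canary: crude search for the stack-protector failure handler's name
        "canary": b"__stack_chk_fail" in data,
    }
```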

Exercise 5

The following exercises are a set of crackmes from crackmes.one. All of them have at least one writeup and are supposed to be easy-ish. They also cover different platforms and languages, allowing you to experiment with different approaches. The content is not mirrored here due to the mirroring restrictions of the crackmes.one website.

The objective of all crackmes is to reverse the key verification algorithm and then write a keygen that produces valid keys. Always run the crackmes in virtual machines without personal data or shared folders!

You will need a Windows virtual machine. Start by downloading the ISO from: https://www.microsoft.com/pt-pt/software-download/windows10iso

easy keyg3me

URL: https://crackmes.one/crackme/5da31ebc33c5d46f00e2c661

A simple keygen challenge for Linux on AMD64, written by ezman. It should be easy. The fundamental hurdle is that C compilation is inherently destructive. Unlike languages that compile down to metadata-rich intermediate bytecode (such as Java or C#), C is compiled directly into raw, architecture-specific machine code, discarding most of the context that made it human-readable in the first place.

Stripped Symbols and Type Information: Comments and type definitions never make it into the compiled binary. In a typical release build, which has no debug information and stripped symbols, the variable and function names are gone as well. Instead of seeing readable logic like User.Age = 25, you are forced to look at raw hexadecimal values being moved into memory offsets relative to a CPU register (e.g., mov dword ptr [rax+14h], 19h, where 19h is 25 and 14h is the offset of the field). You have to reverse-engineer the original data structures purely by observing how the program accesses that memory.
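Once you have observed enough accesses, you can turn an instruction like mov dword ptr [rax+14h], 19h into a structure definition. The sketch below uses Python's ctypes to write down such a guess. The User layout and its field names are entirely hypothetical. They are inferred only from the premise that a 32-bit store at offset 0x14 sets an age of 25 (0x19). The same reasoning applies when you define the struct in your decompiler.

```python
import ctypes
import struct


class User(ctypes.LittleEndianStructure):
    """Hypothetical layout reconstructed from observed memory accesses."""
    _fields_ = [
        ("id", ctypes.c_uint64),        # +0x00, guessed from qword accesses at [rax]
        ("name_ptr", ctypes.c_uint64),  # +0x08, a pointer on x86-64 (assumption)
        ("flags", ctypes.c_uint32),     # +0x10
        ("age", ctypes.c_uint32),       # +0x14, written by mov dword ptr [rax+14h], 19h
    ]


def replay_store(offset: int, value: int) -> User:
    """Emulate a 32-bit store into a zeroed object, then view it through the struct."""
    raw = bytearray(ctypes.sizeof(User))
    struct.pack_into("<I", raw, offset, value)  # dword store, little endian as on x86-64
    return User.from_buffer_copy(raw)
```

If replay_store(0x14, 0x19).age reports 25, the guessed offset matches the instruction. A later qword read at [rax+8h] that is passed to a string function would support the guess that offset 0x08 holds a pointer.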

Aggressive Compiler Optimizations: Compilers (like GCC, Clang, or MSVC) ruthlessly optimize code for speed and size, which heavily warps the original logic. They might inline a function (pasting its code directly into the caller rather than jumping to it), unroll loops into long chains of repetitive instructions, or completely reorder the execution path. This means the control flow you see in a disassembler rarely matches the neat, linear logic the developer originally typed out.

The recommended process is as follows:

Reconnaissance and Triage: Never dive straight into the raw assembly code. Begin by examining the binary’s external characteristics using tools like file, strings, or PE/ELF header parsers. Extracting human-readable text and analyzing the imported functions gives you a massive head start in understanding what the binary is capable of before you read a single line of code. Imports are listed in the Import Address Table (IAT) of PE files or in the dynamic symbol table of ELF files, and they reveal which external operating system functions the program uses, such as WriteFile or socket.
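To illustrate what strings does during triage, here is a minimal Python equivalent that reports runs of printable ASCII characters together with their file offsets. It is only a sketch: unlike GNU strings with its -e option, it does not look for UTF-16 (wide) strings, which are common in Windows binaries. The name extract_strings is hypothetical.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return (offset, text) pairs for runs of printable ASCII (0x20-0x7e)."""
    # Simplification: ASCII only; wide (UTF-16) strings are not detected
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]
```

Messages such as a "wrong key" prompt make good anchors. Note their offset, then look for the code that references them in the disassembler.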

Decompilation and Structural Reconstruction: Next, load the binary into a robust disassembler and decompiler suite like Ghidra or IDA Pro. Locate the true main function and work backward from the API calls you found during reconnaissance to figure out what the surrounding, unlabeled functions do. As you decipher the logic, aggressively rename variables and manually define C structs in your decompiler so that the raw memory accesses are rendered as readable pseudo-code.

Dynamic Verification: Static reading only tells you what the code might do. To confirm your theories, run the binary in an isolated environment under a debugger (like GDB for Linux or x64dbg for Windows). Set breakpoints on confusing functions to observe the CPU registers and memory state live while the program executes. This iterative process of reading, testing live, and renaming gradually uncovers the program’s true intent.

Simple Keygen

URL: https://crackmes.one/crackme/5c2acb8933c5d46a3882b8d4

A simple keygen challenge for Linux on AMD64, written by Yuri. Follow the previous tips and solve it.

Keygen me Quick!

URL: https://crackmes.one/crackme/60d65d0833c5d410b8843014

This is a simple C/C++ application for Windows made by Legacyy. Because it is simple, you can reverse it on Linux, but running the binary will require Wine or a Windows VM. Standard tools for native binaries will work. There is one issue: finding the actual code entry point, named WinMain.

This is tricky because the entry point of a compiled C/C++ binary is almost never the code the developer actually wrote. Instead, the toolchain links in a hidden block of Microsoft setup code called the C Runtime (CRT) startup. This boilerplate code initializes memory, sets up security features, and prepares the environment before it ever passes control to the actual program, so you have to manually skip past this setup to find the real starting point.

To skip this setup and find WinMain, start at the binary’s Entry Point in your disassembler and scroll down looking for a specific cluster of Windows API calls. Because WinMain requires specific launch details from the operating system, the CRT setup must gather them first. You will almost always see calls to GetModuleHandle, GetCommandLine, and GetStartupInfo closely grouped together, signaling that the setup is almost complete.

Just past those API calls, look for a function that terminates the program, usually an exit() or ExitProcess call. The function called shortly before this exit is the developer’s true WinMain, and its return value becomes the process exit code. You can verify that you have found the right function because it receives exactly four arguments (the instance handle, the previous instance handle, the command line string, and the window show state), which makes it clearly stand out from the rest of the setup routines.

Take as an example the following ending of the decompiled PE entry point:

      }
      thunk_FUN_0042c337();
      thunk_FUN_0042cc9c();
      thunk_FUN_0042cc96();
      unaff_ESI = thunk_FUN_00406f40(); //<----- WinMain
      uVar6 = thunk_FUN_00407fab();
      if ((char)uVar6 != '\0') {
        if (!bVar2) {
          __cexit();
        }
        ___scrt_uninitialize_crt('\x01','\0');
        *(undefined4 *)(unaff_EBP + -4) = 0xfffffffe;
LAB_0040737a:
        ExceptionList = *(void **)(unaff_EBP + -0x10);
        return unaff_ESI;
      }
      goto LAB_00407391;
    }
  }
  ___scrt_fastfail();
LAB_00407391:
  _exit(unaff_ESI); // exit with the value returned by WinMain
}
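Before scrolling through the CRT startup, you need to know where the entry point is. The following sketch reads AddressOfEntryPoint and ImageBase from the PE optional header to compute the entry point's virtual address, which is where a disassembler's entry function starts. It uses only Python's struct module and performs minimal validation. The function name pe_entry_point is hypothetical.

```python
import struct

MACHINES = {0x14C: "x86 (i386)", 0x8664: "x64 (AMD64)"}


def pe_entry_point(data: bytes) -> dict:
    """Locate the entry point of a PE file (sketch: sections are not validated)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # file offset of the PE header
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)  # first COFF header field
    opt = e_lfanew + 4 + 20  # optional header follows the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == 0x10B:    # PE32: 32-bit ImageBase at +28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:  # PE32+: 64-bit ImageBase at +24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError("unknown optional header magic")
    return {"machine": MACHINES.get(machine, hex(machine)),
            "format": "PE32" if magic == 0x10B else "PE32+",
            "entry_rva": entry_rva, "entry_va": image_base + entry_rva}
```

Addresses like 0x00406f40 in the snippet above are consistent with the default 32-bit image base of 0x400000.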

.net_crackme_no.1

URL: https://crackmes.one/crackme/5ab77f6533c5d40ad448cb9f

This is a .NET application for Windows created by subminking. You may try to run it with Mono on Linux or another operating system.

You can also try to open it with Ghidra or IDA, but the result will not be that useful. The reason is that Ghidra and IDA are primarily designed to disassemble and decompile native machine code (like x86, x64, or ARM) back into C or C++ pseudocode. Standard .NET applications do not compile directly to machine code. Instead, they compile to CIL (Common Intermediate Language), also known as MSIL, which the .NET runtime (CLR) executes using Just-In-Time (JIT) compilation.

Moreover, when IDA or Ghidra look at a .NET binary, they try to apply native analysis paradigms to it. While they may have plugins to read IL, their decompilers are meant to output C-like syntax, not C#. As a result, they fail to accurately translate highly specific .NET constructs (like LINQ, async/await state machines, or properties).

Also, .NET binaries are essentially open books. They contain massive amounts of metadata, including the original names of classes, methods, variables, and complex object hierarchies. Dedicated .NET decompilers use this metadata to reconstruct the source code almost exactly as it was written. Ghidra and IDA often fail to leverage this seamlessly, resulting in a clunky, overly complex analysis.
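A quick way to decide which family of tools to use is to check whether the PE file declares a CLR runtime header. Managed (.NET) assemblies use this header to point to their metadata and CIL. The sketch below reads data directory 14 (IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR) from the optional header and performs minimal validation. The name is_managed_assembly is hypothetical. The file command typically reports such binaries as a "Mono/.Net assembly".

```python
import struct

COM_DESCRIPTOR = 14  # IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR (CLR runtime header)


def is_managed_assembly(data: bytes) -> bool:
    """True if the PE declares a CLR header (sketch with minimal validation)."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    opt = e_lfanew + 24  # optional header after signature (4) + COFF header (20)
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:    # PE32: NumberOfRvaAndSizes at +92, directories at +96
        count_off, dirs_off = opt + 92, opt + 96
    elif magic == 0x20B:  # PE32+: NumberOfRvaAndSizes at +108, directories at +112
        count_off, dirs_off = opt + 108, opt + 112
    else:
        raise ValueError("unknown optional header magic")
    (num_dirs,) = struct.unpack_from("<I", data, count_off)
    if num_dirs <= COM_DESCRIPTOR:
        return False
    # Each data directory entry is (RVA, size), 8 bytes in total
    rva, size = struct.unpack_from("<II", data, dirs_off + 8 * COM_DESCRIPTOR)
    return rva != 0 and size != 0
```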

If a .NET application is compiled using Native AOT (Ahead-of-Time) compilation, or if it is wrapped in a heavy native packer/obfuscator that executes native code before loading the .NET payload, Ghidra and IDA suddenly become highly useful again.

The following tools will be better suited:

smokefx_v1

URL: https://crackmes.one/crackme/5ab77f5433c5d40ad448c1c3

This is a simple Borland Delphi application for Windows made by waynemodz. Borland Delphi compiles directly to native machine code (x86 or x64), which means tools like Ghidra and IDA Pro are the right tools for the job.

However, Delphi does not play by the standard rules of C or C++ compilers. If you load a Delphi binary into IDA or Ghidra without preparing it first, the decompiled output will look like complete garbage.

Here is what you need to be highly aware of when reverse engineering Borland Delphi binaries, and how to tame them.

  1. The Notorious Borland __fastcall Convention: This is the number one reason reverse engineers get stuck on Delphi. Standard C/C++ code typically uses calling conventions that push arguments onto the stack (like cdecl or stdcall) or that pass them in specific registers (like Microsoft’s fastcall, which uses ECX and EDX). Borland Delphi instead defaults to its own register calling convention (often referred to as Borland __fastcall). It passes the first three parameters, left to right, in the EAX, EDX, and ECX registers, and pushes any remaining parameters onto the stack.

If Ghidra or IDA assumes a standard Microsoft fastcall or cdecl, your decompiled pseudo-code will show missing arguments, wrong variable assignments, and broken logic. You must manually force your decompiler to use the correct calling convention.
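The following toy model makes the difference concrete by showing where integer or pointer arguments end up under Delphi's register convention compared to Microsoft's fastcall on 32-bit x86. It is not a disassembler feature, and the function name and argument names are hypothetical. It deliberately ignores the exceptions that do not travel in registers, such as floating-point and 64-bit integer parameters. For methods, the implicit Self parameter counts as the first argument and therefore arrives in EAX.

```python
def assign_arguments(args: list, convention: str) -> tuple:
    """Simplified model: every argument is a 32-bit integer or pointer.

    Returns (register -> argument mapping, arguments in the order they are pushed).
    """
    if convention == "delphi-register":
        registers = ["EAX", "EDX", "ECX"]
        in_regs, rest = args[:3], args[3:]
        pushed = list(rest)            # Delphi pushes the remaining arguments left to right
    elif convention == "ms-fastcall":
        registers = ["ECX", "EDX"]
        in_regs, rest = args[:2], args[2:]
        pushed = list(reversed(rest))  # Microsoft fastcall pushes right to left
    else:
        raise ValueError("unknown convention: " + convention)
    return dict(zip(registers, in_regs)), pushed
```

If a decompiler shows a Delphi function reading EAX, EDX, and ECX without ever initializing them, that is a strong hint that the function uses the register convention.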

  2. The Quirky String Implementation: Delphi handles strings (AnsiString, WideString, UnicodeString) very differently from C-style null-terminated strings: AnsiString and UnicodeString are reference-counted and length-prefixed (WideString wraps a COM BSTR, which is length-prefixed but not reference-counted).

The pointer to the string points directly to the first character. However, if you look at the memory address immediately before the pointer (at offset -4), you will find the string’s length. At offset -8, you will find the reference count.

Standard decompilers don’t know to look backward in memory for string metadata. You will often see complex, hidden memory management functions being called every time a string is manipulated or concatenated (e.g., calls to @LStrCatN).
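The header layout described above can be read directly from a memory dump. This sketch assumes an AnsiString (one byte per character) and a little-endian dump that starts at address base. It decodes bytes as Latin-1 purely as a lossless byte-to-character mapping. The name read_delphi_ansistring is hypothetical. For UnicodeString, the length counts UTF-16 code units, so the character slice would be twice as long.

```python
import struct


def read_delphi_ansistring(memory: bytes, base: int, ptr: int) -> dict:
    """Decode an AnsiString whose characters start at address `ptr`.

    Assumption: `memory` is a little-endian dump that starts at address `base`.
    """
    off = ptr - base
    # The header sits *before* the characters: refcount at ptr-8, length at ptr-4
    refcount, length = struct.unpack_from("<iI", memory, off - 8)
    text = memory[off:off + length]
    return {"refcount": refcount, "length": length, "text": text.decode("latin-1")}
```

String literals compiled into the binary typically carry a reference count of -1, which marks them as constants that are never freed.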

  3. The VCL (Visual Component Library) and RTTI: Delphi applications are heavily object-oriented and rely on the VCL for their graphical interfaces. The good news is that Delphi binaries are packed with RTTI (Run-Time Type Information). This means original class names, form names, and event handlers (like Button1Click or FormCreate) are often stored in plain text inside the binary.

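Delphi also embeds .dfm (Delphi Form) resources, which contain the entire layout of the user interface and the names of the event handlers attached to each control (for example, which method handles a button click). Tools like IDR resolve these names to code addresses.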

To successfully reverse a Delphi binary, you should not use Ghidra or IDA alone. You need to pair them with Delphi-specific tools to recover the metadata first.

  • IDR (Interactive Delphi Reconstructor): This is the holy grail of Delphi reversing. It is a standalone tool designed specifically to parse Delphi binaries. It extracts the embedded forms, resolves the RTTI, and recovers the names of thousands of functions.

  • Ghidra + Dhrake: Because IDR’s interface is a bit dated and it lacks a modern decompiler, the modern workflow is to use a Ghidra script called Dhrake. You use IDR to extract all the symbols and generate a script, then load the binary into Ghidra and run Dhrake. Dhrake imports the IDR symbols and automatically fixes the Borland __fastcall signatures and string functions.

  • IDA also processes debug information (TDS) and RTTI better in some cases. Try it.

Gopher Secrets

URL: https://crackmes.one/crackme/6955400d24bfc29298329efc

This multiplatform application was written by sarahmills. The catch is that it is a Go application, and Go (Golang) binaries are notoriously painful for analysts to dissect. Go ignores many of the traditional C/C++ compilation conventions that standard disassemblers rely on, so you cannot just drop a file into your toolkit and expect readable output.

Statically Linked Bloat: Unlike traditional programs that dynamically link to operating system libraries, Go statically bundles its entire runtime, garbage collector, and standard packages directly into the final executable. This inflates even the simplest malware beacon into a massive multi-megabyte binary containing thousands of built-in functions, forcing you to hunt for the author’s actual logic hidden deep within a haystack of irrelevant system code.

Unconventional Strings: Go does not use C-style null-terminated strings. Instead, string literals are stored back to back in one large, contiguous memory block, and each string is referenced through a two-field string header (sometimes loosely called a string slice): a pointer to the start of the text and its exact length. Standard decompilers often fail to parse these natively, leaving your analysis riddled with broken, unreadable string references.
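Decoding such a reference only requires the two header fields, as the following sketch shows. It assumes a 64-bit little-endian target and a memory dump that starts at address base, with the header stored in memory rather than held in registers. The name read_go_string is hypothetical.

```python
import struct


def read_go_string(memory: bytes, base: int, header_addr: int) -> str:
    """Decode a Go string header (data pointer, length) located at `header_addr`.

    Assumption: 64-bit little-endian layout; `memory` is a dump starting at `base`.
    """
    off = header_addr - base
    data_ptr, length = struct.unpack_from("<QQ", memory, off)
    start = data_ptr - base
    # No terminator: only the length field says where the string ends
    return memory[start:start + length].decode("utf-8", errors="replace")
```

In the test data, "Wrong key!" and "Good job" share one contiguous blob, and only the length fields separate them.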

Stripped Metadata and Obfuscation: Go binaries contain highly useful metadata, such as runtime type information and the function table (pclntab). Much of it survives ordinary symbol stripping because the Go runtime itself needs it. Malware authors therefore turn to Go-specific obfuscators like Garble, which intentionally randomize function names, scramble package structures, and can mutate the control flow, effectively blinding standard static analysis attempts.

To tackle Go binaries, follow this workflow:

Your first step should be to run specialized parsing tools such as GoReSym, ghostrings, or the AlphaGolang scripts for your disassembler. These automatically rebuild the string references, recover the RTTI, and identify the standard library functions so you can filter out the noise.

Once the binary is cleaned up and the system functions are hidden, search for the main.main function, which is the true starting point of the developer’s custom code. Static analysis of Go’s concurrency (goroutines) can be visually overwhelming, so focus on mapping out key APIs (like net.Dial). Expect to complement static reading with dynamic debugging in a sandbox to watch the program execute.

SirCrackaLot v2

URL: https://crackmes.one/crackme/69a07e02fb7f76ef92045c40

Back to Linux, but with a Rust crackme by Re_1333333337. Reverse engineering Rust binaries is notoriously painful. If you have ever loaded a compiled Rust program into a disassembler and felt like you were staring at a chaotic, incomprehensible mess, you are not alone. In the malware analysis and reverse engineering communities, Rust is widely considered one of the most hostile languages to analyze. Because Rust guarantees memory safety and zero-cost abstractions at compile-time, the compiler does a massive amount of heavy lifting that fundamentally transforms the code.

Rust does not play by the traditional rules of C or C++ compilation. When you reverse Rust, you are fighting the compiler’s optimizations just as much as the original author’s logic.

Static Linking by Default (The Haystack Problem): Rust statically links its entire standard library and all third-party dependencies (called “crates”) into the final executable. A simple “Hello World” or a basic network beacon can easily be several megabytes in size and contain thousands of functions. Finding the actual author’s business logic is like finding a needle in a haystack.

Aggressive LLVM Optimizations: Rust relies on LLVM as its compiler backend. Release builds apply aggressive function inlining and often enable Link-Time Optimization (LTO). As a result, clear function boundaries are often destroyed, dead code is removed, and the control flow graph becomes highly non-linear and obscured.

Zero-Cost Abstractions & Monomorphization: High-level, beautiful Rust features like iterators, closures, and async/await state machines compile down into incredibly convoluted, low-level machine code. Furthermore, Rust uses “monomorphization” for generics—meaning if a developer uses a generic function with three different data types, the compiler generates three separate, highly optimized assembly versions of that same function, bloating the binary further.

“Fat Pointers” and Non-Terminated Strings: Unlike C-style strings that end with a null terminator (\0), Rust strings are not null-terminated. String literals are stored back to back in read-only memory, as one large, contiguous block of text. Variables reference these strings using “fat pointers”, a struct containing a pointer to the start of the string and its exact length. Traditional decompilers that look for null terminators often fail to parse these, leaving you with broken string references.

Complex Error Handling (Option/Result): Rust does not use traditional try/catch exceptions. It uses Result and Option enums. At the assembly level, this results in endless chains of branching logic, bounds checking, and unwrap calls that clutter the decompiler’s pseudo-code with noise.

If you just drop a Rust binary into IDA Pro or Ghidra and hit “analyze,” you will waste days sifting through standard library garbage.

  1. Triage and Metadata Extraction: Before opening a disassembler, extract the binary’s metadata. Rust binaries often leak the exact compiler version, the commit hash, and a list of all third-party crates used.

Use tools like rustbininfo to extract this data. Knowing the exact Rust version and the dependencies (e.g., the tokio crate for async networking, or reqwest for HTTP) immediately tells you the program’s capabilities before you even read the assembly.
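Some of this metadata can be spotted without any special tool. Panic messages and debug paths embed the rustc source location (/rustc/<commit hash>/...) and the Cargo registry path of each crate (.../registry/src/<index>/<crate>-<version>/...). The following regex-based sketch collects both. It is a heuristic that can miss Windows-style paths or binaries built with path remapping. The name rust_triage is hypothetical and is not the rustbininfo API.

```python
import re

# Heuristic patterns for paths embedded in panic messages and debug info
RUSTC_COMMIT = re.compile(rb"/rustc/([0-9a-f]{40})/")
CRATE_DIR = re.compile(rb"registry/src/[^/]+/([A-Za-z0-9_-]+?)-(\d+\.\d+\.\d+[^/]*)/")


def rust_triage(data: bytes) -> dict:
    """Collect rustc commit hashes and (crate, version) pairs from embedded paths."""
    commits = sorted({m.group(1).decode() for m in RUSTC_COMMIT.finditer(data)})
    crates = sorted({(m.group(1).decode(), m.group(2).decode())
                     for m in CRATE_DIR.finditer(data)})
    return {"rustc_commits": commits, "crates": crates}
```

A commit hash recovered this way can be matched against the rustc release history to identify the compiler version for the signature matching in the next step.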

  2. Isolate the Library Code (Library Matching): You must filter out the standard library and external crates so you can focus on the developer’s actual code.

Use advanced pattern-matching tools designed for Rust, such as the open-source RIFT suite (developed to identify and annotate library code in Rust binaries) or generate specific FLIRT signatures for the exact Rust compiler version you identified in Step 1. Apply these to your disassembler to automatically rename and hide thousands of irrelevant functions.

  3. Fix the Strings and Fat Pointers: Do not try to read the code without fixing the strings first.

Use Rust-specific scripts for your disassembler (there are many publicly available Python scripts for Ghidra and IDA Pro) that specifically hunt for fat pointer patterns. These scripts carve out the strings based on their pointer+length structures and apply them as comments in the assembly, making the binary far more readable.
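The core idea of such scripts can be sketched as follows. Walk a data section in 8-byte steps, treat each 16-byte window as a candidate (pointer, length) pair, and keep it if the pointer lands inside the read-only data and the bytes it covers are printable text. This illustration makes several assumptions: a 64-bit little-endian target, fat pointers stored in data rather than built in registers, and a caller who has already extracted the sections and the rodata base address. The name scan_fat_pointers and its length limits are hypothetical.

```python
import struct


def scan_fat_pointers(section: bytes, rodata: bytes, rodata_addr: int,
                      min_len: int = 4, max_len: int = 256) -> list:
    """Find 8-byte-aligned (pointer, length) pairs that point to printable rodata text."""
    rodata_end = rodata_addr + len(rodata)
    hits = []
    for off in range(0, len(section) - 15, 8):  # every full 16-byte window
        ptr, length = struct.unpack_from("<QQ", section, off)
        if not (min_len <= length <= max_len):
            continue  # implausible length for a string literal
        if not (rodata_addr <= ptr and ptr + length <= rodata_end):
            continue  # pointer does not land inside the read-only data
        start = ptr - rodata_addr
        try:
            text = rodata[start:start + length].decode("utf-8")
        except UnicodeDecodeError:
            continue
        if text.isprintable():
            hits.append((off, text))
    return hits
```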

  4. Follow the Breadcrumbs (Panic Handlers): Because Rust is memory-safe, it includes “panic” handlers that trigger when the program encounters an unrecoverable error (like an out-of-bounds array read).

Search for core::panic::Location structures. Even in stripped binaries, panic strings stay intact because they are data rather than symbols. These strings frequently leak the original source file paths, line and column numbers, and developer-written messages. Tracing a panic string back to its referencing function is the fastest way to map out the original source code tree.
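Once you find a candidate Location, decoding it is straightforward if you accept a layout assumption. As a working hypothesis to verify per binary, the sketch below assumes a 64-bit record made of a file path stored as pointer and length, followed by 32-bit line and column numbers. Rust does not guarantee this layout, so check it against a panic message whose location you already know. The name read_panic_location is hypothetical.

```python
import struct


def read_panic_location(data: bytes, off: int, rodata: bytes, rodata_addr: int) -> tuple:
    """Decode a Location-like record at `off` into (file, line, column).

    Assumed layout (not a stable ABI): u64 file pointer, u64 file length, u32 line, u32 column.
    """
    file_ptr, file_len, line, column = struct.unpack_from("<QQII", data, off)
    start = file_ptr - rodata_addr
    path = rodata[start:start + file_len].decode("utf-8", errors="replace")
    return path, line, column
```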

  5. AI-Assisted Decompilation: Because the LLVM-optimized pseudo-C generated by IDA or Ghidra will look very foreign, the modern workflow relies heavily on AI assistance.

Use local LLM plugins (like GhidraAssist) that enhance Ghidra with local models (through Ollama) or remote models. You can feed them a confusing, inlined chunk of Rust assembly and ask them to summarize the high-level logic, suggest variable names, and identify underlying Rust structures like enum matches or async polls.
