Lecture Notes
Analyzing binary executable files such as ELF and PE, with a focus on structure, symbols, and the differences between languages and compilers.
Download here
Practical Tasks
Exercise 1
Compile a small C program with 1-2 functions and, using objdump, analyze both the intermediate object file and the final binary.
As a possible C program you may consider:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
char* buffer = "Hello World\n";
void bar(void) {
    /* O_CREAT requires a third argument with the permission bits of the new file. */
    int fd = open("hello.txt", O_CREAT | O_WRONLY, 0644);
    write(fd, buffer, strlen(buffer));
    close(fd);
}
void foo(void) {
    printf("%s", buffer);
}
int main(int argc, char **argv) {
    foo();
    bar();
    return 0;
}
After the program is compiled (gcc -o prog prog.c) you can use objdump to inspect it.
To obtain the sections you can use objdump -h prog and the result should be something like:
$ objdump -h prog
prog: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
0 .interp 0000001c 0000000000000318 0000000000000318 00000318 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .note.gnu.property 00000020 0000000000000338 0000000000000338 00000338 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .note.gnu.build-id 00000024 0000000000000358 0000000000000358 00000358 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .note.ABI-tag 00000020 000000000000037c 000000000000037c 0000037c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .gnu.hash 00000024 00000000000003a0 00000000000003a0 000003a0 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .dynsym 00000108 00000000000003c8 00000000000003c8 000003c8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .dynstr 000000a7 00000000000004d0 00000000000004d0 000004d0 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
7 .gnu.version 00000016 0000000000000578 0000000000000578 00000578 2**1
CONTENTS, ALLOC, LOAD, READONLY, DATA
8 .gnu.version_r 00000030 0000000000000590 0000000000000590 00000590 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
9 .rela.dyn 000000d8 00000000000005c0 00000000000005c0 000005c0 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
10 .rela.plt 00000078 0000000000000698 0000000000000698 00000698 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
11 .init 00000017 0000000000001000 0000000000001000 00001000 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
12 .plt 00000060 0000000000001020 0000000000001020 00001020 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
13 .plt.got 00000008 0000000000001080 0000000000001080 00001080 2**3
CONTENTS, ALLOC, LOAD, READONLY, CODE
14 .text 0000017b 0000000000001090 0000000000001090 00001090 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
15 .fini 00000009 000000000000120c 000000000000120c 0000120c 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
16 .rodata 0000001b 0000000000002000 0000000000002000 00002000 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
17 .eh_frame_hdr 0000003c 000000000000201c 000000000000201c 0000201c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
18 .eh_frame 000000ec 0000000000002058 0000000000002058 00002058 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
19 .init_array 00000008 0000000000003dd0 0000000000003dd0 00002dd0 2**3
CONTENTS, ALLOC, LOAD, DATA
20 .fini_array 00000008 0000000000003dd8 0000000000003dd8 00002dd8 2**3
CONTENTS, ALLOC, LOAD, DATA
21 .dynamic 000001e0 0000000000003de0 0000000000003de0 00002de0 2**3
CONTENTS, ALLOC, LOAD, DATA
22 .got 00000028 0000000000003fc0 0000000000003fc0 00002fc0 2**3
CONTENTS, ALLOC, LOAD, DATA
23 .got.plt 00000040 0000000000003fe8 0000000000003fe8 00002fe8 2**3
CONTENTS, ALLOC, LOAD, DATA
24 .data 00000018 0000000000004028 0000000000004028 00003028 2**3
CONTENTS, ALLOC, LOAD, DATA
25 .bss 00000008 0000000000004040 0000000000004040 00003040 2**0
ALLOC
26 .comment 0000001e 0000000000000000 0000000000000000 00003040 2**0
CONTENTS, READONLY
You can also obtain the symbols present in the binary using objdump -tT prog, which will show the symbol table and the dynamic symbol table.
$ objdump -tT prog
prog: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 Scrt1.o
000000000000037c l O .note.ABI-tag 0000000000000020 __abi_tag
0000000000000000 l df *ABS* 0000000000000000 crtstuff.c
00000000000010c0 l F .text 0000000000000000 deregister_tm_clones
00000000000010f0 l F .text 0000000000000000 register_tm_clones
0000000000001130 l F .text 0000000000000000 __do_global_dtors_aux
0000000000004040 l O .bss 0000000000000001 completed.0
0000000000003dd8 l O .fini_array 0000000000000000 __do_global_dtors_aux_fini_array_entry
0000000000001170 l F .text 0000000000000000 frame_dummy
0000000000003dd0 l O .init_array 0000000000000000 __frame_dummy_init_array_entry
0000000000000000 l df *ABS* 0000000000000000 prog.c
0000000000000000 l df *ABS* 0000000000000000 crtstuff.c
0000000000002140 l O .eh_frame 0000000000000000 __FRAME_END__
0000000000000000 l df *ABS* 0000000000000000
0000000000003de0 l O .dynamic 0000000000000000 _DYNAMIC
000000000000201c l .eh_frame_hdr 0000000000000000 __GNU_EH_FRAME_HDR
0000000000003fe8 l O .got.plt 0000000000000000 _GLOBAL_OFFSET_TABLE_
0000000000000000 F *UND* 0000000000000000 __libc_start_main@GLIBC_2.34
0000000000000000 w *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000004028 w .data 0000000000000000 data_start
0000000000000000 F *UND* 0000000000000000 write@GLIBC_2.2.5
0000000000004040 g .data 0000000000000000 _edata
0000000000001179 g F .text 0000000000000057 bar
000000000000120c g F .fini 0000000000000000 .hidden _fini
0000000000000000 F *UND* 0000000000000000 strlen@GLIBC_2.2.5
0000000000000000 F *UND* 0000000000000000 printf@GLIBC_2.2.5
0000000000000000 F *UND* 0000000000000000 close@GLIBC_2.2.5
0000000000004038 g O .data 0000000000000008 buffer
0000000000004028 g .data 0000000000000000 __data_start
0000000000000000 w *UND* 0000000000000000 __gmon_start__
0000000000004030 g O .data 0000000000000000 .hidden __dso_handle
0000000000002000 g O .rodata 0000000000000004 _IO_stdin_used
00000000000011d0 g F .text 000000000000001b foo
0000000000004048 g .bss 0000000000000000 _end
0000000000001090 g F .text 0000000000000022 _start
0000000000004040 g .bss 0000000000000000 __bss_start
00000000000011eb g F .text 0000000000000020 main
0000000000000000 F *UND* 0000000000000000 open@GLIBC_2.2.5
0000000000004040 g O .data 0000000000000000 .hidden __TMC_END__
0000000000000000 w *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 w F *UND* 0000000000000000 __cxa_finalize@GLIBC_2.2.5
0000000000001000 g F .init 0000000000000000 .hidden _init
DYNAMIC SYMBOL TABLE:
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.34) __libc_start_main
0000000000000000 w D *UND* 0000000000000000 Base _ITM_deregisterTMCloneTable
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) write
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) strlen
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) printf
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) close
0000000000000000 w D *UND* 0000000000000000 Base __gmon_start__
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) open
0000000000000000 w D *UND* 0000000000000000 Base _ITM_registerTMCloneTable
0000000000000000 w DF *UND* 0000000000000000 (GLIBC_2.2.5) __cxa_finalize
The output above is abbreviated. Run the commands yourself and determine:
- How many sections are present?
- How many symbols are present?
Strip the binary (for example with strip -o prog.stripped prog, which keeps the original file) and repeat the same process with objdump. Then compare both results.
In particular, answer:
- What happened to symbols?
- What happened to function names, and functions?
- What happened to the file size?
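To compare the two runs without counting rows by hand, a small script can tally the entries under each heading of the objdump -tT output. This is a minimal sketch: the BEFORE sample reuses rows from the listing above, while the AFTER sample assumes that objdump prints "no symbols" for a stripped binary, which you should confirm on your own system.

```python
import re

# A symbol row starts with the symbol value printed as 8 or 16 hex digits.
SYMBOL_ROW = re.compile(r"[0-9a-f]{8,16}\s")
HEADINGS = ("SYMBOL TABLE:", "DYNAMIC SYMBOL TABLE:")

def count_symbols(objdump_output: str) -> dict[str, int]:
    """Count the rows listed under each table heading of `objdump -tT`."""
    counts = {heading.rstrip(":"): 0 for heading in HEADINGS}
    current = None
    for line in objdump_output.splitlines():
        line = line.strip()
        if line in HEADINGS:
            current = line.rstrip(":")
        elif current and SYMBOL_ROW.match(line):
            counts[current] += 1
    return counts

# Shortened samples standing in for the real tool output.
BEFORE = """\
SYMBOL TABLE:
0000000000001179 g     F .text  0000000000000057 bar
00000000000011eb g     F .text  0000000000000020 main

DYNAMIC SYMBOL TABLE:
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) write
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) open
"""
AFTER = """\
SYMBOL TABLE:
no symbols

DYNAMIC SYMBOL TABLE:
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) write
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) open
"""

print("before:", count_symbols(BEFORE))
print("after: ", count_symbols(AFTER))
```

With real binaries, feed the function the text captured from objdump -tT prog and from the stripped copy.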
Exercise 2
Compile a small C program with 1-2 functions that uses external libraries. As an example, you can consider a program that creates a thread (libpthread) or compresses a file (libz).
Any other functions are adequate, as long as they come from external libraries.
One example is present at the zlib repository: https://raw.githubusercontent.com/madler/zlib/master/test/example.c
You can compile this code with gcc -o example example.c -lz.
Using a hex editor, identify the magic value of the ELF file and the values of its header fields.
You can use readelf to guide you by presenting the values that you can find in the hex editor.
$ readelf -h example
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1270
Start of program headers: 64 (bytes into file)
Start of section headers: 20368 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 13
Size of section headers: 64 (bytes)
Number of section headers: 31
Section header string table index: 30
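To connect the bytes seen in the hex editor with the fields printed by readelf, the 64-byte header can be decoded by hand. The sketch below assumes a little-endian ELF64 file, and the sample header is rebuilt from the readelf values shown above rather than read from a real file.

```python
import struct

# ELF64 file header: 16-byte e_ident followed by fixed-width little-endian fields.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = ("e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
          "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
          "e_shentsize", "e_shnum", "e_shstrndx")

def parse_elf64_header(data: bytes) -> dict:
    """Decode the first 64 bytes of a little-endian ELF64 file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic 7f 45 4c 46")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles ELF64, little endian")
    return dict(zip(FIELDS, ELF64_HEADER.unpack_from(data)))

# Header rebuilt from the readelf output above (type 3 = DYN, machine 62 = x86-64).
ident = bytes.fromhex("7f454c46020101000000000000000000")
sample = ELF64_HEADER.pack(ident, 3, 62, 1, 0x1270, 64, 20368, 0,
                           64, 56, 13, 64, 31, 30)

header = parse_elf64_header(sample)
print(hex(header["e_entry"]), header["e_shoff"], header["e_shnum"])
```

On a real file, pass the first 64 bytes instead, for example open("example", "rb").read(64).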
Using readelf, list the main sections (readelf -S) and inspect their contents. The following snippet shows the contents of section 25, which is the .data section.
It will contain global tables and global variables.
$ readelf -x 25 example
Hex dump of section '.data':
0x00005130 00000000 00000000 38510000 00000000 ........8Q......
0x00005140 68656c6c 6f2c2068 656c6c6f 21000000 hello, hello!...
0x00005150 5a310000 00000000 Z1......
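readelf -x prints bytes in file order, so multi-byte values appear reversed on a little-endian machine. As a small sketch, the bytes of the dump above can be re-read as 64-bit little-endian values and as text:

```python
import struct

# Raw bytes of .data exactly as printed by `readelf -x` (file order).
raw = bytes.fromhex(
    "0000000000000000" "3851000000000000"
    "68656c6c6f2c2068" "656c6c6f21000000"
    "5a31000000000000"
)
base = 0x5130  # address of the first dumped byte

# Interpret each 8-byte slot as a little-endian 64-bit value.
for offset in range(0, len(raw), 8):
    (value,) = struct.unpack_from("<Q", raw, offset)
    print(f"{base + offset:#x}: {value:#018x}")

# The 16 bytes starting at 0x5140 are text, padded with zero bytes.
text = raw[0x10:0x20].split(b"\x00", 1)[0].decode("ascii")
print(repr(text))
```

The slot at 0x5138 decodes to 0x5138 and the slot at 0x5150 to 0x315a, which are addresses inside the binary rather than text.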
Create a diagram (drawing) of the binary file and represent its structure from three perspectives: the ELF file structure, a segment view, and a section view. It is important to understand which parts of the ELF file are actually loaded into segments, and where they will be placed in memory. Analyzing this structure shows how the bytes of the file map to segments and sections.
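When drawing the diagram, it helps to translate a memory address into a position in the file. The sketch below does this with three rows copied from the objdump -h prog listing of Exercise 1 (not from the zlib example); the Section type and function name are only illustrative.

```python
from typing import NamedTuple, Optional

class Section(NamedTuple):
    name: str
    size: int
    vma: int       # address once loaded in memory
    file_off: int  # position of the bytes inside the file

# Rows copied from the `objdump -h prog` listing in Exercise 1.
SECTIONS = [
    Section(".text",   0x17B, 0x1090, 0x1090),
    Section(".rodata", 0x01B, 0x2000, 0x2000),
    Section(".data",   0x018, 0x4028, 0x3028),
]

def vma_to_file_offset(addr: int) -> Optional[tuple[str, int]]:
    """Return (section name, file offset) for an address, or None if unmapped."""
    for sec in SECTIONS:
        if sec.vma <= addr < sec.vma + sec.size:
            return sec.name, sec.file_off + (addr - sec.vma)
    return None

# 0x4038 is the address of `buffer` in the symbol table of Exercise 1.
name, offset = vma_to_file_offset(0x4038)
print(name, hex(offset))
```

Note that .text has identical addresses and file offsets, while .data differs by 0x1000 because of how its segment is aligned.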
Exercise 3
The LIEF library allows extensive manipulation of binary files, including ELF objects. Using LIEF, write a small Python script that prints information about an ELF file that may be relevant for your future reverse engineering tasks.
In particular, determine:
- The type of file and architecture
- The list of libraries loaded
- The compiler used
- The list of symbols from external libraries
- The address of the program entry point
- Whether the program uses RELRO, PIE, and canaries
- Whether the program is stripped
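As a starting point, the sketch below collects several of these items from an object returned by lief.parse(). The attribute names (header.file_type, header.machine_type, entrypoint, libraries, imported_symbols, is_pie, segments, sections) are taken from LIEF's Python ELF API as I understand it and may differ between versions, so check them against the documentation of the release you install. The canary, RELRO, and stripped checks are heuristics, not LIEF features.

```python
def summarize(binary) -> dict:
    """Collect triage facts from an object returned by lief.parse() for an ELF."""
    imported = [sym.name for sym in binary.imported_symbols]
    section_names = [sec.name for sec in binary.sections]
    return {
        "type": str(binary.header.file_type),
        "machine": str(binary.header.machine_type),
        "entrypoint": hex(binary.entrypoint),
        "libraries": list(binary.libraries),
        "imports": imported,
        "pie": bool(binary.is_pie),
        # Heuristic: GCC and Clang call __stack_chk_fail when a canary is corrupted.
        "canary": "__stack_chk_fail" in imported,
        # Heuristic: a GNU_RELRO program header marks data made read-only after relocation.
        "relro": any("GNU_RELRO" in str(seg.type) for seg in binary.segments),
        # Heuristic: strip removes .symtab, while .dynsym stays.
        "stripped": ".symtab" not in section_names,
    }

def compiler_banner(binary) -> str:
    """Return the text of the .comment section, where compilers usually leave a banner."""
    for sec in binary.sections:
        if sec.name == ".comment":
            raw = bytes(sec.content).replace(b"\x00", b" ")
            return raw.decode("ascii", "replace").strip()
    return ""

def load(path: str):
    """Parse a file with LIEF (third-party package: pip install lief)."""
    import lief
    return lief.parse(path)
```

A typical use would be info = summarize(load("prog")) followed by printing each key. Distinguishing partial from full RELRO additionally requires inspecting the dynamic entries, which this sketch does not do.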
Exercise 5
The following exercises are a set of crackmes from crackmes.one. All of them have at least one writeup, and are supposed to be easy-ish.
They also cover different platforms and languages, allowing you to experiment with different approaches.
Content is not mirrored here, due to mirroring restrictions from the crackmes.one website.
The objective of every crackme is to reverse the key verification algorithm and then write a keygen that produces valid keys. Always run the crackmes in virtual machines without personal data or shared folders!
You will require a Windows Virtual Machine. Start by downloading the ISO from here: https://www.microsoft.com/pt-pt/software-download/windows10iso
easy keyg3me
URL: https://crackmes.one/crackme/5da31ebc33c5d46f00e2c661
A simple keygen for Linux in AMD64 written by ezman. It should be easy. The fundamental hurdle is that C compilation is inherently destructive. Unlike modern languages that compile down to metadata-rich intermediate bytecode, C is compiled directly into raw, architecture-specific machine code, permanently stripping away the context that made it human-readable in the first place.
Stripped Symbols and Type Information: Comments, local variable names, and type definitions never reach the machine code, and in a typical release build the symbol table holding function and global variable names is stripped to save space. Instead of seeing readable logic like User.Age = 25, you are forced to look at raw hexadecimal values being moved into memory offsets relative to a CPU register (e.g., mov dword ptr [rax+14h], 19h, where 19h is 25 and 14h is the offset of the field). You have to reverse-engineer the original data structures purely by observing how the program interacts with that memory.
Aggressive Compiler Optimizations: Compilers (like GCC, Clang, or MSVC) ruthlessly optimize code for speed and size, which heavily warps the original logic. They might inline a function (pasting its code directly into the caller rather than jumping to it), unroll loops into long chains of repetitive instructions, or completely reorder the execution path. This means the control flow you see in a disassembler rarely matches the neat, linear logic the developer originally typed out.
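To make the mov dword ptr [rax+14h], 19h example concrete, the sketch below defines a hypothetical User structure whose layout was chosen so that the age field falls at offset 0x14. The field names and the 20-byte name buffer are assumptions for illustration; a real binary only reveals the offsets.

```python
import ctypes

class User(ctypes.LittleEndianStructure):
    # Hypothetical layout chosen so that `age` lands at offset 0x14.
    _fields_ = [
        ("name", ctypes.c_char * 20),  # bytes 0x00..0x13
        ("age", ctypes.c_int32),       # bytes 0x14..0x17
    ]

user = User()
user.age = 0x19  # the value stored by `mov dword ptr [rax+14h], 19h`

print(hex(User.age.offset), user.age)
print(bytes(user)[0x14:0x18].hex())
```

Recovering such a layout from several observed offsets is what defining structures in the decompiler amounts to.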
The process to follow is:
Reconnaissance and Triage: You should never dive straight into the raw assembly code. Begin by examining the binary’s external characteristics using tools like file, strings, or PE/ELF header parsers. Extracting human-readable text and listing the imported functions (the Import Address Table (IAT) in a PE file, the dynamic symbol table in an ELF file), such as WriteFile or socket, tells you much of what the binary is capable of before you read a single line of code.
Decompilation and Structural Reconstruction: Next, load the binary into a robust disassembler and decompiler suite like Ghidra or IDA Pro. Locate the true main function and start working backward from the known API calls you found during triage to figure out what the surrounding, unlabeled functions do. As you decipher the logic, aggressively rename variables and manually define C structs in your decompiler to format the raw memory accesses back into readable pseudo-code.
Dynamic Verification: Static reading only tells you what the code might do. To confirm your theories, run the binary in an isolated environment using a debugger (like GDB for Linux or x64dbg for Windows). Set breakpoints on confusing functions to observe the CPU registers and memory state live while the program executes. This iterative process of reading, testing live, and renaming gradually uncovers the program’s true intent.
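The text extraction done during triage is simple enough to reproduce in a few lines. The sketch below approximates the strings tool by collecting runs of printable ASCII; the sample bytes are made up for the demonstration.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len long, similar to `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Made-up bytes mixing binary data with two useful strings.
sample = b"\x7fELF\x02\x01\x00hello.txt\x00\x01%s\x00Hello World\n\x00"
print(extract_strings(sample))
```

Short runs such as "ELF" and "%s" fall below the default threshold of four characters, which is the same default the real tool uses.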
Simple Keygen
URL: https://crackmes.one/crackme/5c2acb8933c5d46a3882b8d4
A simple keygen for Linux in AMD64 written by Yuri. Follow the previous tips and solve it.
Keygen me Quick!
URL: https://crackmes.one/crackme/60d65d0833c5d410b8843014
This is a simple C/C++ application for Windows made by Legacyy. You can reverse it on Linux, as it is simple, but running the binary will require Wine or a Windows VM. Standard binary analysis tools will work. There is one issue: finding where the developer’s code actually starts, a function named WinMain.
It is tricky because the Entry Point of a compiled C/C++ binary is almost never the code the developer actually wrote. Instead, the compiler injects a hidden block of Microsoft setup code called the C Runtime (CRT) Startup. This boilerplate code initializes memory, sets up security, and prepares the environment before it ever passes control to the actual program, meaning you have to manually bypass this setup to find the real starting line.
To skip this setup and find WinMain, start at the binary’s Entry Point in your disassembler and scroll down looking for a specific cluster of Windows API calls. Because WinMain requires specific launch details from the operating system, the CRT setup must gather them first. You will almost always see calls to GetModuleHandle, GetCommandLine, and GetStartupInfo closely grouped together, signaling that the setup is almost complete.
Just past those API calls, look for a function that terminates the program, usually an exit() or ExitProcess call. The function executed immediately before this exit is the developer’s true WinMain. You can verify you have found the right function because it will be passed exactly four specific arguments (like the command line string and window state) right before it is called, making it clearly stand out from the rest of the setup routines.
Take as an example the following ending of the PE entry point:
      }
      thunk_FUN_0042c337();
      thunk_FUN_0042cc9c();
      thunk_FUN_0042cc96();
      unaff_ESI = thunk_FUN_00406f40(); // <----- WinMain
      uVar6 = thunk_FUN_00407fab();
      if ((char)uVar6 != '\0') {
        if (!bVar2) {
          __cexit();
        }
        ___scrt_uninitialize_crt('\x01','\0');
        *(undefined4 *)(unaff_EBP + -4) = 0xfffffffe;
LAB_0040737a:
        ExceptionList = *(void **)(unaff_EBP + -0x10);
        return unaff_ESI;
      }
      goto LAB_00407391;
    }
  }
  ___scrt_fastfail();
LAB_00407391:
  _exit(unaff_ESI); // exits with the value returned by WinMain
}
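Before scrolling through the CRT startup code, you need the address of the Entry Point itself. As a minimal sketch, it can be read straight from the PE headers: the entry point is stored as an RVA in the optional header and must be added to ImageBase to obtain the address shown in a disassembler. The function name is illustrative, and no validation beyond the two signatures is attempted.

```python
import struct

def pe_entry_point(data: bytes) -> tuple[int, int]:
    """Return (ImageBase, AddressOfEntryPoint RVA) read from a PE file's headers."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)   # e_lfanew field of the DOS header
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    opt = pe_off + 24                                  # skip 4-byte signature + 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == 0x10B:                                 # PE32: 32-bit ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:                               # PE32+: 64-bit ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError(f"unexpected optional header magic {magic:#x}")
    return image_base, entry_rva
```

For a file on disk, call it as base, rva = pe_entry_point(open(path, "rb").read()) and start your analysis at base + rva.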
.net_crackme_no.1
URL: https://crackmes.one/crackme/5ab77f6533c5d40ad448cb9f
This is a .NET application created by subminking for Windows. You may try to run it with Mono on Linux or another operating system.
You can also try to open it with Ghidra or IDA, but the result will not be that useful. The reason is that Ghidra and IDA are primarily designed to disassemble and decompile native machine code (like x86, x64, ARM) back into C or C++ pseudocode. Standard .NET applications do not compile directly to machine code. Instead, they compile to CIL (Common Intermediate Language), also known as MSIL. This intermediate code is executed by the .NET runtime (CLR) using Just-In-Time (JIT) compilation.
Moreover, when IDA or Ghidra look at a .NET binary, they are trying to apply native analysis paradigms to it. While they might have plugins to read IL, their decompilers are meant to output C-like syntax, not C#. They miss out on accurately translating highly specific .NET constructs (like LINQ, async/await state machines, or properties).
Also, .NET binaries are essentially open books. They contain massive amounts of metadata, including the original names of classes, methods, variables, and complex object hierarchies. Dedicated .NET decompilers use this metadata to reconstruct the source code almost exactly as it was written. Ghidra and IDA often fail to leverage this seamlessly, resulting in a clunky, overly complex analysis.
If a .NET application is compiled using Native AOT (Ahead-of-Time) compilation, or if it is wrapped in a heavy native packer/obfuscator that executes native code before loading the .NET payload, Ghidra and IDA suddenly become highly useful again.
The following tools will be better suited:
- JetBrains dotPeek: https://www.jetbrains.com/decompiler/
- Mono.Cecil: https://www.mono-project.com/docs/tools+libraries/libraries/Mono.Cecil/
- ILSpy: https://github.com/icsharpcode/ILSpy
- dnSpyEx: https://github.com/dnSpyEx/dnSpy
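To decide quickly whether a Windows binary should go to one of these tools or to a native disassembler, you can check whether the PE carries a CLR runtime header, which is entry 14 of the optional header's data directories. This is a minimal sketch with an illustrative function name. A populated entry indicates a managed assembly, but Native AOT builds and packed samples need a closer look.

```python
import struct

CLR_DIRECTORY_INDEX = 14  # "CLR Runtime Header" slot in the PE data directories

def has_clr_header(data: bytes) -> bool:
    """True when the CLR runtime header data directory of a PE file is populated."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        return False
    opt = pe_off + 24
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:        # PE32: directories start 96 bytes into the optional header
        dirs = opt + 96
    elif magic == 0x20B:      # PE32+: directories start 112 bytes in
        dirs = opt + 112
    else:
        return False
    (count,) = struct.unpack_from("<I", data, dirs - 4)  # NumberOfRvaAndSizes
    if count <= CLR_DIRECTORY_INDEX:
        return False
    rva, size = struct.unpack_from("<II", data, dirs + CLR_DIRECTORY_INDEX * 8)
    return rva != 0 and size != 0
```

Call it on the full file contents, for example has_clr_header(open(path, "rb").read()).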
smokefx_v1
URL: https://crackmes.one/crackme/5ab77f5433c5d40ad448c1c3
This is a simple Borland Delphi application for Windows made by waynemodz. Borland Delphi compiles directly to native machine code (x86 or x64). This means tools like Ghidra and IDA Pro are absolutely the right tools for the job.
However, Delphi does not play by the standard rules of C or C++ compilers. If you load a Delphi binary into IDA or Ghidra without preparing it first, the decompiled output will look like complete garbage.
Here is what you need to be highly aware of when reverse engineering Borland Delphi binaries, and how to tame them.
- The Notorious Borland __fastcall Convention: This is the number one reason reverse engineers get stuck on Delphi. Common C/C++ calling conventions either push arguments onto the stack (like cdecl or stdcall) or use specific registers (like Microsoft’s fastcall, which uses ECX and EDX). Borland Delphi uses its own register calling convention (often referred to as Borland __fastcall). It passes the first three parameters left-to-right in the EAX, EDX, and ECX registers. Any remaining parameters are pushed onto the stack.
If Ghidra or IDA assumes a standard Microsoft fastcall or cdecl, your decompiled pseudo-code will show missing arguments, wrong variable assignments, and broken logic. You must manually force your decompiler to use the correct calling convention.
- The Quirky String Implementation
Delphi handles strings (AnsiString, WideString, UnicodeString) very differently than C-style null-terminated strings: Delphi strings are reference-counted and length-prefixed.
The pointer to the string points directly to the first character. However, if you look at the memory address immediately before the pointer (at offset -4), you will find the string’s length. At offset -8, you will find the reference count.
Standard decompilers don’t know to look backward in memory for string metadata. You will often see complex, hidden memory management functions being called every time a string is manipulated or concatenated (e.g., calls to @LStrCatN).
- The VCL (Visual Component Library) and RTTI
Delphi applications are heavily object-oriented and rely on the VCL for their graphical interfaces. The good news is that Delphi binaries are packed with RTTI (Run-Time Type Information). This means original class names, form names, and event handlers (like Button1Click or FormCreate) are often stored in plain text inside the binary.
Delphi also embeds .dfm (Delphi Form) resources, which contain the entire layout of the user interface and how buttons map to specific memory addresses.
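The string layout described above (length at offset -4, reference count at offset -8) can be checked with a few lines when you dump memory from a debugger. The sketch assumes a 32-bit AnsiString with exactly that header; newer Delphi versions store additional fields before the reference count, which this example ignores. The memory image is hand-built for the demonstration.

```python
import struct

def read_delphi_ansistring(memory: bytes, ptr: int) -> tuple[int, str]:
    """Return (refcount, text) for a 32-bit Delphi AnsiString whose characters start at ptr."""
    # The two 32-bit header fields sit immediately before the characters.
    refcount, length = struct.unpack_from("<ii", memory, ptr - 8)
    return refcount, memory[ptr:ptr + length].decode("latin-1")

# Hand-built memory image: filler, refcount=1, length=5, then the characters.
payload = b"Hello"
memory = b"\xcc" * 4 + struct.pack("<ii", 1, len(payload)) + payload + b"\x00"
ptr = 12  # points at 'H', the way a Delphi string variable does

print(read_delphi_ansistring(memory, ptr))
```

Reading backward from the pointer like this is exactly the step that a generic decompiler does not perform on its own.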
To successfully reverse a Delphi binary, you should not use Ghidra or IDA alone. You need to pair them with Delphi-specific tools to recover the metadata first.
- IDR (Interactive Delphi Reconstructor): This is the holy grail of Delphi reversing. It is a standalone tool designed specifically to parse Delphi binaries. It will extract the embedded forms, resolve the RTTI, and recover the names of thousands of functions.
- Ghidra + Dhrake: Because IDR’s interface is a bit dated and it lacks a modern decompiler, the modern workflow is to use a Ghidra script called Dhrake. You use IDR to extract all the symbols and generate a script, then load the binary into Ghidra and run Dhrake. Dhrake imports the IDR symbols and automatically fixes the Borland __fastcall signatures and string functions.
- IDA: In some cases IDA processes debug information (TDS) and RTTI better. Try it.
Gopher Secrets
URL: https://crackmes.one/crackme/6955400d24bfc29298329efc
This application is multiplatform, and written by sarahmills. The issue is that it is a Go application… Go (Golang) binaries are notoriously painful for analysts to dissect. Go fundamentally ignores the traditional C/C++ compilation rules that standard disassemblers rely on, meaning you cannot just drop a file into your toolkit and expect readable output.
Statically Linked Bloat: Unlike traditional programs that dynamically link to operating system libraries, Go statically bundles its entire runtime, garbage collector, and standard packages directly into the final executable. This inflates even the simplest malware beacon into a massive multi-megabyte binary containing thousands of built-in functions, forcing you to hunt for the author’s actual logic hidden deep within a haystack of irrelevant system code.
Unconventional Strings: Go does not use standard C-style null-terminated strings to mark where a piece of text ends. Instead, it concatenates all text into one massive, contiguous memory block and references specific words using string slices, a structure containing a pointer to the start of the text and an exact length value. Standard decompilers completely fail to parse these natively, leaving your analysis riddled with broken, unreadable string references.
Stripped Metadata and Obfuscation: While unstripped Go binaries actually contain highly useful Runtime Type Information (RTTI), modern malware authors aggressively strip this data and use Go-specific obfuscators like Garble. These tools intentionally randomize function names, scramble package structures, and mutate the control flow, effectively blinding standard static analysis attempts.
To tackle Go binaries, follow this workflow:
Your first workflow step must be running specialized parsing tools such as GoReSym, ghostrings, or dedicated AlphaGolang scripts for your disassembler to automatically rebuild the string slices, recover the RTTI, and identify the standard library functions so you can filter out the noise.
Once the binary is cleaned and the system functions are hidden, immediately search for the main.main function, which serves as the true starting point of the developer’s custom code. Because static analysis of Go’s complex concurrency (goroutines) can be visually overwhelming, focus on mapping out key network APIs (like net.Dial), and expect to complement static reading with dynamic debugging in a sandbox to watch the payload execute.
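The pointer-plus-length string representation described above is easy to illustrate. The sketch below assumes 64-bit string headers of 16 bytes each (an 8-byte pointer followed by an 8-byte length) and uses a made-up blob and load address. The dedicated tools automate this at scale, including finding where the headers are.

```python
import struct

def carve_go_strings(blob: bytes, blob_base: int, headers: bytes) -> list[str]:
    """Decode 16-byte (pointer, length) pairs against a contiguous string blob."""
    out = []
    for off in range(0, len(headers) - 15, 16):
        ptr, length = struct.unpack_from("<QQ", headers, off)
        start = ptr - blob_base
        # Skip headers that do not point inside the blob.
        if 0 <= start and start + length <= len(blob):
            out.append(blob[start:start + length].decode("utf-8", "replace"))
    return out

blob = b"invalid keynet.Dialmain.main"   # no separators between the strings
base = 0x4B0000                          # hypothetical load address of the blob
headers = struct.pack("<QQQQ", base + 11, 8, base + 0, 11)

print(carve_go_strings(blob, base, headers))
```

Without the length field there is no way to tell where "invalid key" ends and "net.Dial" begins, which is why tools that search for null terminators produce merged or truncated strings.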
SirCrackaLot v2
URL: https://crackmes.one/crackme/69a07e02fb7f76ef92045c40
Back to Linux, but with a Rust crackme by Re_1333333337. Reverse engineering Rust binaries is notoriously painful. If you have ever loaded a compiled Rust program into a disassembler and felt like you were staring at a chaotic, incomprehensible mess, you are not alone. In the malware analysis and reverse engineering communities, Rust is widely considered one of the most hostile languages to analyze. Because Rust guarantees memory safety and zero-cost abstractions at compile-time, the compiler does a massive amount of heavy lifting that fundamentally transforms the code.
Rust does not play by the traditional rules of C or C++ compilation. When you reverse Rust, you are fighting the compiler’s optimizations just as much as the original author’s logic.
Static Linking by Default (The Haystack Problem): Rust statically links its entire standard library and all third-party dependencies (called “crates”) into the final executable. A simple “Hello World” or a basic network beacon can easily be several megabytes in size and contain thousands of functions. Finding the actual author’s business logic is like finding a needle in a haystack.
Aggressive LLVM Optimizations: Rust relies heavily on LLVM for its backend compilation. It aggressively uses Link-Time Optimization (LTO) and function inlining. This means clear function boundaries are often destroyed, dead code is stripped, and the control flow graph becomes highly non-linear and obscured.
Zero-Cost Abstractions & Monomorphization: High-level, beautiful Rust features like iterators, closures, and async/await state machines compile down into incredibly convoluted, low-level machine code. Furthermore, Rust uses “monomorphization” for generics—meaning if a developer uses a generic function with three different data types, the compiler generates three separate, highly optimized assembly versions of that same function, bloating the binary further.
“Fat Pointers” and Non-Terminated Strings: Unlike C-style strings that end with a null terminator (\0), Rust strings are not null-terminated. They are stored in read-only memory as one massive, contiguous block of text. Variables reference these strings using “fat pointers”, a struct containing a pointer to the start of the string and the exact length of the string. Traditional decompilers looking for null terminators will completely fail to parse these, leaving you with broken string references.
Complex Error Handling (Option/Result): Rust does not use traditional try/catch exceptions. It uses Result and Option enums. At the assembly level, this results in endless chains of branching logic, bounds checking, and unwrap calls that clutter the decompiler’s pseudo-code with noise.
If you just drop a Rust binary into IDA Pro or Ghidra and hit “analyze,” you will waste days sifting through standard library garbage. Follow these steps instead:
- Triage and Metadata Extraction: Before opening a disassembler, extract the binary’s metadata. Rust binaries often leak the exact compiler version, the commit hash, and a list of all third-party crates used.
Use tools like rustbininfo to extract this data. Knowing the exact Rust version and the dependencies (e.g., the tokio crate for async networking, or reqwest for HTTP) immediately tells you the program’s capabilities before you even read the assembly.
- Isolate the Library Code (Library Matching): You must filter out the standard library and external crates so you can focus on the developer’s actual code.
Use advanced pattern-matching tools designed for Rust, such as the open-source RIFT suite (developed to identify and annotate library code in Rust binaries), or generate specific FLIRT signatures for the exact Rust compiler version you identified during triage. Apply these to your disassembler to automatically rename and hide thousands of irrelevant functions.
- Fix the Strings and Fat Pointers: Do not try to read the code without fixing the strings first.
Use Rust-specific scripts for your disassembler (there are many publicly available Python scripts for Ghidra and IDA Pro) that specifically hunt for fat pointer patterns. These scripts will carve out the strings based on their pointer+length structures and apply them as comments in the assembly, making the binary far more readable.
- Follow the Breadcrumbs (Panic Handlers): Because Rust is memory-safe, it includes “panic” handlers that trigger when the program encounters an unrecoverable error (like an out-of-bounds array read).
Search for the core::panic::Location struct. Even in stripped binaries, developers often leave panic strings intact. These panic strings frequently leak the original source code file names, line numbers, and variable names. Tracing a panic string back to its referencing function is the fastest way to map out the original source code tree.
- AI-Assisted Decompilation: Because the LLVM-optimized pseudo-C generated by IDA or Ghidra will look very foreign, the modern workflow heavily relies on AI assistance.
Use LLM plugins (like GhidrAssist) that enhance Ghidra with local models (through Ollama) or remote models. You can feed them a confusing, inlined chunk of Rust assembly and ask them to summarize the high-level logic, suggest variable names, and identify underlying Rust structures like enum matches or async polls.
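The panic breadcrumbs mentioned above can be collected before opening a disassembler. As a rough sketch, the script below searches a binary for embedded .rs source paths; the regular expression is a heuristic and the sample bytes, including the commit hash, are made up. Because Rust strings are not null-terminated, a match may occasionally include characters from a neighbouring string.

```python
import re

# Source paths such as "src/main.rs" that the compiler embeds for panic messages.
RS_PATH = re.compile(rb"[A-Za-z0-9_./\\-]+\.rs")

def rust_source_paths(data: bytes) -> list[str]:
    """Return the distinct .rs paths found in a binary, sorted alphabetically."""
    return sorted({m.group().decode("ascii") for m in RS_PATH.finditer(data)})

# Made-up bytes standing in for the read-only data of a Rust binary.
sample = (b"\x00src/main.rs\x00\x01"
          b"/rustc/abc123/library/core/src/str/mod.rs\x00src/main.rs")
print(rust_source_paths(sample))
```

Paths under /rustc/ or a cargo registry directory belong to the standard library and third-party crates, while paths such as src/main.rs point to the author's own code.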
Tools and links
- objdump: https://man7.org/linux/man-pages/man1/objdump.1.html
- readelf: https://man7.org/linux/man-pages/man1/readelf.1.html
- LIEF: https://lief-project.github.io/
- gnutls: https://gnutls.org/documentation.html
- HxD: https://mh-nexus.de/en/hxd/
- bvi: http://bvi.sourceforge.net/
- ImHex: https://github.com/WerWolv/ImHex
- HexWorkshop: http://www.hexworkshop.com/
- ghex: https://wiki.gnome.org/Apps/Ghex
- HexEdit: https://hexed.it/
- FileInsight: https://github.com/nmantani/FileInsight-plugins