Obfuscation | João Paulo Barraca

Lecture Notes

Analyzing some obfuscation, anti-debugging and simple evasion techniques.

Download here

Practical Tasks

Exercise 1

Dead code can be inserted into binary files in order to make the reversing activities more complex. This is a simple technique which may introduce a little entropy to the analysis process, as additional code is present, but it is never actually executed.

Consider the following file that calculates the factorial of a number:

#include <stdio.h>
#include <stdlib.h>

unsigned long long factorial(unsigned long long a) {
    
    unsigned long long r = 1;
    
    while(a > 0 ){
        unsigned long long v = r * a;
        if(v < r){
            printf("ERROR: Overflow\n");
            exit(-1);
        }
        r = v;
        a = a - 1;
    }
    return r;
}

int main(int argc, char** argv) {
    long long n = 0;
    if(argc != 2) {
        printf("Need a positive integer argument\n");
        return -1;
    }
    n = atoll(argv[1]); // Parse as signed so that negative inputs can be rejected
    
    if(n <= 0){
        printf("Need a positive integer argument\n");
        return -1;
    }
    
    printf("Result: %llu\n", factorial((unsigned long long) n));

    return 0;
}

The file can be compiled with:

gcc -O0 -o factorial factorial.c

Notice the use of -O0 that disables compiler optimizations. If they are enabled, dead code may be removed.

Now consider adding code that is never executed. In this example, the extra call is never reached because a jmp skips over it to a label placed right after it.

//...
   asm("jmp label");
   factorial(factorial(argc));
   asm("label:");
//...

Add dead code to our example, compile it, and check the result with ghidra.

Exercise 2

The previous example was rather simple and fragile. Any compiler optimization would remove the dead code and the Function Graphs would rapidly show that the code is not executed.

Let's consider adding conditional statements whose outcome is known to you (opaque predicates), but which still need to be evaluated at run time. The objective is that the dead code appears reachable from a static perspective, while in reality it is never reached because you know the result of the expression.

One example is using the argc variable. As an int, argc can hold negative or positive values, but in practice argc will always be equal to or larger than 1. Other variables can be used to exploit this mechanism.

An example would be:

if(argc < 0 ) {
    //Dead code here
}
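Arithmetic identities are another common source of opaque predicates. The sketch below is not part of the original exercise: it uses Python only to check one such identity, namely that the product of two consecutive integers is always even, including when the product wraps around at 32 bits as it would for an unsigned int in C. The name opaque_true and the 32-bit mask are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumed width: emulates 32-bit unsigned wraparound


def opaque_true(x: int) -> bool:
    """Return True for every integer x.

    x and x + 1 are consecutive, so one of them is even and the product
    is even. Truncating to 32 bits keeps the lowest bit unchanged.
    """
    product = (x * (x + 1)) & MASK32
    return product % 2 == 0


# Check the identity over a range of values, including the wraparound edge.
samples = list(range(-1000, 1000)) + [0x7FFFFFFF, 0xFFFFFFFF]
always_true = all(opaque_true(x) for x in samples)
```

In C, the equivalent condition could guard the live code, with the dead code placed in the branch that is never taken.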

TASKS:

Implement a variation of the previous example using opaque predicates. The dead code should be as complex as possible, but it still needs to be syntactically valid.
Decompile the resulting files and analyze the result.

Exercise 3

The use of functions still provides some insight to reversers in regard to the structure of the program. Yet, compilers allow flattening the code, inlining all functions used. This will effectively create larger, potentially sub-optimal code, but with added confusion.

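For GCC this can be achieved by setting the __attribute__((always_inline)) attribute on the functions we want to inline. For the factorial function, the return type would become unsigned long long __attribute__((always_inline)). GCC may warn that the function might not be inlinable unless it is also declared inline.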

TASKS:

Add this to the result of the previous exercise and compare the result. It works better if you have multiple functions to insert as dead code, and you combine them.

Exercise 4

Until now our dead code makes some sense. At least it is syntactically valid C code, and it is always composed of valid CPU instructions. What if it isn't? What if the code added makes no sense, and it contains invalid opcodes?

There are several ways of achieving this. The concept is to combine opaque predicates with invalid opcodes, created with random bytes. Preprocessor tricks can be used to achieve this effect, but they would be too complicated for a lecture demonstration. For a practical example, check this.

A simple way of doing it is to introduce placeholders and then fill those placeholders with junk. The placeholders can be any instruction. In this case we will use NOP.

Consider the following macros, which will generate chunks of NOP instructions to be added.

#define REP0(X)
#define REP1(X) X
#define REP2(X) REP1(X) X
#define REP3(X) REP2(X) X
#define REP4(X) REP3(X) X
#define REP5(X) REP4(X) X
#define REP6(X) REP5(X) X
#define REP7(X) REP6(X) X
#define REP8(X) REP7(X) X
#define REP9(X) REP8(X) X
#define REP10(X) REP9(X) X

#define REP(HUNDREDS,TENS,ONES,X) \
  REP##HUNDREDS(REP10(REP10(X))) \
  REP##TENS(REP10(X)) \
  REP##ONES(X)

Then in your code you can add placeholders like this, using inline assembly. It will insert 333 NOP instructions into the code (3 x 100 + 3 x 10 + 3).

    __asm__ (REP(3,3,3,"nop;"));

Long runs of single-byte NOP instructions are infrequent in compiled code, and a run of 333 is practically impossible to occur by chance, so you can post-process the file to replace those instructions with random bytes.
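To see how many instructions a given REP invocation produces, the expansion can be mimicked outside the preprocessor. The sketch below uses Python purely as a calculator. The function rep is a hypothetical stand-in for the macro, not part of the exercise code.

```python
def rep(hundreds: int, tens: int, ones: int, x: str) -> str:
    """Mimic the REP macro: repeat x (100*hundreds + 10*tens + ones) times."""
    return x * (100 * hundreds) + x * (10 * tens) + x * ones


# Same arguments as the inline assembly placeholder.
expansion = rep(3, 3, 3, "nop;")
nop_count = expansion.count("nop;")
print(nop_count)
```

Each digit argument selects one of the REP0 to REP10 macros, so the three arguments behave like the hundreds, tens and ones digits of the repetition count.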

TASKS:

Implement this strategy into the file by adding placeholders in multiple places.
Create a simple script that processes the file, replacing any large sequence of NOP (0x90) with random bytes.
- You can process the file as bytes and replace any sequence of 0x90 bytes with random bytes
Check the program executes correctly and analyze the result with ghidra.
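One possible shape for the post-processing step is sketched below. It treats the file as raw bytes and only touches runs of 0x90 that are at least MIN_RUN bytes long. The threshold of 64 and the function name replace_nop_runs are assumptions chosen for illustration, and the sketch does not restrict itself to the code section, which a more careful script might do.

```python
import os
import re

MIN_RUN = 64  # assumed threshold: shorter runs are left alone (e.g., alignment padding)


def replace_nop_runs(data: bytes, min_run: int = MIN_RUN) -> bytes:
    """Return a copy of data where each run of >= min_run 0x90 bytes is random."""
    pattern = re.compile(b"\x90{%d,}" % min_run)
    # The replacement callback returns as many random bytes as the run it replaces,
    # so every offset in the file stays the same.
    return pattern.sub(lambda match: os.urandom(len(match.group(0))), data)


# Small self-contained demonstration on synthetic bytes.
sample = b"\x55" + b"\x90" * 333 + b"\xc3" + b"\x90" * 3
patched = replace_nop_runs(sample)
```

On a real binary, the same function could be applied to the bytes read from the compiled file and the result written to a new file. The program only keeps working if every placeholder sits in code that is never executed.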

Exercise 5

Another way of hiding code is to encrypt it, either with strong cryptographic methods or with a simple XOR. XOR is frequently a good approach because the decrypting code is small, which leaves a smaller fingerprint for scanners.

The process consists of compiling the binary and then encrypting the code to be hidden. If we consider the factorial function as a target to hide, after we compile the program we obtain:

$ objdump -M intel -d factorial

...
0000000000001169 <factorial>:
    1169:       55                      push   rbp
    116a:       48 89 e5                mov    rbp,rsp
    116d:       48 83 ec 20             sub    rsp,0x20
    1171:       48 89 7d e8             mov    QWORD PTR [rbp-0x18],rdi
    1175:       48 c7 45 f8 01 00 00    mov    QWORD PTR [rbp-0x8],0x1
    117c:       00
    117d:       eb 3d                   jmp    11bc <factorial+0x53>
    117f:       48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
    1183:       48 0f af 45 e8          imul   rax,QWORD PTR [rbp-0x18]
    1188:       48 89 45 f0             mov    QWORD PTR [rbp-0x10],rax
    118c:       48 8b 45 f0             mov    rax,QWORD PTR [rbp-0x10]
    1190:       48 3b 45 f8             cmp    rax,QWORD PTR [rbp-0x8]
    1194:       73 19                   jae    11af <factorial+0x46>
    1196:       48 8d 05 6b 0e 00 00    lea    rax,[rip+0xe6b]        # 2008 <_IO_stdin_used+0x8>
    119d:       48 89 c7                mov    rdi,rax
    11a0:       e8 8b fe ff ff          call   1030 <puts@plt>
    11a5:       bf ff ff ff ff          mov    edi,0xffffffff
    11aa:       e8 b1 fe ff ff          call   1060 <exit@plt>
    11af:       48 8b 45 f0             mov    rax,QWORD PTR [rbp-0x10]
    11b3:       48 89 45 f8             mov    QWORD PTR [rbp-0x8],rax
    11b7:       48 83 6d e8 01          sub    QWORD PTR [rbp-0x18],0x1
    11bc:       48 83 7d e8 00          cmp    QWORD PTR [rbp-0x18],0x0
    11c1:       75 bc                   jne    117f <factorial+0x16>
    11c3:       48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
    11c7:       c9                      leave
    11c8:       c3                      ret

00000000000011c9 <main>:
...

After the file is compiled, you can use a post-processing script or even a hex editor to XOR every byte of the file between 0x1169 and 0x11c8 (inclusive) with a fixed key.
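A minimal sketch of such a post-processing step is shown below. It assumes that the file offsets of the function equal the addresses printed by objdump, which should be verified for your binary (for example with readelf -S), and it uses 0xAA as the key. The function name xor_range is hypothetical.

```python
def xor_range(data: bytes, start: int, end: int, key: int = 0xAA) -> bytes:
    """XOR every byte of data from offset start to offset end (inclusive) with key."""
    buf = bytearray(data)
    for offset in range(start, end + 1):
        buf[offset] ^= key
    return bytes(buf)


# Demonstration on synthetic bytes: applying the same operation twice restores the input.
original = bytes(range(16))
encrypted = xor_range(original, 4, 7)
restored = xor_range(encrypted, 4, 7)
```

For the listing above, the call would be xor_range(file_bytes, 0x1169, 0x11c8) on the bytes read from the compiled file, under the stated assumption about offsets.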

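At the start of the main function, or at any place before the function is used, the following code will decrypt the function in memory at run time. Code pages are normally mapped without write permission, so they must first be made writable (e.g., with mprotect).

for(unsigned char* p = (unsigned char*) factorial; p < (unsigned char*) main; p++) {
    *p ^= 0xAA; // XOR one byte at a time, stopping before the first byte of main
}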

TASKS:

Implement the example and evaluate the end result using static analysis. In order to obtain the actual code when reversing, we need to dump the memory after it is decrypted, or decrypt the file before analysis.
Using Qiling, implement a sandbox that dumps the process memory to a file, which can be loaded into ghidra for analysis.

Exercise 6

A parallel aspect is anti-debugging, where programs try to detect that they are being debugged and change their behaviour dynamically. This is possible because virtual machines, debuggers, and emulation environments have side effects on the run-time characteristics of a program.

PTRACE-based debuggers, and most other debuggers, can be detected. If an application detects the debugger, it can change its execution flow to obfuscate its real purpose. As an example, an application carrying a trojan may try to detect a debugger and, if one is found, not activate the malicious code. An analyst will struggle to analyze the real purpose of the application unless the detection code is removed, avoided, or its result is ignored. The correct strategy will vary with how the anti-debugging techniques are actually implemented.

Virtual machines can also be detected, as the hardware device identifiers, the CPU, or the drivers can provide hints that the program is not running on a bare host. Also, the actual opcodes supported by a hypervisor can differ from those of a real host. For an interesting overview of some techniques, check Peter Ferrie's work titled Attacks on Virtual Machine Emulators.

When considering debuggers, a simple method involves an application tracing itself with PTRACE_TRACEME. A process can have only one tracer, so the request fails if a debugger is already attached. While this will work, it will have consequences if real debugging is required.

if (ptrace(PTRACE_TRACEME, 0, 1, 0) == -1)  {
    printf("Debugger detected\n");
    return 1;
}

The application can also fork a child, and let the child use PTRACE_ATTACH to the parent, then detaching with PTRACE_DETACH.

pid_t pid = fork();

if (pid == 0) { // Child
    pid_t ppid = getppid(); // Get parent PID
    if (ptrace(PTRACE_ATTACH, ppid, NULL, NULL) == 0) {
      waitpid(ppid, NULL, 0); // Wait for the parent to stop
      ptrace(PTRACE_CONT, ppid, NULL, NULL); // Continue parent
      ptrace(PTRACE_DETACH, ppid, NULL, NULL); // Detach
    } else {
      printf("DEBUGGER detected\n");
    }
} else {
    // Parent. Exit...
}

Another method involves inspecting the content of /proc/self/status, looking for the value of TracerPid:.
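A minimal sketch of this check is shown below, in Python for brevity. A C program would do the same with fopen and fgets. It assumes the Linux format of the status file, where a TracerPid of 0 means that no tracer is attached. The function names tracer_pid and is_traced are hypothetical.

```python
def tracer_pid(status_text: str) -> int:
    """Extract the TracerPid value from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split(":", 1)[1].strip())
    return 0  # field not present: assume no tracer


def is_traced(path: str = "/proc/self/status") -> bool:
    """Return True if the current process reports a non-zero TracerPid."""
    with open(path, "r") as status_file:
        return tracer_pid(status_file.read()) != 0


# Parsing demonstration on synthetic content.
example = "Name:\tfactorial\nState:\tR (running)\nTracerPid:\t4242\nUid:\t1000\n"
example_pid = tracer_pid(example)
```

Calling is_traced() from a normal shell and again under gdb or strace should give different answers.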

Debuggers rely on SIGTRAP, so if a program raises SIGTRAP itself, an attached debugger will intercept the signal. A program can therefore install a handler for SIGTRAP and raise the signal. In a normal run, the handler catches the SIGTRAP and nothing else happens. Under a debugger, the debugger receives the SIGTRAP first. The application can detect that its handler never ran, and even if the debugger passes the signal through to the handler, the timing will be noticeably off.

The example code would be:

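#include <signal.h>
#include <stdio.h>

volatile sig_atomic_t trap = 1; // Cleared by the handler

void handler(int a) {
    printf("It's not a trap!\n");
    trap = 0;
}


int detect_debugger() {
    signal(SIGTRAP, handler); // Sets the handler
    raise(SIGTRAP); // Raise the TRAP
    return trap; // Still 1 if the handler never ran
}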

This approach uses a flag, but if instead of a boolean, the variable stores a timestamp, the code can also compare the time (check man 2 clock_gettime) it takes for the signal to be handled. Usually it will be very fast, except if there is a debugger present.
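The timing idea can be sketched as follows, in Python for brevity and not as the C solution the exercise asks for. Here time.monotonic_ns stands in for clock_gettime, and the function name sigtrap_latency_ns is hypothetical. Any threshold separating a normal run from a debugged one would have to be calibrated on the target machine.

```python
import signal
import time

handled_at_ns = None  # timestamp stored by the handler


def handler(signum, frame):
    global handled_at_ns
    handled_at_ns = time.monotonic_ns()


def sigtrap_latency_ns() -> int:
    """Raise SIGTRAP and return the delay until the handler ran (-1 if it never ran)."""
    global handled_at_ns
    handled_at_ns = None
    previous = signal.signal(signal.SIGTRAP, handler)
    try:
        start = time.monotonic_ns()
        signal.raise_signal(signal.SIGTRAP)
        # Python-level handlers run between bytecodes, so wait briefly (up to 1 s).
        deadline = start + 1_000_000_000
        while handled_at_ns is None and time.monotonic_ns() < deadline:
            pass
    finally:
        if previous is not None:
            signal.signal(signal.SIGTRAP, previous)  # restore the old disposition
    return -1 if handled_at_ns is None else handled_at_ns - start


latency = sigtrap_latency_ns()
```

Comparing the value of latency across several runs, with and without a debugger attached, shows how large the gap is in practice.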

In all cases, handling such code requires patching the binary to either avoid executing the detection routine or ignore its result. This can be done statically by editing the file and changing opcodes (e.g., turning a JZ into a JNZ, or replacing the call with NOPs), or dynamically with frameworks such as Qiling.
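The two static patches mentioned above can be sketched on raw bytes as follows. The sketch only covers the short form of JZ (opcode 0x74, which becomes 0x75 for JNZ) and the 5-byte near call (opcode 0xE8). The near forms of conditional jumps use two-byte opcodes and are not handled. The function names and the byte sequence are made up for illustration, and the offsets would come from your own disassembly.

```python
JZ_SHORT = 0x74    # x86 short JZ (rel8)
JNZ_SHORT = 0x75   # x86 short JNZ (rel8)
CALL_REL32 = 0xE8  # x86 near call, followed by a 4-byte displacement
NOP = 0x90


def invert_short_jz(code: bytes, offset: int) -> bytes:
    """Turn the short JZ at offset into a JNZ, leaving its displacement untouched."""
    if code[offset] != JZ_SHORT:
        raise ValueError("no short JZ at this offset")
    patched = bytearray(code)
    patched[offset] = JNZ_SHORT
    return bytes(patched)


def nop_out_call(code: bytes, offset: int) -> bytes:
    """Overwrite the 5-byte near call at offset with NOP instructions."""
    if code[offset] != CALL_REL32:
        raise ValueError("no near call at this offset")
    patched = bytearray(code)
    patched[offset:offset + 5] = bytes([NOP]) * 5
    return bytes(patched)


# Synthetic sequence: test eax,eax / jz +5 / call <rel32> / ret
snippet = bytes.fromhex("85c07405e810000000c3")
inverted = invert_short_jz(snippet, 2)
without_call = nop_out_call(snippet, 4)
```

Both helpers check the opcode before writing, which guards against patching the wrong offset.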

TASKS:

Implement the examples that aim to detect a debugger and check their effectiveness
Change the last example to take in consideration the time it takes for the signal to be handled. Compare the results with and without a debugger.
Implement methods to avoid the detection in your programs (patching or Qiling)

Exercise 7

Consider that a program can actually be a simple loader for another program, the second stage. The second stage may be received from the network and executed without touching the disk, which would leave few forensic traces.

The following example consists of a builder, a stub and a second stage (a simple Hello).

#include <stdio.h>

int main(int argc, char *argv[]) {
    printf("==================================\n");
    printf("[+] Hello from the second payload!\n");
    printf("==================================\n");
    return 0;
}

You can compile it with:

gcc -o hello hello.c

The builder takes the second stage and creates a header file with its encrypted content.

import os
import subprocess

# --- Configuration ---
TARGET_BIN = "hello"          # The compiled binary we want to pack
OUTPUT_BIN = "packed_hello"   # The final output executable
XOR_KEY = 0xAA                # Simple encryption key
HEADER_FILE = "payload.h"     # The generated header containing the payload
STUB_SOURCE = "stub.c"        # The C stub that includes the generated header

def main():
    if not os.path.exists(TARGET_BIN):
        print(f"Error: Target binary '{TARGET_BIN}' not found.")
        return

    print(f"Builder: Reading target binary '{TARGET_BIN}'...")
    with open(TARGET_BIN, "rb") as f:
        raw_data = f.read()

    # Encrypt the binary data
    print(f"Builder: Encrypting payload (Size: {len(raw_data)} bytes)...")
    encrypted_data = bytearray(b ^ XOR_KEY for b in raw_data)

    # Serialize into a C-style array string
    print(f"Builder: Generating '{HEADER_FILE}'...")
    hex_array = ", ".join([f"0x{b:02x}" for b in encrypted_data])
    payload_size = len(encrypted_data)

    # Create the Header File (payload.h)
    h_code = f"""#ifndef PAYLOAD_H
#define PAYLOAD_H

#define XOR_KEY {XOR_KEY}
#define PAYLOAD_SIZE {payload_size}

// The encrypted ELF binary, ready to be included in the stub.
unsigned char payload[] = {{ {hex_array} }};

#endif // PAYLOAD_H
"""
    with open(HEADER_FILE, "w") as f:
        f.write(h_code)

    print(f"Builder: Compile the final packed binary with: gcc -o stub {STUB_SOURCE}")

if __name__ == "__main__":
    main()

Then comes the stub. It includes the second stage and loads it into memory. This version has a small variation: the interesting code is executed after the main function returns.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <sys/stat.h>
#include "payload.h" // <-- Includes our encrypted binary and macros

int main(int argc, char *argv[]) {
    printf("Hello world. I'm harmless\n");
    return 0;
}

// This function is registered as a destructor.
// It will be executed AFTER main
void __attribute__ ((destructor)) loader(void){
    // Decrypt the payload in memory (payload array is pulled from payload.h)
    // We are using an internal variable, but the payload could be provided
    // through the network, or as a parasite in a polyglot
    for (long i = 0; i < PAYLOAD_SIZE; i++) {
        payload[i] ^= XOR_KEY;
    }

    // Create an anonymous, RAM-only file descriptor.
    int mem_fd = memfd_create("unpacked_binary", MFD_CLOEXEC);
    if (mem_fd == -1) {
        return;
    }

    // Write the decrypted ELF binary into the RAM file
    write(mem_fd, payload, PAYLOAD_SIZE);

    // Execute the RAM file
    extern char **environ;
    char *argv[] = { "kworker/u4:2", NULL }; // Fake name
    fexecve(mem_fd, argv, environ);

    // If fexecve succeeds, this code is never reached.
    return ;
}

Compile it (gcc -o stub stub.c) and check the disassembled result. Pay attention to the names of the functions and to the result after strip.

Then execute it and see that it will load the second stage with success.

The objective of these snippets is to demonstrate how the payload can be obtained from the binary using Qiling.

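We will accomplish this by intercepting the write system call and checking whether the buffer being written starts with the ELF magic bytes. In standard situations a program should not write an executable to a file descriptor at run time. We use this to obtain the second stage after decryption.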

Consider the following dumper.

import sys
from qiling import Qiling
from qiling.const import QL_VERBOSE

def hook_write(ql, fd, buf, count, *args):
    """
    This function intercepts every 'write' syscall made by the emulated binary.
    """
    if fd > 2:
        # Read the first 4 bytes from the memory address (buf)
        magic = ql.mem.read(buf, 4)
        
        # Check if the data being written is an ELF binary (\x7fELF)
        if magic == b'\x7fELF':
            print(f"\nALERT: Binary attempting to write an ELF file to FD {fd}!")
            
            # Read the full decrypted payload from the emulated memory
            payload_data = ql.mem.read(buf, count)
            
            # Write it to our host machine's disk
            dump_filename = "dumped_payload.elf"
            with open(dump_filename, "wb") as f:
                f.write(payload_data)
                
            print(f"Success! Payload safely dumped to: {dump_filename}")
            
            # Stop the emulator instantly. 
            # This prevents the malware from reaching fexecve and executing.
            print("[*] Halting emulation to prevent payload execution.")
            ql.emu_stop()

def main():
    if len(sys.argv) < 2:
        print(f"Usage: python3 {sys.argv[0]} <packed_binary>")
        sys.exit(1)

    target_binary = sys.argv[1]

    try:
        # We set rootfs to "/" assuming you are running this on a Linux host.
        ql = Qiling([target_binary], rootfs="/", verbose=QL_VERBOSE.DISABLED)
        
        # Register our syscall hook. 
        # Whenever the binary calls 'write', route it to 'hook_write' first.
        ql.os.set_syscall('write', hook_write)
        ql.run()
    except Exception as e:
        print(f"[-] Emulation error: {e}")
if __name__ == "__main__":
    main()

TASKS:

Compile the example and analyse the result with static analysis.
Compare the standard binary and a stripped binary.
Run the dumper and analyse the extracted file using the same methods.

Exercise 8

Can we call malicious code using signal handlers? Yes: a signal handler is effectively a jump target. One interesting case is SIGFPE, as the signal can be triggered with an integer division by zero (x/0) anywhere in the code.

Implement a small program (a hello world) that includes a signal handler for SIGFPE and then triggers it somewhere in the code.

You can even resume operation after the signal by manipulating the RIP register, or execute different functions based on the values of the registers before the SIGFPE.

The structure is something like:

#define _GNU_SOURCE // Required for REG_RIP
#include <signal.h>
#include <string.h>
#include <ucontext.h>

void sigfpe_handler(int signum,  siginfo_t *si, void* arg) {
    //Malicious code
    
    //Continue Execution: skip the faulting instruction.
    //0x02 assumes a 2-byte division instruction; adjust it to the length seen in the disassembly.
    ((ucontext_t *)arg)->uc_mcontext.gregs[REG_RIP] = ((ucontext_t *)arg)->uc_mcontext.gregs[REG_RIP] + 0x02;
}


int main(int argc, char** argv) {

    // Install handler somewhere in the code
    int op = 3;
    int div = 1;
    struct sigaction act;
    struct sigaction oldact;
    memset(&act, 0, sizeof(act));
    act.sa_sigaction = sigfpe_handler; // sa_sigaction is the field used with SA_SIGINFO
    act.sa_flags = SA_NODEFER | SA_SIGINFO;
    sigaction(SIGFPE, &act, &oldact);
    
    int r = op / div; // Does nothing

    //Manipulate op or div to generate a SIGFPE
    op = 5;
    div -= 1;

    //Trigger malicious code because div=0
    r = op/div;

    return 0;
}

The interesting aspect of this is that the handler can be called any time there is a SIGFPE, which is very subtle. Triggering signals also collides with gdb usage.

TASKS:

Implement a small program with this concept
Disassemble and decompile the binary and check the result

Tools

ghidra: https://ghidra-sre.org/
Qiling: https://github.com/qilingframework/qiling
bvi: http://bvi.sourceforge.net/
Hex Workshop: http://www.hexworkshop.com/
file: https://man7.org/linux/man-pages/man1/file.1.html
unzip: https://linux.die.net/man/1/unzip