Obfuscation

Lecture Notes

Analyzing some obfuscation, anti-debugging and simple evasion techniques.

Download here

Practical Tasks

Exercise 1

Dead code can be inserted into binary files in order to make the reversing activities more complex. This is a simple technique which may introduce a little entropy to the analysis process, as additional code is present, but it is never actually executed.

Consider the following file that calculates the factorial of a number:

#include <stdio.h>
#include <stdlib.h>

unsigned long long factorial(unsigned long long a) {
    
    unsigned long long r = 1
-
    
    while(a > 0 ){
        unsigned long long v = r * a
-
        if(v < r){
            printf("ERROR: Overflow\n")
-
            exit(-1)
-
        }
        r = v
-
        a = a - 1
-
    }
    return r
-
}

int main(int argc, char** argv) {
    unsigned long long v = 0
-
    if(argc != 2) {
        printf("Need a positive integer argument\n")
-
        return -1
-
    }
    v = atol(argv[1])
-
    
    if(v <= 0){
        printf("Need a positive integer argument\n")
-
        return -1
-
    }
    
    printf("Result: %llu\n", factorial(v))
-

    return 0
-
}

The file can be compiled with:

gcc -O0 -o factorial factorial.c

Notice the use of -O0 that disables compiler optimizations. If they are enabled, dead code will be removed.

Now consider adding code that is never executed, by adding dead logic along it. In this example, the instructions are never called as we introduce a jmp to a label after them.

//...
   asm("jmp label")
-
   factorial(factorial(argc))
-
   asm("label:")
-
//...

Add Dead code to our example, compile it and check the result with ghidra.

Exercise 2

The previous example was rather simple and fragile. Any compiler optimization would remove the dead code and the Function Graphs would rapidly show that the code is not executed.

Lets consider adding conditional statements that are known by you (an opaque predicate), but still need to be evaluated. The objective is that dead code could be potentially reached from a static perspective, but in reality it is never reached, and you know the result of the expression.

One example is using the argc variable. As an int, argc can have negative or positive values, but in practice argc will always be equal or larger than 1. Other variables can be used to exploit this mechanism.

An example would be:

if(argc < 0 ) {
    //Dead code here
}

TASKS:

  • Implement a variation of the previous example using opaque predicates. The dead code should be as complex as possible, but it still needs to by syntactically valid.
  • Decompile the resulting files and analyze the result.

Exercise 3

The use of functions still provides some insight to reversers in regard to the structure of the program. Yet, compilers allow flattening the code, inlining all functions used. This will effectively create larger, potentially sub-optimal code, but with added confusion.

For GCC this can be used by setting the __attribute__((always_inline)) attribute in the functions we want to inline. For the factorial function, it would become unsigned long long __attribute__((always_inline)).

TASKS:

  • Add this to the result of the previous exercise and compare the result. If works better if you have multiple functions to insert as dead code, and you combine them.

Exercise 4

Until now our dead code makes some sense. At least it is syntactically valid C code, and is always composed by valid CPU instructions. What if it isn’t? What if the code added makes no sense, and it contains invalid opcodes?

There are several ways of achieving this. The concept is to combine opaque predicates with invalid opcodes, created with random bytes. Preprocessor tricks can be used to achieve this effect, but it will be complicated in the scope of a lecture demonstration. For a practical example, check this.

A simple way of doing it is to introduce placeholders and then fill those placeholders with junk. The placeholders can be of any instruction. In this case we will use NOP.

Consider the following macros, which will generate chunks of NOP instructions to be added.

#define REP0(X)
#define REP1(X) X
#define REP2(X) REP1(X) X
#define REP3(X) REP2(X) X
#define REP4(X) REP3(X) X
#define REP5(X) REP4(X) X
#define REP6(X) REP5(X) X
#define REP7(X) REP6(X) X
#define REP8(X) REP7(X) X
#define REP9(X) REP8(X) X
#define REP10(X) REP9(X) X

#define NOP "nop
-"
#define REP(HUNDREDS,TENS,ONES) \
  REP##HUNDREDS(REP10(REP10(NOP))) \
  REP##TENS(REP10(NOP)) \
  REP##ONES(NOP)

Then in your code you can add placeholders like this, using inline assembly. It will insert 323 NOP instruction into the code.

    __asm__ REP(3,2,3)

As NOP instructions are not frequent, and 323 NOP are even less frequent or practically impossible to occur, you can post process the file to replace those instructions with garbage.

TASKS:

  • Implement this strategy into the file by adding placeholders in multiple places.
  • Create a simple script that processes the file, replacing and large sequence of NOP (0x90) with random bytes.
  • Check the program executes correctly and analyze the result with ghidra.

Exercise 5

Another way of hiding code is to encrypt it, either with strong cryptographic methods, or with a simple XOR. Using a XOR is frequently a good approach as the decrypting code will be small, which will result in a smaller fingerprint to scanners.

The process consists in compiling the binary, and encrypting the code to be hidden. If we consider the factorial function as a target to hide, after we compile the program we obtain:

$ objdump -M intel -d factorial

...
0000000000001169 <factorial>:
    1169:       55                      push   rbp
    116a:       48 89 e5                mov    rbp,rsp
    116d:       48 83 ec 20             sub    rsp,0x20
    1171:       48 89 7d e8             mov    QWORD PTR [rbp-0x18],rdi
    1175:       48 c7 45 f8 01 00 00    mov    QWORD PTR [rbp-0x8],0x1
    117c:       00
    117d:       eb 3d                   jmp    11bc <factorial+0x53>
    117f:       48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
    1183:       48 0f af 45 e8          imul   rax,QWORD PTR [rbp-0x18]
    1188:       48 89 45 f0             mov    QWORD PTR [rbp-0x10],rax
    118c:       48 8b 45 f0             mov    rax,QWORD PTR [rbp-0x10]
    1190:       48 3b 45 f8             cmp    rax,QWORD PTR [rbp-0x8]
    1194:       73 19                   jae    11af <factorial+0x46>
    1196:       48 8d 05 6b 0e 00 00    lea    rax,[rip+0xe6b]        # 2008 <_IO_stdin_used+0x8>
    119d:       48 89 c7                mov    rdi,rax
    11a0:       e8 8b fe ff ff          call   1030 <puts@plt>
    11a5:       bf ff ff ff ff          mov    edi,0xffffffff
    11aa:       e8 b1 fe ff ff          call   1060 <exit@plt>
    11af:       48 8b 45 f0             mov    rax,QWORD PTR [rbp-0x10]
    11b3:       48 89 45 f8             mov    QWORD PTR [rbp-0x8],rax
    11b7:       48 83 6d e8 01          sub    QWORD PTR [rbp-0x18],0x1
    11bc:       48 83 7d e8 00          cmp    QWORD PTR [rbp-0x18],0x0
    11c1:       75 bc                   jne    117f <factorial+0x16>
    11c3:       48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
    11c7:       c9                      leave
    11c8:       c3                      ret

00000000000011c9 <main>:
...

After the file is compiled, you can use a post-processing script or even an hex editor to XOR the file between 0x1169 and 0x11c8.

At the start of the main function, or at any place before the function is used, the following code will decrypt the memory in real time.

for(void* i = factorial
- (int) i <= (int) main
- i++) {
    *((int*) i) = *((int*) i) ^ 0xAA
-
}

TASKS:

  • Implement the example and evaluate the end result using static analysis. In order to obtain the actual code when reversing, we need to dump the memory after it is decrypted, or decrypt the file before analysis.
  • Using Qiling, implement a sandbox that dumps the process memory to a file, which can be loaded into ghidra for analysis.

Exercise 6

A parallel aspect is anti-debugging, where programs will try to detect they are being debugged and change their behaviour dynamically. This is possible because virtual machines, debuggers, emulation environments will have some side effects to the dynamic characteristics of a file.

PTRACE based debuggers, and most other debuggers can be detected. If an application detects the breakpoint, it can change the execution flow to obfuscate its real purpose. As an example, an application with a trojan malware may be interested in detecting the debugger and if found, it will not activate the malicious code. An analyst will struggle to analyze the real purpose of the application, unless the detection code is removed, avoided, or it’s result is ignored. The correct strategy will vary with how the anti-debugging techniques are actually added.

Virtual Machines can also be detected as the hardware devices identifiers, the CPU, or the drivers can provide hints that the program is not running on a bare host. Also, the actual opcodes supported by an hypervisor can differ from a real host. For a interesting overview of some techniques, check Peter Ferrie work titled Attacks on Virtual Machine Emulators.

When considering debuggers, a simple method involves an application tracing itself with PTRACE_TRACEME. While this will work, it will have consequences if real debugging is required.

if (ptrace(PTRACE_TRACEME, 0, 1, 0) == -1)  {
    printf("Debugger detected\n")
-
    return 1
-
}

The application can also fork a child, and let the child use PTRACE_ATTACH to the parent, then detaching with PTRACE_DETACH.

int pid = fork()
-

if (pid == 0) { // Child
    int ppid = getppid()
- // Get parent PID
    if (ptrace(PTRACE_ATTACH, ppid, NULL, NULL) == 0) {
      waitpid(ppid, NULL, 0)
-   // Wait for parent
      ptrace(PTRACE_CONT, NULL, NULL)
- // Continue parent
      ptrace(PTRACE_DETACH, getppid(), NULL, NULL)
- // Detach
    } else {
      printf("DEBUGGER detected")
    }
} else { // Parent. Exit...}

Another method involves inspecting the content of /proc/self/status, looking for the value of TracerPid:.

As the debuggers use SIGTRAP, if a program issues a SIGTRAP the debugger will be called. Therefore, programs can set a handler for SIGTRAP and issue the signal. In normal situations, the application will catch the SIGTRAP and nothing happens. When under a debugger, the debugger will get the SIGTRAP. The application can detect that it was unable to catch the SIGTRAP, and even if the debugger lets the signal pass through to the handler, the timing will be way off.

The example code would be:

int trap = 1
-

void handler(int a) {
    print("It's not a trap!\n")
-
    trap = 0
-
}


int detect_debugger() {
    signal(SIGTRAP, handler)
-  // Sets the handler
    raise(SIGTRAP)
- // Raise the TRAP
    return trap
-
}

This approach uses a flag, but if instead of a boolean, the variable stores a timestamp, the code can also compare the time (check man 2 clock_gettime) it takes for the signal to be handled. Usually it will be very fast, except if there is a debugger present.

In all cases, handling such code will require patching the binary to either avoid the execution of the detection routine, or ignore their result. This can be done by editing the file and changing the opcode (e.g., turn JZ into a JNZ, or replacing the call with NOP), or dynamically with frameworks such as Qiling.

TASKS:

  • Implement the examples that aim to detect a debugger and check their effectiveness
  • Change the last example to take in consideration the time it takes for the signal to be handled. Compare the results with and without a debugger.
  • Implement methods to avoid the detection in your programs (patching or Qiling)

Exercise 7

Can you call malicious code using signal handlers? One interesting case is using SIGFPE as triggering a signal can be achieved with x/0 anywhere in code. Implement a small program (an hello world) that includes a signal handler for SIGFPE and then triggers this along the code. You can even resume operation after the signal, by manipulating the RIP register, or execute different functions based on the values of the registers before the SIGFPE.

The structure is something like:

void sigfpe_handler(int signum,  siginfo_t *si, void* arg) {
    //Malicious code
    

    //Continue Execution
    ((ucontext_t *)arg)->uc_mcontext.gregs[REG_RIP] = ((ucontext_t *)arg)->uc_mcontext.gregs[REG_RIP] + 0x02
-
}


int main(int argc, char** argv) {

    // Install handler somewhere in the code
    int op = 3
-
    int div = 1
-
    struct sigaction act
-
    struct sigaction oldact
-
    memset(&act, 0, sizeof(act))
-
    act.sa_handler = sigfpe_handler
-
    act.sa_flags = SA_NODEFER | SA_NOMASK
-
    sigaction(SIGFPE, &act, &oldact)
-
    
    int r = op / div
- // Does nothing

    //Manipulate op or div to generate a SIGFPE
    op = 5
    div -= 1
-

    //Trigger malicious code because div=0
    r = op/div
-

}

TASKS:

  • Implement a small program with the this concept
  • Disassemble and decompile the binary and check the result

Tools

Previous
Next