# Obfuscation & Polyglots

## Lecture Notes

Analyzing some obfuscation, anti-debugging and simple evasion techniques.

The PDF polyglot can be found here

### Exercise 1

The use of Polyglots presents a valid approach to deceive application firewalls, threat detection software, and humans. It relies on crafting binary files, and can make it so that a wrong decoder is used to analyze the file, while an intended payload is also present and is sent to a sensitive domain. In contrast to the use of steganography, where a specific decoder is required in order to extract the content, polyglots are more versatile as the parser may be already present in the destination.

A famous example is the POC||GTFO journal, where PDF files will contain many other contents and can mostly be uncompressed with unzip. This journal exploits PDF as it is an extremely flexible format, allowing the addition of content with great ease.

Other formats also have some caveats, allowing the combination of files with little difficulty. In our example we have a picture and a small crafted ELF file. The file was crafted by hand in order to keep the size small (179 bytes) and will only execute on a X86 64 host.

Several variations of these files were created and can be found here. In all situations, both files are present, with some abuses of the JPG and ZIP formats.

When running file, this is the result:

net1.jpg: JPEG image data, progressive, precision 8, 1536x2048, components 3
net2.jpg: JPEG image data, progressive, precision 8, 1536x2048, components 3
net1.zip: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, no section header
net2.zip: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, no section header


The JPG files have the file somewhere included as well as the ZIP. They will require some extractor. But the ZIP files will have dual behavior.

Can you determine how the files were combined?

Use Mitra https://github.com/corkami/mitra and create some polyglots with your files. Check the table for possible combinations.

### Exercise 2

You can craft your own polyglot, especially with ZIP and/or PDF. Because ZIP is a format ready to include several other files, and PDF is highly flexible, it is easy to build a polyglot with bloth.

If you consider the PDF Objects as all text before the xref and the PDF trailer as all text after the xref, you can store it in a ZIP without compression and get a polyglot.

zip -Z store -z polyglot.pdf pdf_objects < pdf_trailer


Generate a basic PDF and create a ZIP-PDF polyglot. Use a hex editor and file to check the result. Add other files to the ZIP. Will it work?

### Exercise 3

Dead code can be inserted into binary files in order to make the reversing activities more complex. This is a simple technique which may introduce a little entropy to the analysis process, as additional code is present, but it is never actually executed.

Consider the following file that calculates the factorial of a number:

#include <stdio.h>
#include <stdlib.h>

unsigned long long factorial(unsigned long long a) {

unsigned long long r = 1;

while(a > 0 ){
unsigned long long v = r * a;
if(v < r){
printf("ERROR: Overflow\n");
exit(-1);
}
r = v;
a = a - 1;
}
return r;
}

int main(int argc, char** argv) {
unsigned long long v = 0;
if(argc != 2) {
printf("Need a positive integer argument\n");
return -1;
}
v = atol(argv[1]);

if(v <= 0){
printf("Need a positive integer argument\n");
return -1;
}

printf("Result: %llu\n", factorial(v));

return 0;
}


The file can be compiled with:

gcc -O0 -o factorial factorial.c


Notice the use of -O0 that disables compiler optimizations. If they are enabled, dead code will be removed.

Now consider adding code that is never executed, by adding dead logic along it. In this example, the instructions are never called as we introduce a jmp to a label after them.

//...
asm("jmp label");
factorial(factorial(argc));
asm("label:");
//...


Add Dead code to our example, compile it and check the result with ghidra.

### Exercise 4

The previous example was rather simple and fragile. Any compiler optimization would remove the dead code and the Function Graphs would rapidly show that the code is not executed.

Lets consider adding conditional statements that are known by you (an opaque predicate), but still need to be evaluated. The objective is that dead code could be potentially reached from a static perspective, but in reality it is never reached, and you know the result of the expression.

One example is using the argc variable. As an int, argc can have negative or positive values, but in practice argc will always be equal or larger than 1. Other variables can be used to exploit this mechanism.

An example would be:

if(argc < 0 ) {
}


Implement a variation of the previous example using opaque predicates. The dead code should be as complex as possible, but it still needs to by syntactically valid. Decompile the resulting files and analyze the result.

### Exercise 5

The use of functions still provides some insight to reversers in regard to the structure of the program. Yet, compilers allow flattening the code, inlining all functions used. This will effectively create larger, potentially sub-optimal code, but with added confusion.

For GCC this can be used by setting the  __attribute__((always_inline)) attribute in the functions we want to inline. For the factorial function, it would become unsigned long long __attribute__((always_inline)).

Add this to the result of the previous exercise and compare the result. If works better if you have multiple functions to insert as dead code, and you combine them.

### Exercise 6

Until now our dead code makes some sense. At least it is syntactically valid C code, and is always composed by valid CPU instructions. What if it isn’t? What if the code added makes no sense, and it contains invalid opcodes?

There are several ways of achieving this. The concept is to combine opaque predicates with invalid opcodes, created with random bytes. Preprocessor tricks can be used to achieve this effect, but it will be complicated in the scope of a lecture demonstration. For a practical example, check this.

A simple way of doing it is to introduce placeholders and then fill those placeholders with junk. The placeholders can be of any instruction. In this case we will use NOP.

Consider the following macros, which will generate chunks of NOP instructions to be added.

#define REP0(X)
#define REP1(X) X
#define REP2(X) REP1(X) X
#define REP3(X) REP2(X) X
#define REP4(X) REP3(X) X
#define REP5(X) REP4(X) X
#define REP6(X) REP5(X) X
#define REP7(X) REP6(X) X
#define REP8(X) REP7(X) X
#define REP9(X) REP8(X) X
#define REP10(X) REP9(X) X

#define NOP "nop;"
#define REP(HUNDREDS,TENS,ONES) \
REP##HUNDREDS(REP10(REP10(NOP))) \
REP##TENS(REP10(NOP)) \
REP##ONES(NOP)


Then in your code you can add placeholders like this, using inline assembly. It will insert 323 NOP instruction into the code.

    __asm__ REP(3,2,3)


As NOP instructions are not frequent, and 323 NOP are even less frequent or practically impossible to occur, you can post process the file to replace those instructions with garbage.

Implement this strategy into the file by adding placeholders in multiple places. Create a simple script that processes the file, replacing and large sequence of NOP (0x90) with random bytes. Check the program executes correctly and analyze the result with ghidra.

### Exercise 7

Another way of hiding code is to encrypt it, either with strong cryptographic methods, or with a simple XOR. Using a XOR is frequently a good approach as the decrypting code will be small, which will result in a smaller fingerprint to scanners.

The process consists in compiling the binary, and encrypting the code to be hidden. If we consider the factorial function as a target to hide, after we compile the program we obtain:

\$ objdump -M intel -d factorial

...
0000000000001169 <factorial>:
1169:       55                      push   rbp
116a:       48 89 e5                mov    rbp,rsp
116d:       48 83 ec 20             sub    rsp,0x20
1171:       48 89 7d e8             mov    QWORD PTR [rbp-0x18],rdi
1175:       48 c7 45 f8 01 00 00    mov    QWORD PTR [rbp-0x8],0x1
117c:       00
117d:       eb 3d                   jmp    11bc <factorial+0x53>
117f:       48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
1183:       48 0f af 45 e8          imul   rax,QWORD PTR [rbp-0x18]
1188:       48 89 45 f0             mov    QWORD PTR [rbp-0x10],rax
118c:       48 8b 45 f0             mov    rax,QWORD PTR [rbp-0x10]
1190:       48 3b 45 f8             cmp    rax,QWORD PTR [rbp-0x8]
1194:       73 19                   jae    11af <factorial+0x46>
1196:       48 8d 05 6b 0e 00 00    lea    rax,[rip+0xe6b]        # 2008 <_IO_stdin_used+0x8>
119d:       48 89 c7                mov    rdi,rax
11a0:       e8 8b fe ff ff          call   1030 <puts@plt>
11a5:       bf ff ff ff ff          mov    edi,0xffffffff
11aa:       e8 b1 fe ff ff          call   1060 <exit@plt>
11af:       48 8b 45 f0             mov    rax,QWORD PTR [rbp-0x10]
11b3:       48 89 45 f8             mov    QWORD PTR [rbp-0x8],rax
11b7:       48 83 6d e8 01          sub    QWORD PTR [rbp-0x18],0x1
11bc:       48 83 7d e8 00          cmp    QWORD PTR [rbp-0x18],0x0
11c1:       75 bc                   jne    117f <factorial+0x16>
11c3:       48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
11c7:       c9                      leave
11c8:       c3                      ret

00000000000011c9 <main>:
...


After the file is compiled, you can use a post-processing script or even an hex editor to XOR the file between 0x1169 and 0x11c8.

At the start of the main function, or at any place before the function is used, the following code will decrypt the memory in real time.

for(void* i = factorial; (int) i <= (int) main; i++) {
*((int*) i) = *((int*) i) ^ 0xAA;
}


Implement the example and evaluate the end result using static analysis. In order to obtain the actual code when reversing, we need to dump the memory after it is decrypted, or decrypt the file before analysis.

Using Qiling, implement a sandbox that dumps the process memory to a file, which can be loaded into ghidra for analysis.

### Exercise 8

A parallel aspect is anti-debugging, where programs will try to detect they are being debugged and change their behaviour dynamically. This is possible because virtual machines, debuggers, emulation environments will have some side effects to the dynamic characteristics of a file.

PTRACE based debuggers, and most other debuggers can be detected. If an application detects the breakpoint, it can change the execution flow to obfuscate its real purpose. As an example, an application with a trojan malware may be interested in detecting the debugger and if found, it will not activate the malicious code. An analyst will struggle to analyze the real purpose of the application, unless the detection code is removed, avoided, or it’s result is ignored. The correct strategy will vary with how the anti-debugging techniques are actually added.

Virtual Machines can also be detected as the hardware devices identifiers, the CPU, or the drivers can provide hints that the program is not running on a bare host. Also, the actual opcodes supported by an hypervisor can differ from a real host. For a interesting overview of some techniques, check Peter Ferrie work titled Attacks on Virtual Machine Emulators.

When considering debuggers, a simple method involves an application tracing itself with PTRACE_TRACEME. While this will work, it will have consequences if real debugging is required.

if (ptrace(PTRACE_TRACEME, 0, 1, 0) == -1)  {
printf("Debugger detected\n");
return 1;
}


The application can also fork a child, and let the child use PTRACE_ATTACH to the parent, then detaching with PTRACE_DETACH.

int pid = fork();

if (pid == 0) { // Child
int ppid = getppid(); // Get parent PID
if (ptrace(PTRACE_ATTACH, ppid, NULL, NULL) == 0) {
waitpid(ppid, NULL, 0);   // Wait for parent
ptrace(PTRACE_CONT, NULL, NULL); // Continue parent
ptrace(PTRACE_DETACH, getppid(), NULL, NULL); // Detach
} else {
printf("DEBUGGER detected")
}
} else { // Parent. Exit...}


Another method involves inspecting the content of /proc/self/status, looking for the value of TracerPid:.

As the debuggers use SIGTRAP, if a program issues a SIGTRAP the debugger will be called. Therefore, programs can set a handler for SIGTRAP and issue the signal. In normal situations, the application will catch the SIGTRAP and nothing happens. When under a debugger, the debugger will get the SIGTRAP. The application can detect that it was unable to catch the SIGTRAP, and even if the debugger lets the signal pass through to the handler, the timing will be way off.

The example code would be:

int trap = 1;

void handler(int a) {
print("It's not a trap!\n");
trap = 0;
}

int detect_debugger() {
signal(SIGTRAP, handler);  // Sets the handler
raise(SIGTRAP); // Raise the TRAP
return trap;
}


This approach uses a flag, but if instead of a boolean, the variable stores a timestamp, the code can also compare the time (check man 2 clock_gettime) it takes for the signal to be handled. Usually it will be very fast, except if there is a debugger present.

In all cases, handling such code will require patching the binary to either avoid the execution of the detection routine, or ignore their result. This can be done by editing the file and changing the opcode (e.g., turn JZ into a JNZ, or replacing the call with NOP), or dynamically with frameworks such as Qiling.