Binary Executable Files
Lecture Notes
Analyzing binary executable files such as ELF, with a focus on file structure and symbol linking.
Practical Tasks
Exercise 1
Compile a small C program with one or two functions and, using objdump, analyze both the object file that is created and the final binary.
As a possible C program, you may consider:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>   /* declares close() */

void bar(void) {
    printf("Hello");
}

void foo(void) {
    printf("World");
    int fd = open("a.txt", O_RDONLY);
    close(fd);
}

int main(int argc, char **argv) {
    foo();
    return 0;
}
After the program is compiled (gcc -o prog prog.c), you can use objdump to inspect it.
To obtain the sections, you can use objdump -h prog, and the result should look something like:
objdump -h prog
prog: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
0 .interp 0000001c 00000000000002a8 00000000000002a8 000002a8 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .note.ABI-tag 00000020 00000000000002c4 00000000000002c4 000002c4 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .note.gnu.build-id 00000024 00000000000002e4 00000000000002e4 000002e4 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .gnu.hash 00000024 0000000000000308 0000000000000308 00000308 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .dynsym 000000d8 0000000000000330 0000000000000330 00000330 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .dynstr 0000008f 0000000000000408 0000000000000408 00000408 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .gnu.version 00000012 0000000000000498 0000000000000498 00000498 2**1
CONTENTS, ALLOC, LOAD, READONLY, DATA
7 .gnu.version_r 00000020 00000000000004b0 00000000000004b0 000004b0 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
8 .rela.dyn 000000c0 00000000000004d0 00000000000004d0 000004d0 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
9 .rela.plt 00000048 0000000000000590 0000000000000590 00000590 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
10 .init 00000017 0000000000001000 0000000000001000 00001000 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
11 .plt 00000040 0000000000001020 0000000000001020 00001020 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
12 .plt.got 00000008 0000000000001060 0000000000001060 00001060 2**3
...
You can also obtain the symbols present in the binary using objdump -tT prog, which shows both the symbol table and the dynamic symbol table.
prog: file format elf64-x86-64
SYMBOL TABLE:
00000000000002a8 l d .interp 0000000000000000 .interp
00000000000002c4 l d .note.ABI-tag 0000000000000000 .note.ABI-tag
00000000000002e4 l d .note.gnu.build-id 0000000000000000 .note.gnu.build-id
0000000000000308 l d .gnu.hash 0000000000000000 .gnu.hash
0000000000000330 l d .dynsym 0000000000000000 .dynsym
0000000000000408 l d .dynstr 0000000000000000 .dynstr
0000000000000498 l d .gnu.version 0000000000000000 .gnu.version
00000000000004b0 l d .gnu.version_r 0000000000000000 .gnu.version_r
00000000000004d0 l d .rela.dyn 0000000000000000 .rela.dyn
0000000000000590 l d .rela.plt 0000000000000000 .rela.plt
...
The full output of this tool is omitted. Run it and determine:
- How many sections are present?
- How many symbols are present?
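Strip the binary (for example, with strip -o prog.stripped prog, which keeps the original for comparison) and repeat the same process with objdump. Then compare both results. In particular, answer:
- What happened to the symbols?
- What happened to the function names, and to the functions themselves?
- What happened to the file size?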
Exercise 2
Compile a small C program with one or two functions that uses external libraries. As an example, you can consider a program that creates a thread (libpthread) or compresses a file (libz).
Any other functions are adequate, as long as they come from external libraries.
One example is available in the zlib repository: https://raw.githubusercontent.com/madler/zlib/master/test/example.c
You can compile this code with gcc -o example example.c -lz.
Using a hex editor, identify the magic values of an ELF file and the values of its header fields.
You can use readelf as a guide: it prints the values that you should be able to find in the hex editor.
$ readelf -h example
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1270
Start of program headers: 64 (bytes into file)
Start of section headers: 20368 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 13
Size of section headers: 64 (bytes)
Number of section headers: 31
Section header string table index: 30
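The header fields printed by readelf can also be decoded by hand, which helps when locating them in the hex editor. The following is a minimal Python sketch, assuming an ELF64 file; the names parse_elf64_header and FIELDS are illustrative and not part of any library.

```python
import struct

ELF_MAGIC = b"\x7fELF"

# Names of the ELF64 header fields that follow the 16-byte e_ident array,
# in file order, as defined by the ELF specification.
FIELDS = (
    "e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
    "e_flags", "e_ehsize", "e_phentsize", "e_phnum", "e_shentsize",
    "e_shnum", "e_shstrndx",
)


def parse_elf64_header(data: bytes) -> dict:
    """Decode the first 64 bytes of an ELF64 file into a dict of fields."""
    if len(data) < 64:
        raise ValueError("need at least 64 bytes for an ELF64 header")
    ident = data[:16]
    if ident[:4] != ELF_MAGIC:
        raise ValueError("missing ELF magic 7f 45 4c 46")
    if ident[4] != 2:  # EI_CLASS: 1 = ELF32, 2 = ELF64
        raise ValueError("this sketch only handles ELF64")
    # EI_DATA: 1 = little endian, 2 = big endian
    endian = "<" if ident[5] == 1 else ">"
    values = struct.unpack(endian + "HHIQQQIHHHHHH", data[16:64])
    header = dict(zip(FIELDS, values))
    header["magic"] = ident.hex(" ")  # same layout as readelf's Magic line
    return header
```

Reading the first 64 bytes of example in binary mode and passing them to parse_elf64_header should give values that match the readelf output above, for instance e_entry equal to 0x1270 and e_shnum equal to 31.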
Using readelf, identify the main sections of the file (readelf -S) and inspect their content. The following snippet shows the content of section 25, which is the .data section in this binary.
It contains initialized global variables, including global tables.
$ readelf -x 25 example
Hex dump of section '.data':
0x00005130 00000000 00000000 38510000 00000000 ........8Q......
0x00005140 68656c6c 6f2c2068 656c6c6f 21000000 hello, hello!...
0x00005150 5a310000 00000000 Z1......
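To read such a dump, remember that multi-byte values are stored in little-endian order on x86-64. The sketch below decodes the bytes shown above; DATA_BASE, read_qword, and read_cstring are illustrative names chosen for this example.

```python
import struct

DATA_BASE = 0x5130  # address of the first byte in the dump
DATA = bytes.fromhex(
    "00000000 00000000 38510000 00000000 "
    "68656c6c 6f2c2068 656c6c6f 21000000 "
    "5a310000 00000000"
)


def read_qword(addr: int) -> int:
    """Read a 64-bit little-endian value stored at a virtual address."""
    return struct.unpack_from("<Q", DATA, addr - DATA_BASE)[0]


def read_cstring(addr: int) -> str:
    """Read a NUL-terminated ASCII string stored at a virtual address."""
    start = addr - DATA_BASE
    end = DATA.index(b"\x00", start)
    return DATA[start:end].decode("ascii")


print(hex(read_qword(0x5138)))  # 0x5138
print(read_cstring(0x5140))     # hello, hello!
print(hex(read_qword(0x5150)))  # 0x315a
```

The last value looks like an address elsewhere in the binary, although that is an interpretation that the dump alone does not confirm.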
Inspect the .plt jump table (Procedure Linkage Table) and the .got offset table (Global Offset Table). You can disassemble the .plt section with objdump -M intel -d.
The output shows that, for each imported symbol, there is a short stub of code used to resolve the function.
The first entry contains the generic code that hands control to the resolver, while the following entries contain code specific to each symbol.
Disassembly of section .plt:
0000000000001020 <gzclose@plt-0x10>:
1020: ff 35 e2 3f 00 00 push QWORD PTR [rip+0x3fe2] # 5008 <_GLOBAL_OFFSET_TABLE_+0x8>
1026: ff 25 e4 3f 00 00 jmp QWORD PTR [rip+0x3fe4] # 5010 <_GLOBAL_OFFSET_TABLE_+0x10>
102c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
0000000000001030 <gzclose@plt>:
1030: ff 25 e2 3f 00 00 jmp QWORD PTR [rip+0x3fe2] # 5018 <gzclose@Base>
1036: 68 00 00 00 00 push 0x0
103b: e9 e0 ff ff ff jmp 1020 <_init+0x20>
0000000000001040 <free@plt>:
1040: ff 25 da 3f 00 00 jmp QWORD PTR [rip+0x3fda] # 5020 <free@GLIBC_2.2.5>
1046: 68 01 00 00 00 push 0x1
104b: e9 d0 ff ff ff jmp 1020 <_init+0x20>
In this case, the .got entry for gzclose is at rip+0x3fe2 (address 0x5018). If the real address of the function has already been resolved and stored there, execution jumps directly to it. Otherwise, the entry points back into the .plt stub, where a value (an index) is pushed onto the stack and then the generic resolver is called.
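The addresses that objdump prints as comments in the listing above can be recomputed by hand: a RIP-relative operand and a rel32 jump are both relative to the address of the next instruction. The following sketch checks this arithmetic; the helper names are illustrative.

```python
def signed32(value: int) -> int:
    """Interpret a 32-bit unsigned value as a two's-complement signed integer."""
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def le32(hex_bytes: str) -> int:
    """Decode four little-endian bytes, written as in the objdump listing."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")


def relative_target(insn_addr: int, insn_len: int, disp_hex: str) -> int:
    """Address of the next instruction plus the signed 32-bit displacement."""
    return insn_addr + insn_len + signed32(le32(disp_hex))


# jmp QWORD PTR [rip+0x3fe2] at 0x1030 (6 bytes) reads the GOT slot at 0x5018
got_slot = relative_target(0x1030, 6, "e2 3f 00 00")
# jmp 1020 at 0x103b (5 bytes) goes back to the first, generic .plt entry
plt0 = relative_target(0x103B, 5, "e0 ff ff ff")
print(hex(got_slot), hex(plt0))  # 0x5018 0x1020
```

The same calculation applied to the free@plt stub at 0x1040 gives 0x5020, the next 8-byte slot in the .got.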
Create a diagram (drawing) of the binary file that represents its structure from three perspectives: the ELF file layout, the segment view, and the section view. It is important to understand which parts of the ELF file are actually loaded as segments and where they are placed in memory, so the diagram should show how the bytes of the file map to segments and sections.
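As a starting point for the segment view, the loadable segments can be read directly from the program header table. This is a rough sketch that assumes a little-endian ELF64 file such as example; load_segments and file_offset_to_vaddr are hypothetical helper names, and readelf -l reports the same information.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def load_segments(data: bytes) -> list:
    """Return (offset, vaddr, filesz, memsz, perms) for each PT_LOAD entry.

    Assumes a little-endian ELF64 file.
    """
    e_phoff, = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    segments = []
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, _p_align) = struct.unpack_from("<IIQQQQQQ", data, base)
        if p_type == PT_LOAD:
            # PF_R = 4, PF_W = 2, PF_X = 1
            perms = "".join(
                ch if p_flags & bit else "-"
                for ch, bit in (("r", 4), ("w", 2), ("x", 1))
            )
            segments.append((p_offset, p_vaddr, p_filesz, p_memsz, perms))
    return segments


def file_offset_to_vaddr(segments, offset):
    """Map a file offset to its virtual address, or None if it is not loaded."""
    for p_offset, p_vaddr, p_filesz, _memsz, _perms in segments:
        if p_offset <= offset < p_offset + p_filesz:
            return p_vaddr + (offset - p_offset)
    return None
```

Passing the file offset of each section (the "File off" column of objdump -h) to file_offset_to_vaddr shows which sections fall inside a loaded segment and which, returning None, exist only in the file.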
Exercise 3
The LIEF library allows extensive manipulation of binary files, including ELF objects. Using LIEF, write a small Python script that prints information about an ELF file that may be relevant for future reverse engineering tasks.
In particular:
- The type of file and architecture
- The list of libraries loaded
- The compiler used
- The list of symbols from external libraries
- The address of the program entry point
- Whether the program uses RELRO, PIE, and canaries
- Whether the program is stripped
Exercise 4
An important feature of dynamic analysis is the interception, redirection, and even modification of symbols. This can be easily achieved using the LD_PRELOAD environment variable, which is honored by the dynamic linker.
The following snippet allows us to override a function with a custom implementation that calls the original function (or simply forbids its execution). In this situation we will use LD_PRELOAD=./libover.so prog, where libover.so contains this code and prog is a standard program under analysis.
#define _GNU_SOURCE   // required for RTLD_NEXT
#include <dlfcn.h>
#include <stdio.h>

static void (*original_foo)(void) = NULL;

void foo(void) { // Function to override
    if (original_foo == NULL) { // First time execution: load the real address
        original_foo = (void (*)(void))dlsym(RTLD_NEXT, "foo");
    }
    printf("foo entry\n");
    if (original_foo != NULL)
        original_foo(); // call original function.
    printf("foo exit\n");
}
To compile it, use: gcc -o libover.so -shared -fPIC libover.c -ldl.
Taking this as an example, write a library to intercept communications with secure sockets, printing the contents before they are encrypted.
Test the library with an application such as wget. For wget, you can dump the list of dynamic symbols using objdump to look for potential symbols to override.
$ objdump -T /usr/bin/wget |grep gnutls
...
0000000000000000 DF *UND* 0000000000000000 (GNUTLS_3_4) gnutls_record_recv
...
0000000000000000 DF *UND* 0000000000000000 (GNUTLS_3_4) gnutls_certificate_verify_peers2
...
0000000000000000 DF *UND* 0000000000000000 (GNUTLS_3_4) gnutls_record_send
These three symbols are of interest, as they may allow bypassing certificate validation (gnutls_certificate_verify_peers2), inspecting data sent (gnutls_record_send), or inspecting data received (gnutls_record_recv).
Tools and links
- objdump: https://man7.org/linux/man-pages/man1/objdump.1.html
- readelf: https://man7.org/linux/man-pages/man1/readelf.1.html
- LIEF: https://lief-project.github.io/
- gnutls: https://gnutls.org/documentation.html
- HxD: https://mh-nexus.de/en/hxd/
- bvi: http://bvi.sourceforge.net/
- ImHex: https://github.com/WerWolv/ImHex
- HexWorkshop: http://www.hexworkshop.com/
- ghex: https://wiki.gnome.org/Apps/Ghex
- HexEdit: https://hexed.it/
- FileInsight: https://github.com/nmantani/FileInsight-plugins