Binary Exploitation - Architecture, Reversing, Exploitation & Shellcode
x86_64 Architecture & Stack
Byte Order & Packing
- Most Linux CTF binaries on x86/x86_64 are little-endian: the least significant byte is stored at the lowest address.
- 0x40123a packed as 64-bit little-endian becomes 3a 12 40 00 00 00 00 00.
- This matters for partial overwrites: overflowing byte-by-byte overwrites the low bytes of a saved pointer first.
    from pwn import *
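As a minimal sketch of the packing described above, the following uses only Python's standard struct module; pwntools' p64 and u64 are the usual shorthand for the same operations. The address is the one from the example.

```python
import struct

addr = 0x40123A

# "<Q" = little-endian unsigned 64-bit: the least significant byte comes first.
packed = struct.pack("<Q", addr)
print(packed.hex(" "))  # 3a 12 40 00 00 00 00 00

# Unpacking reverses the operation.
(unpacked,) = struct.unpack("<Q", packed)
print(hex(unpacked))  # 0x40123a
```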
Registers
- word = 2 bytes, dword = 4 bytes, qword = 8 bytes.
- RBP: frame/base pointer, RSP: stack pointer, RIP: instruction pointer.
- Writing to 32-bit registers (e.g., EAX) automatically clears the upper 32 bits of the corresponding 64-bit register (RAX). xor EAX, EAX is equivalent to xor RAX, RAX but has a shorter encoding.
Memory access sizes:
    mov [RAX], bl ; 1 byte
Calling Convention (SysV ABI)
- Arguments: RDI, RSI, RDX, RCX, R8, R9.
- Return value: RAX (with RDX potentially holding high bits or extra data).
- Syscall arguments: RDI, RSI, RDX, R10, R8, R9; syscall number in RAX. RCX and R11 are clobbered by syscall.
- x86_32: arguments are passed on the stack in right-to-left order.
    man syscall
REX.W Prefix (0x48)
The REX prefix range is 0x40-0x4F (0100 WRXB), used to extend x86 instructions to 64-bit:
- W: Set to 1 for 64-bit operations.
- R, X, B: Extension for register addressing (R8-R15).
mov RDI, RSP requires a REX prefix with W=1 → 0100 1000 = 0x48.
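The bit layout can be checked with a small helper. This is an illustrative sketch; rex_prefix is a hypothetical name, and the opcode and ModRM bytes are the standard encoding of mov r/m64, r64.

```python
def rex_prefix(w: int, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX prefix byte from its four flag bits (0100 WRXB)."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b

# mov rdi, rsp: 64-bit operand size, no extended registers.
rex_w = rex_prefix(w=1)
print(hex(rex_w))  # 0x48

# Full encoding: REX.W, opcode 0x89 (mov r/m64, r64),
# ModRM 0xE7 (mod=11, reg=rsp=100, rm=rdi=111).
mov_rdi_rsp = bytes([rex_w, 0x89, 0xE7])
print(mov_rdi_rsp.hex())  # 4889e7

# Using r8 as the r/m operand additionally sets the B bit.
print(hex(rex_prefix(w=1, b=1)))  # 0x49
```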
Stack Frame Layout
- RBP (register): points to the base of the current function's stack frame. Local variables are accessed relative to it (e.g., [RBP - 0x10]).
- saved RBP (stack data): a backup of the caller's RBP, used to restore the previous frame upon return.
Prologue & Epilogue
    ; Prologue (Entering Function B)
    High Addr +-------------------------------+
Memory Management
Segments
- .text: Executable code.
- .data: Initialized global writable data.
- .rodata: Read-only data (strings, constants).
- .bss: Uninitialized global writable data.
- stack: Local variables and function metadata.
- heap: Dynamically allocated memory via malloc().
Viewing memory maps:
    cat /proc/<pid>/maps
Von Neumann Architecture
In memory, bytes can represent either code or data; the CPU
distinguishes them solely based on RIP. Bytes pointed to by
RIP are executed as opcodes, while others accessed via
pointers are treated as data.
Security Mitigations
| Name | Description |
|---|---|
| NX | No-eXecute bit. Hardware-level page attribute marking memory pages as non-executable. |
| W^X | Write XOR Execute. Memory is either writable or executable, never both simultaneously. |
| ASLR | Address Space Layout Randomization. Randomizes memory layout at runtime. |
| PIE | Position Independent Executable. Randomizes the binary base address. |
| canary | Stack protector. Detects buffer overflow by checking a secret value before returning. |
Checksec Reading
| Mitigation | Exploitation impact |
|---|---|
| No Canary | Saved RIP overwrite is usually direct once the offset is known. |
| Canary | Need leak, byte brute-force, non-return control flow, or write primitive that skips the canary. |
| NX disabled | Stack/heap shellcode is viable if control can jump to it. |
| NX enabled | Prefer ROP, ret2libc, mprotect, mmap, JOP/COP, or existing executable regions. |
| No PIE | Binary code addresses are fixed, e.g. win() / gadgets / PLT have stable absolute addresses. |
| PIE enabled | Need code pointer leak, PIE base recovery, or partial overwrite/brute force. |
| No RELRO | GOT is writable and can be overwritten before/after resolution. |
| Partial RELRO | .got.plt remains writable; lazy binding is still present. |
| Full RELRO | GOT is read-only after startup; GOT overwrite is blocked. |
| SHSTK | Intel CET shadow stack verifies returns; classic ret overwrite/ROP may fail. |
| IBT | Intel CET indirect branch tracking requires indirect-call/jump targets to begin with endbr64. |
checksec hints at the easiest path, but it is not a
proof. A binary with executable stack may still be protected by
SHSTK/IBT, input filters, unstable stack
addresses, or non-return exits.
ELF & Dynamic Linking
- ELF header: architecture, entry point, program headers.
- Program headers: loader view; maps segments into memory (LOAD, GNU_STACK, GNU_RELRO).
- Section headers: linker/debugger view; useful for .text, .plt, .got, .bss, symbols.
- PLT: stubs in the binary used to call imported functions.
- GOT: table of resolved function/data addresses.
- Lazy binding: first PLT call jumps into the dynamic resolver, which writes the real libc address into GOT.
- RELRO: controls whether relocation/GOT pages become read-only after relocation.
    file ./chall
Address Translation
- VA: virtual address used at runtime.
- RVA / offset inside module: runtime_addr - module_base.
- File offset: byte offset in the ELF file; not always equal to RVA because segments have mapping alignment.
In a non-PIE ELF, .text often loads near
0x400000. In a PIE ELF, disassemblers may show offsets such
as 0x1d08; runtime address is
pie_base + 0x1d08.
    piebase
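The two translations above are plain arithmetic. In this rough sketch the PIE base and the LOAD segment values (p_vaddr, p_offset) are hypothetical; in practice they come from vmmap/piebase and readelf -l.

```python
def runtime_addr(module_base: int, offset: int) -> int:
    """Offset inside a module -> runtime virtual address."""
    return module_base + offset

def va_to_file_offset(va: int, seg_vaddr: int, seg_offset: int) -> int:
    """Translate a VA inside one LOAD segment using its p_vaddr and p_offset."""
    return va - seg_vaddr + seg_offset

pie_base = 0x555555554000  # hypothetical base observed in the debugger
print(hex(runtime_addr(pie_base, 0x1D08)))  # 0x555555555d08

# Hypothetical non-PIE LOAD segment: p_vaddr=0x401000, p_offset=0x1000.
print(hex(va_to_file_offset(0x40123A, 0x401000, 0x1000)))  # 0x123a
```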
Binary Analysis & Tools
CLI Tools
    file hello # Identify arch, linking, stripped status
Reverse Engineering Workflow
- Identify: file, checksec, readelf -h, rabin2 -I.
- Triage strings/imports: strings, rabin2 -zz, rabin2 -i, IDA/Ghidra strings window.
- Find control points: main, parser loop, comparison branches, win, system, /bin/sh, open/read/write, strcmp, memcmp, printf.
- Recover input format: header magic, version, command/directive fields, length fields, endianness, per-record size.
- Trace data flow: where user bytes land, how size is computed, where pointers are stored, whether data is copied, validated, freed, or printed.
- Convert checks into constraints: compare constants, printable-byte filters, checksums, jump tables, index math, bounds checks.
- Exploit mapping: choose leak/write/control-flow primitive that matches mitigations.
Useful questions while reversing:
- Is the binary stripped? If not, start with symbols. If stripped, start with imports, strings, and cross-references.
- Does the program return normally, call exit, or loop forever? Return-address hijacking only triggers on a return path.
- Does a length check use signed or unsigned comparison? Which width: byte, word, dword, or qword?
- Are there hidden repeat/backdoor paths that re-enter the vulnerable function before canary validation?
- Does a parser use a switch/jump table? Can an out-of-range or special directive reach extra code?
radare2 Quick Reference
    rax2 0x28 # Hex/decimal conversion
Inside r2:
    aaa # Full analysis
Patching Notes
- Patching data is safer than patching code when the target check compares constants or file-format bytes.
- For code patches, inspect instruction length first; replacing a longer conditional branch with shorter bytes needs padding with nop.
- In non-PIE ELF, runtime VA is often near file mapping base 0x400000; still convert VA to file offset using program headers or tooling instead of guessing.
    rasm2 -a x86 -b 64 "nop"
Pwndbg
    pwndbg /path/to/binary # Launch
| Category | Command |
|---|---|
| Stepping | n (step over), s (step into), fin (finish), c (continue) |
| Breakpoints | b *main+123, b <symbol> |
| Memory | x/11s <addr>, hexdump <addr> 44, tele <addr> |
| Info | context, vmmap, xinfo <addr>, checksec |
My ~/.gdbinit:
    source /usr/share/pwndbg/gdbinit.py
pwntools
    from pwn import *
View shellcraft:
    pwn shellcraft -l
De Bruijn Sequence (Cyclic)
A De Bruijn sequence \(B(k, n)\) is a cyclic sequence where every possible string of length \(n\) over an alphabet of size \(k\) appears exactly once.
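- Why it's used: when a binary crashes due to a buffer overflow, the saved return address has been overwritten by a 4- or 8-byte chunk of the sequence. Since every chunk is unique, cyclic_find can instantly determine the exact offset needed for the exploit. On x86_64 the overwritten value is usually a non-canonical address, so the fault is raised at ret and RIP never takes the pattern value; read the chunk from the top of the stack (RSP) instead.
Crash-offset workflow:
    from pwn import *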
For 64-bit patterns, keep n consistent between
generation and lookup if you use non-default subsequence size.
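To make the uniqueness property concrete, here is a self-contained sketch of the standard recursive De Bruijn construction with a simulated crash lookup. It is not pwntools' implementation; the 8-letter alphabet and the offset 72 are arbitrary choices for illustration.

```python
def de_bruijn(alphabet: bytes, n: int) -> bytes:
    """Generate the De Bruijn sequence B(k, n) over the given alphabet."""
    k = len(alphabet)
    a = [0] * (k * n)
    seq = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return bytes(alphabet[i] for i in seq)

full = de_bruijn(b"abcdefgh", 4)   # 8**4 = 4096 bytes
pattern = full[:200]               # what would be sent as input

# Simulated crash: pretend the 4 bytes at offset 72 landed in the return slot.
crash_chunk = pattern[72:76]
offset = pattern.find(crash_chunk)
print(offset)  # 72
```

Because every 4-byte window occurs once, find can only match at the true offset.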
Environment Control
- Disable ASLR for Debugging: gdb disables ASLR by default. For a clean environment, use setarch x86_64 -R /bin/bash to spawn a shell where all child processes (excluding SUID) have ASLR disabled.
- SUID Trap: gdb cannot disable ASLR for SUID binaries due to kernel security. Remove the SUID bit (chmod u-s) or copy the binary to a local directory before debugging.
- Global ASLR Toggle: echo 0 | sudo tee /proc/sys/kernel/randomize_va_space (reset to 2 when finished).
Environment affects stack addresses. argv[0] length,
environment variables, terminal, and pwntools launching mode can shift
stack buffers. Use env={} in process() or add
a NOP sled when stack shellcode depends on approximate addresses.
Information Disclosure Causes
String Termination Problems
C strings are null-terminated, meaning they lack
length metadata and rely on a 0x00 byte to mark the
end.
- Missing null byte: If a program reads exactly \(N\) bytes into an \(N\)-byte buffer (e.g., using read()), it won't append a null byte.
- Disclosure: Functions like printf("%s") will continue reading past the buffer until they hit a null byte, potentially leaking adjacent sensitive data like a flag or canary.
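The leak can be imitated in Python by treating a bytes object as raw memory. This is only a model of the behaviour: the 8-byte buffer size and the adjacent secret are made-up values, and c_string_at is a hypothetical helper that mimics how %s scans for the terminator.

```python
def c_string_at(memory: bytes, start: int) -> bytes:
    """Mimic printf("%s"): read from start until the first 0x00 byte."""
    end = memory.index(b"\x00", start)
    return memory[start:end]

secret = b"FLAG{x}\x00"  # hypothetical data stored right after the buffer

# Case 1: 7 input bytes plus a terminator fit in the 8-byte buffer.
safe = b"AAAAAAA\x00" + secret
# Case 2: read() filled all 8 bytes, so no terminator was written.
leaky = b"AAAAAAAA" + secret

print(c_string_at(safe, 0))   # b'AAAAAAA'
print(c_string_at(leaky, 0))  # b'AAAAAAAAFLAG{x}'
```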
Uninitialized Data (Stack Frame Reuse)
- Frame Persistence: When a function returns, its stack frame is not cleared. The RSP simply moves back up.
- Ghost Data: If a subsequent function call allocates a frame over the same memory area and fails to initialize its variables, it can read or "leak" the "ghost" data left by the previous function.
Compiler Backstabbing (Dead Store Elimination)
- The Trap: A developer might use memset(buf, 0, size) to clear a sensitive flag before a function returns.
- DSE: If the compiler (with -O2 or -O3) determines that buf is never accessed again before the function returns, it may "optimize out" the memset entirely, leaving the secret data on the stack. To prevent this, use explicit_bzero(), memset_s(), writes through a volatile pointer, or a compiler memory barrier; -fno-inline alone is not a reliable fix, because it does not disable dead store elimination.
Format String Leaks
printf(user_input) treats user bytes as a format string.
This can leak stack/register values and, with %n, write
memory.
    %p %p %p %p # leak pointers
Pwntools helpers:
    from pwn import *
Common path with writable GOT: leak a libc address, calculate
libc.address, then overwrite
printf@GOT/puts@GOT with system
or an offset into win.
Memory Corruption Primitives
Stack Buffer Overflow
Typical offset calculation:
    buf = rbp - 0x60
Ret2win payload:
    payload = b"A" * offset_to_ret + p64(elf.sym["win"])
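A worked version of the offset arithmetic and payload layout, assuming a frame with no canary where the buffer sits at rbp - 0x60. The win() address is a placeholder; with pwntools it would come from elf.sym["win"].

```python
import struct

buf_to_rbp = 0x60                 # buffer starts at rbp - 0x60
offset_to_ret = buf_to_rbp + 8    # skip the 8-byte saved RBP at [rbp]
win_addr = 0x401196               # hypothetical address of win()

# Padding up to the return slot, then the new return address (little-endian).
payload = b"A" * offset_to_ret + struct.pack("<Q", win_addr)
print(hex(offset_to_ret), len(payload))  # 0x68 112
```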
If win_authed(token) checks a stack/local token,
sometimes jump past the check instead of calling the function entry.
This is an offset jump. Make sure the target
instruction does not depend on skipped setup.
Canary Bypass Patterns
- Direct leak: program prints past the canary because the input overwrote its leading \x00 terminator.
- Residual stack leak: another function leaves a canary copy or secret in reused stack memory.
- Recursive/retry path: trigger a path that re-enters vulnerable code before the outer canary is checked.
- Fork brute-force: child crashes do not randomize parent canary; brute-force byte-by-byte.
- Skip canary write zone: corrupt loop index, pointer, length, or destination so writes land after the canary.
Stack canary on amd64 usually has a null low byte. Leak reconstruction often looks like:
    canary = u64(b"\x00" + leak7)
Signedness & Integer Bugs
Danger pattern: check in signed arithmetic, use in unsigned API.
    int size;
-1 passes size <= 64, then becomes
0xffffffffffffffff as size_t.
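The conversion can be reproduced with a 64-bit mask, which models what happens when a negative int is passed where a 64-bit size_t is expected. The limit of 64 is taken from the check above.

```python
SIZE_T_MASK = 0xFFFFFFFFFFFFFFFF  # 64-bit size_t

size = -1
passes_check = size <= 64            # signed comparison succeeds
as_size_t = size & SIZE_T_MASK       # value seen by read()/memcpy()

print(passes_check, hex(as_size_t))  # True 0xffffffffffffffff
```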
Integer multiplication overflow:
    uint32_t bytes = count * record_size;
If multiplication truncates before the check but the later copy/read uses the larger semantic size, bounds checks fail.
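A small numeric illustration of the truncation, with hypothetical attacker-chosen header values. Python integers do not wrap, so the uint32_t behaviour is modelled with an explicit mask.

```python
MASK32 = 0xFFFFFFFF

count, record_size = 0x10000001, 16   # hypothetical values from a file header

bytes_checked = (count * record_size) & MASK32  # what the uint32_t holds
bytes_needed = count * record_size              # what the records really occupy

print(hex(bytes_checked), hex(bytes_needed))  # 0x10 0x100000010
print(bytes_checked <= 64)                    # True: the bounds check passes
```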
OOB Indexing
- Negative index can read/write before an array.
- Positive overlarge index can reach later locals, heap metadata, GOT, vtables, function pointers, or adjacent objects.
- Convert byte distance to index by dividing by element size.
    index = (target_addr - array_base) // element_size
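The same formula as a helper that also rejects targets which are not element-aligned. The addresses are hypothetical (a global array of 8-byte entries located above a GOT-like region).

```python
def oob_index(target_addr: int, array_base: int, element_size: int) -> int:
    """Index that makes array[index] alias target_addr."""
    distance = target_addr - array_base
    if distance % element_size:
        raise ValueError("target is not element-aligned")
    return distance // element_size

array_base = 0x404080  # hypothetical array of 8-byte entries

print(oob_index(0x404018, array_base, 8))  # -13 (target lies before the array)
print(oob_index(0x4040C0, array_base, 8))  # 8   (target lies after the array)
```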
Partial Pointer Overwrite
Partial overwrite changes only low bytes of a pointer. Useful when high bytes are stable due to page alignment or same mapping.
    payload = b"A" * offset + p16(target_low_16)
This works best when source and destination are in the same module/stack/heap region or when only the low page offset must change.
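The effect on the stored pointer can be simulated on a byte array. Both the saved return address and the 16-bit target are made-up values.

```python
import struct

saved_ret = 0x55D3C4A1B2F7   # hypothetical saved return address (PIE binary)
target_low_16 = 0xB1A9       # low 16 bits of the desired target

slot = bytearray(struct.pack("<Q", saved_ret))   # little-endian: low bytes first
slot[0:2] = struct.pack("<H", target_low_16)     # overflow reaches only 2 bytes
(new_ret,) = struct.unpack("<Q", slot)

print(hex(new_ret))  # 0x55d3c4a1b1a9
```

The upper six bytes are untouched, so the randomized part of the address never needs to be known.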
GOT Overwrite
Requirements:
- GOT target must be writable (No RELRO or sometimes Partial RELRO).
- A write primitive reaches the target GOT entry.
- The overwritten function is called after the overwrite.
Common targets:
- printf@GOT -> system, then pass "/bin/sh" or a command string to printf.
- puts@GOT -> win + offset, but avoid recursion if win itself calls puts first.
Heap Bug Classes
- UAF: use a pointer after free; can become type confusion if the freed chunk is reallocated as another object.
- Double free: same chunk inserted into a free list twice; allocator-dependent exploitability.
- Overflow into metadata/object: corrupt size, next pointer, function pointer, vtable, length, or data pointer.
- Aliased backing store: a high-level view (ArrayBuffer, slice, sprite, ledger view) still references native heap memory after resize/free.
Pwndbg commands:
    heap
ASLR & Bypass Techniques
Method 1: Memory Leak
Despite ASLR, the relative
offset between functions and data within the same
module remains constant.
- Leak a known pointer (e.g., a function address in the GOT).
- Subtract its constant offset to find the base address.
- Calculate the addresses of all other gadgets/functions.
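The three steps reduce to two lines of arithmetic plus a sanity check. All numbers here are hypothetical; real offsets depend on the exact libc build and would normally come from ELF(...).sym in pwntools.

```python
leaked_puts = 0x7F3A1C280E50   # hypothetical value read from puts@GOT
puts_offset = 0x80E50          # hypothetical offset of puts in this libc
system_offset = 0x50D70        # hypothetical offset of system

libc_base = leaked_puts - puts_offset
# A module base is page-aligned; anything else suggests the wrong libc or leak.
assert libc_base & 0xFFF == 0, "base not page-aligned"

system_addr = libc_base + system_offset
print(hex(libc_base), hex(system_addr))  # 0x7f3a1c200000 0x7f3a1c250d70
```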
Method 2: Partial Overwrite (YOLO)
Memory is managed in pages (typically
0x1000 bytes). The lowest 12 bits of an address represent
the page offset and are not randomized by
ASLR.
- Strategy: Overwrite only the least significant 1 or 2 bytes of the return address. This allows redirecting execution to a different instruction within the same page (or nearby) without knowing the randomized base. Overwriting 2 bytes usually requires a 1/16 brute-force of the 4th nibble, because the low 12 bits are known but bits 12-15 are randomized.
Method 3: Fork Brute-force
In a fork()-based network server, child processes
inherit the exact memory layout of the parent,
including ASLR offsets and the canary.
- Strategy: Brute-force the canary or return address byte-by-byte. If the child crashes, the parent simply forks a new one with the same values, allowing unlimited attempts.
Canary brute-force skeleton:
    from pwn import *
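The byte-by-byte search can be exercised offline against a simulated oracle. In this sketch child_survives is a stand-in for "send the payload to a fresh child and see whether it crashes"; in a real exploit it would be a network round trip, and the secret below would live in the server process.

```python
import os

SECRET_CANARY = b"\x00" + os.urandom(7)  # simulated; low byte is null on amd64

def child_survives(guess_prefix: bytes) -> bool:
    """Simulated oracle: the child crashes if any overwritten canary byte is wrong."""
    return SECRET_CANARY.startswith(guess_prefix)

def brute_force_canary() -> bytes:
    known = b""
    for _ in range(8):                      # one canary byte at a time
        for guess in range(256):
            if child_survives(known + bytes([guess])):
                known += bytes([guess])
                break
        else:
            raise RuntimeError("no byte matched; oracle is unreliable")
    return known

recovered = brute_force_canary()
print(recovered == SECRET_CANARY)  # True
```

The search needs at most 8 * 256 attempts, compared with 2^56 for guessing the seven unknown bytes at once.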
Method 4: ret2libc
With NX enabled, call existing libc code instead of injecting code.
- Leak a libc pointer, e.g. puts(puts@GOT).
- Compute libc.address = leaked_puts - libc.sym["puts"].
- Call system("/bin/sh") or execve gadgets.
    elf = ELF("./chall")
Method 5: ret2plt / ret2csu
- ret2plt: call imported functions through PLT when the binary has useful PLT entries and controlled arguments.
- ret2csu: use gadgets inside __libc_csu_init to populate RDI, RSI, RDX when simple pop rdi; ret / pop rsi; ret gadgets are missing.
Typical ret2csu idea:
    gadget 1: pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret
Set rbx = 0, rbp = 1, r12 = function_pointer_table, r13 = arg1, r14 = arg2, r15 = arg3. The register-to-argument mapping differs between glibc builds, so confirm it against the second gadget in the target's disassembly.
Method 6: SROP
Sigreturn-oriented programming uses a fake rt_sigreturn
frame to set all registers. Useful when gadgets are scarce but you can
execute syscall with RAX = 15
(rt_sigreturn).
    frame = SigreturnFrame()
Seccomp Notes
If shellcode or ROP mysteriously dies on execve, check
seccomp filters.
    seccomp-tools dump ./chall
Common bypasses:
- If execve is blocked but file syscalls are allowed: open/read/write or sendfile the flag.
- If open is blocked but openat is allowed: use openat(AT_FDCWD, "/flag", 0).
- If only read, write, exit, sigreturn are allowed: consider SROP or staged ROP.
Shellcoding
Toolchain
    # Compile shellcode (static, no libc)
Forbidden Bytes
Avoid bytes that terminate or split input strings:
| Byte | Name | Trigger |
|---|---|---|
| 0x00 | null byte | strcpy, printf (terminator) |
| 0x0a | newline | fgets, scanf (end of input) |
| 0x20 | space | scanf (separator) |
Null Byte Avoidance Techniques
| Bad | Good | Reason |
|---|---|---|
| mov RAX, 0 | xor EAX, EAX | Avoids long null byte sequence |
| mov RAX, 5 | xor EAX, EAX; mov AL, 5 | Avoids 0x00 padding in 64-bit mov |
| mov RAX, 10 | push 9; pop RAX; inc RAX | Avoids 0x0a (newline) |
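The table can be verified mechanically by scanning the machine code for forbidden bytes. The encodings below are the standard x86_64 ones for these instructions; the bad-byte set mirrors the Forbidden Bytes table and should be adjusted per target.

```python
BAD_BYTES = {0x00, 0x0A, 0x20}

ENCODINGS = {
    "mov rax, 0":               bytes.fromhex("48c7c000000000"),
    "xor eax, eax":             bytes.fromhex("31c0"),
    "mov rax, 10":              bytes.fromhex("48c7c00a000000"),
    "push 9; pop rax; inc rax": bytes.fromhex("6a095848ffc0"),
}

def bad_offsets(code: bytes) -> list:
    """Return the positions of every forbidden byte in the machine code."""
    return [i for i, b in enumerate(code) if b in BAD_BYTES]

for asm, code in ENCODINGS.items():
    print(f"{asm:26} {code.hex():16} bad at {bad_offsets(code)}")
```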
Other size/filter tricks:
- Use 32-bit register writes (eax, edi, esi) to avoid REX.W (0x48) and zero-extend into 64-bit registers.
- Use push imm; pop reg for small constants.
- Use cdq to zero RDX when EAX is non-negative: cdq sign-extends EAX into EDX, and the 32-bit write clears the upper half of RDX.
- Store strings on the stack in little-endian order.
- XOR-encode constants to avoid bad bytes, then decode in-place.
- If shellcode is called through a register, inspect live registers before execution; they may point to the shellcode mapping and save bytes with lea.
NOP Sled & Stack Shellcode
NOP sled (\x90) tolerates approximate jump targets. It
is useful when stack addresses shift due to environment differences.
    nop_sled = b"\x90" * 512
Avoid stack self-destruction: after ret,
RSP points just above the return address. Shellcode using
push may overwrite nearby bytes below RSP.
Place shellcode sufficiently before/after the overwritten return path or
use a large sled.
Staged Shellcode
When input length is too small, first-stage shellcode reads a larger second stage into RWX memory or stack, then jumps there.
    ; read(0, rsp, 0x400); jmp rsp
Self-Modifying Code (SMC)
Runtime modification of the .text segment to bypass
static filters. Requires the segment to be writable
(-Wl,-N).
    ; Bypassing 'syscall' (0x0f05) filter
If the first page is made RX after input, place the patching code and patch targets on a later still-writable page, or use a first-stage jump over the protected region.
CET: SHSTK & IBT
Intel CET adds hardware CFI:
- SHSTK (Shadow Stack): return addresses are mirrored on a protected shadow stack. ret compares the normal stack return address against the shadow one. A classic saved-RIP overwrite can crash even when canary is disabled.
- IBT (Indirect Branch Tracking): indirect call/jmp targets must start with endbr64 (f3 0f 1e fa).
- notrack prefix: notrack jmp rax can bypass IBT checks for that branch, so a corrupted function pointer/register may jump to shellcode or arbitrary code even when IBT is enabled.
When SHSTK blocks ROP, look for:
- Non-return indirect branches: jmp rax, call rax, switch-table dispatch.
- notrack jumps in compiler-generated switch code.
- Writable function pointers, vtables, callback tables, jump-table indexes.
- Logic bugs that reach win without hijacking ret.
Reference Shellcodes
x64 root shell (No REX.W/0x48)
    # Avoids \x48 (REX.W prefix)
x64 execve("/bin/sh") (22-23 bytes)
    ; Standard 22-23 byte execve("/bin/sh")
x64 cat /flag
    .section .shellcode,"awx"
x64 cat /flag (No REX.W/0x48)
    .global _start
x64 cat /flag (No 'syscall' opcode)
    .section .shellcode,"awx"
Common Terms
| Term | Description |
|---|---|
| ROP | Return-Oriented Programming. Chaining "gadgets" ending in ret. |
| JOP/COP | Jump/Call-Oriented Programming. Chains indirect jmp/call dispatch instead of ret. |
| ret2win | Overwrite control flow to a hidden/success function in the binary. |
| ret2libc | Redirecting execution to a libc function instead of shellcode. |
| ret2plt | Calling a PLT stub in the binary, often to leak or invoke imported functions. |
| ret2csu | Using __libc_csu_init gadgets to set up multi-register function calls. |
| SROP | Sigreturn-Oriented Programming. Fake a signal frame to control registers. |
| PLT/GOT | Procedure Linkage Table & Global Offset Table. Used for resolving external library function addresses. |
| OOB | Out-of-bounds. Accessing memory outside the intended range of an array or buffer. |
| UAF | Use-after-free. Reusing a pointer after its backing allocation has been freed. |
| SMC | Self-modifying code. Code patches its own bytes at runtime. |
| CET | Intel Control-flow Enforcement Technology: mainly SHSTK and IBT. |
| endbr64 | Valid landing instruction required by IBT for indirect branch targets. |
Exploit Skeletons
Local/Remote Toggle
    #!/usr/bin/env python3
Run with:
    python solve.py
Leak Parsing
    io.recvuntil(b"leak: ")
Flat Payloads
    payload = flat(
Python: Integer to Byte Conversion
In Python, converting an integer (0-255) to a byte string requires
careful handling of the bytes() constructor:
- Incorrect: bytes(guess)
  - If guess = 5, this creates a null-filled byte string of length 5: b'\x00\x00\x00\x00\x00'.
- Correct: bytes([guess])
  - Passing a list (iterable) treats the integer as the actual byte value.
  - If guess = 65, bytes([65]) results in b'A' (0x41).
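The difference is easy to confirm in an interpreter. int.to_bytes is shown as an alternative that states the length explicitly.

```python
guess = 5

wrong = bytes(guess)      # allocates `guess` zero bytes
right = bytes([guess])    # one byte whose value is `guess`

print(wrong)                          # b'\x00\x00\x00\x00\x00'
print(right)                          # b'\x05'
print(bytes([65]))                    # b'A'
print((65).to_bytes(1, "little"))     # b'A'
```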
movaps
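movaps (Move Aligned Packed Single-Precision Floating-Point Values) is an x86/x64 SIMD (single instruction, multiple data) instruction used mainly for efficient data movement.
- SIMD vectorization and high throughput: modern CPUs use the XMM registers for vectorized operations to process data more efficiently. movaps moves 128 bits (16 bytes) at a time, which greatly improves memory bandwidth utilization. The instruction is widely used in compute-intensive matrix workloads such as audio/video codecs, graphics rendering, and cryptographic algorithms.
- Memory alignment and cache lines: movaps requires aligned memory; the address of a memory operand must be a multiple of 16 (16-byte aligned). L1 data cache lines on modern CPUs are typically 64 bytes. Forcing 16-byte alignment guarantees that the 16-byte block never straddles a cache line boundary, so the CPU can complete the access in a single operation and avoids the penalty of a split cache-line read.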
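Stack Alignment in x64 ROP Chains
In 64-bit Linux pwn, calling standard library functions such as system() or printf() through a ROP chain often crashes the program. The root cause is a violation of the stack alignment rule in the calling convention.
- System V AMD64 ABI stack alignment: immediately before a call instruction transfers control to the target function, rsp % 16 == 0. call pushes an 8-byte return address, so at the first instruction of the callee rsp % 16 == 8. A standard prologue then executes push rbp, which restores rsp % 16 == 0.
- Root cause of the #GP exception: when an attacker hijacks control flow through a buffer overflow and uses a ROP chain to jump straight to system("/bin/sh"), the state of rsp is often neglected. If rsp is off by 8 bytes at that point, execution faults inside glibc (e.g. in do_system or vfprintf): optimized builds of these functions use movaps on stack locals, and movaps on an unaligned stack address raises a hardware General Protection Fault (#GP), which the kernel delivers to the process as SIGSEGV.
- Exploit fix, ret gadget for stack balancing: insert a bare ret gadget (machine code 0xC3) in the ROP chain just before the address of the target function (e.g. system). ret is semantically equivalent to pop rip and pops 8 bytes off the top of the stack. That 8-byte shift compensates the stack pointer and brings the misaligned rsp back to 16-byte alignment, avoiding the movaps fault.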
Quick fix:
    ret = rop.find_gadget(["ret"])[0]
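The parity argument can be checked with a small model in which every 8-byte slot consumed by the chain (a gadget address or a popped value) advances rsp by 8. The starting rsp is a hypothetical value with rsp % 16 == 8, which is the state at the ret of a normally called function.

```python
def rsp_at_entry(rsp_at_hijacked_ret: int, slots_before_target: int) -> int:
    """rsp when the target function's first instruction runs.

    Each chain slot before the target, plus the target's own address slot,
    moves rsp up by 8 bytes.
    """
    return rsp_at_hijacked_ret + 8 * (slots_before_target + 1)

def abi_aligned(rsp: int) -> bool:
    """The ABI expects rsp % 16 == 8 at function entry."""
    return rsp % 16 == 8

rsp0 = 0x7FFD00001008  # hypothetical rsp at the hijacked ret

# Chain A: [pop rdi; ret] ["/bin/sh"] [system]        -> 2 slots before system
# Chain B: [ret] [pop rdi; ret] ["/bin/sh"] [system]  -> 3 slots before system
print(abi_aligned(rsp_at_entry(rsp0, 2)))  # False: movaps may fault
print(abi_aligned(rsp_at_entry(rsp0, 3)))  # True: the extra ret fixes parity
```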
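32-bit (x86) Alignment
In 32-bit exploitation, crashes caused by movaps are much less likely, for architectural and compiler reasons:
- Calling convention differences: 32-bit conventions (such as cdecl) pass arguments mainly on the stack, and each pushed argument moves esp by 4 bytes. These frequent 4-byte shifts make 16-byte alignment very hard to maintain across a whole call chain.
- Compiler instruction selection: because maintaining stack alignment is costly on 32-bit, compilers (older GCC in particular) often fall back to the more tolerant movups instruction (Move Unaligned, which allows unaligned access), or emit no SSE vector instructions at all and use scalar instructions for ordinary C code.
- Stack fix-up on 32-bit: if a 16-byte alignment crash does need fixing on 32-bit, padding with a single ret as on 64-bit is not enough. The attacker has to compute how far the current esp is from the target alignment (a multiple of 4) and chain the corresponding number of pop <reg> gadgets (e.g. pop eax; ret) to adjust it until esp % 16 == 0.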