UMDCTF 2026 - flow

Posted on 2026-04-26 Edited on 2026-05-24 In ctf

Challenge Details
Category	ML / Adversarial ML
Points	~300
Solves	~15

Purify the stream... or corrupt it.

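flow was a fascinating adversarial machine learning challenge. We were given a pre-trained CNN classifier, a RealNVP-style normalizing flow model, and a reference time-series window x_ref of shape (5, 64): think of it as 5 sensor channels with 64 timesteps each. The goal was to submit a window that stays within an L-infinity distance of 0.05 from x_ref, keeps its flow log-probability above a threshold, and is classified as class 1 with probability at least 0.80 after purification.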

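Prerequisites

This challenge is about bypassing normalizing flows used as a purification defense, an advanced adversarial-attack scenario. The following concepts help in following along:

Concept	Description
Normalizing-flow purifier	Input → flow encoder → scaled latent → flow decoder → classifier; meant to remove adversarial perturbations
C&W attack	Casts adversarial example construction as a constrained optimization problem; stronger than FGSM
Differentiable pipeline	The flow and the classifier are differentiable end-to-end, so an attacker can backpropagate through the purification
Latent-space optimization	Optimize the perturbation in the flow's latent space, where the distribution is more regular
Hinge Loss	max(0, margin - value); the loss is 0 when the constraint is satisfied
Softmax / Logits	Logits are the raw outputs; softmax converts them to probabilities. The C&W loss uses the logits directly

References: C&W attack, arXiv:1608.04644; adversarial robustness of flows, arXiv:1911.08654.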
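To make the Hinge Loss and Softmax / Logits rows concrete, here is a minimal plain-Python sketch. The helper names (softmax, cw_margin_loss, hinge) and the confidence parameter kappa are illustrative choices, not code from the challenge; the margin follows the targeted loss form described in the C&W paper.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cw_margin_loss(logits, target, kappa=0.0):
    """Targeted C&W-style loss on logits: max(max_{i != target} Z_i - Z_target, -kappa)."""
    best_other = max(v for i, v in enumerate(logits) if i != target)
    return max(best_other - logits[target], -kappa)

def hinge(margin, value):
    """Hinge penalty from the table: zero once value reaches the margin."""
    return max(0.0, margin - value)

logits = [2.0, 0.5]                                   # the model currently prefers class 0
probs = softmax(logits)
loss_losing = cw_margin_loss(logits, target=1)        # 1.5: class 1 trails by 1.5 logits
loss_winning = cw_margin_loss([0.5, 2.0], target=1)   # clipped at -kappa once class 1 leads
satisfied = hinge(margin=1.0, value=1.3)              # constraint met, so no penalty
violated = hinge(margin=1.0, value=0.25)              # penalty grows linearly with the gap
```

Working on logits rather than softmax outputs keeps the gradient from vanishing when the probabilities saturate, which is one reason the C&W formulation is phrased this way.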

Understanding the Purifier

Let's look at how purification worked:

def purify(x, flow, n_steps=3, alpha=0.55):
    for _ in range(n_steps):
        z, _ = flow.forward(x)
        x = flow.inverse(alpha * z)
    return x

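Because flow.forward and flow.inverse undo each other, each step simply multiplies the latent by alpha. After 3 steps, x_purified = flow.inverse(0.55^3 * z_submit), where z_submit is the latent of the submitted sample and 0.55^3 ≈ 0.166. The classifier sees this shrunk-latent version, while the flow likelihood check is performed on the original submitted sample.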

This mismatch is the key vulnerability. The defense checks likelihood before purification but classifies after, creating a differentiable pipeline that's ripe for gradient-based exploitation.
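The collapse of three purification steps into a single 0.55^3 scaling can be checked on a toy flow. The sketch below assumes a hypothetical elementwise affine flow (ToyAffineFlow) standing in for the challenge's RealNVP model; only the forward/inverse interface mirrors purify above.

```python
import math

class ToyAffineFlow:
    """Hypothetical stand-in for the challenge flow: z = (x - mu) / sigma, elementwise."""

    def __init__(self, mu, sigma):
        self.mu = mu
        self.sigma = sigma

    def forward(self, x):
        # Returns the latent and the log-determinant, like the (z, _) pair in purify.
        z = [(xi - self.mu) / self.sigma for xi in x]
        log_det = -len(x) * math.log(self.sigma)
        return z, log_det

    def inverse(self, z):
        return [self.mu + self.sigma * zi for zi in z]

def purify(x, flow, n_steps=3, alpha=0.55):
    """Same loop as in the write-up, written for plain Python lists."""
    for _ in range(n_steps):
        z, _ = flow.forward(x)
        x = flow.inverse([alpha * zi for zi in z])
    return x

def purify_closed_form(x, flow, n_steps=3, alpha=0.55):
    """Single encode, scale by alpha**n_steps, single decode."""
    z, _ = flow.forward(x)
    return flow.inverse([(alpha ** n_steps) * zi for zi in z])

flow = ToyAffineFlow(mu=2.0, sigma=0.5)
x = [1.0, 2.5, 4.0]
looped = purify(x, flow)
closed = purify_closed_form(x, flow)
```

Both paths agree up to floating-point rounding, which is why the attack only needs one encode and one decode inside its loss.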

Attack Strategy: Optimize in Latent Space

The core insight is that we can backpropagate through the entire pipeline — flow encoder, latent manipulation, flow decoder, and classifier — to find an input that simultaneously satisfies all constraints.

Here's the approach:

1. Encode the reference window into latent space: z_ref, _ = flow.encoder(x_ref)
2. Optimize z with gradient descent, starting from z_ref, to maximize the class 1 probability of the purified version, while penalizing:
- L-infinity distance from x_ref in the original (submitted) space beyond the 0.05 budget
- Log-probability below the threshold
3. Decode the final z back to data space to get the submission, sub

The loss function looked like:

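import torch
import torch.nn.functional as F

def attack_loss(z, x_ref, flow, classifier, threshold, linf_max=0.05):
    # Decode z back to data space (this is the sample we submit).
    # flow.encoder / flow.decoder appear to play the role of
    # flow.forward / flow.inverse in purify above.
    sub = flow.decoder(z)

    # Purify and classify: three shrink steps collapse to one 0.55**3 scaling
    z_sub, _ = flow.encoder(sub)
    x_purified = flow.decoder((0.55 ** 3) * z_sub)
    logits = classifier(x_purified)
    probs = F.softmax(logits, dim=-1)

    # Classification loss: maximize P(class = 1) after purification
    cls_loss = -probs[..., 1].mean()

    # L-infinity penalty (hinge-style): zero while sub stays within linf_max of x_ref
    linf_dist = (sub - x_ref).abs().max()
    linf_penalty = torch.clamp(linf_dist - linf_max, min=0) * 1000

    # Log-prob penalty: zero while the submitted sample clears the threshold
    logp = flow.log_prob(sub)
    logp_penalty = torch.clamp(threshold - logp, min=0).mean() * 100

    return cls_loss + linf_penalty + logp_penalty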

Running this with Adam for ~3000 steps converged to a solution that satisfies all three constraints.
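The optimization loop itself is not shown above, so here is a toy sketch of the same idea in plain Python. Everything in it is an assumption made for illustration: an elementwise affine "flow", a linear classifier with weights W, a made-up THRESHOLD, central finite differences in place of autograd, and a hand-written Adam update. It also keeps the best feasible iterate, which is a common safeguard added here rather than something the write-up describes.

```python
import math

MU, SIGMA = 0.1, 2.0              # toy elementwise affine "flow" (hypothetical)
ALPHA3 = 0.55 ** 3                # three purification steps collapsed into one scaling
W = [4.0, -3.0, 2.0]              # toy linear classifier weights (hypothetical)
LINF_MAX, THRESHOLD = 0.05, -1.0  # THRESHOLD is a made-up value

def encode(x):
    return [(xi - MU) / SIGMA for xi in x]

def decode(z):
    return [MU + SIGMA * zi for zi in z]

def log_prob(z):
    # Unnormalized standard-normal log-density of the latent (toy).
    return -0.5 * sum(zi * zi for zi in z)

def p_class1(z):
    """P(class=1) for the purified sample decode(ALPHA3 * z)."""
    x_pur = decode([ALPHA3 * zi for zi in z])
    score = sum(w * xi for w, xi in zip(W, x_pur))
    return 1.0 / (1.0 + math.exp(-score))

def linf(z, x_ref):
    # Distance is measured on the submitted sample decode(z), not the purified one.
    return max(abs(s - r) for s, r in zip(decode(z), x_ref))

def loss(z, x_ref):
    return (-p_class1(z)
            + 1000 * max(0.0, linf(z, x_ref) - LINF_MAX)
            + 100 * max(0.0, THRESHOLD - log_prob(z)))

def grad(z, x_ref, h=1e-6):
    """Central finite differences stand in for autograd in this toy."""
    g = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        g.append((loss(zp, x_ref) - loss(zm, x_ref)) / (2 * h))
    return g

def adam_attack(x_ref, steps=3000, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    z = encode(x_ref)
    m, v = [0.0] * len(z), [0.0] * len(z)
    best_z, best_p = list(z), p_class1(z)
    for t in range(1, steps + 1):
        g = grad(z, x_ref)
        for i in range(len(z)):
            m[i] = b1 * m[i] + (1 - b1) * g[i]
            v[i] = b2 * v[i] + (1 - b2) * g[i] ** 2
            m_hat = m[i] / (1 - b1 ** t)
            v_hat = v[i] / (1 - b2 ** t)
            z[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        feasible = linf(z, x_ref) <= LINF_MAX and log_prob(z) >= THRESHOLD
        if feasible and p_class1(z) > best_p:
            best_z, best_p = list(z), p_class1(z)
    return best_z, best_p

x_ref = [0.2, -0.1, 0.4]
p_start = p_class1(encode(x_ref))
best_z, best_p = adam_attack(x_ref)
```

With the real models, the same loop would use torch.optim.Adam on a latent tensor with requires_grad=True and call attack_loss in place of the finite-difference gradient.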

Results

Our final submission achieved:

Metric	Value	Target
L-infinity	0.049	<= 0.05
Log-probability	1062	>= threshold
P(class=1)	0.825	>= 0.80
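A small checker makes it explicit which sample each target in the table is measured on. This is a hedged sketch: evaluate_submission and its callable arguments are hypothetical names, the windows are flattened to short lists instead of shape (5, 64), and the threshold of 1000 is a placeholder because the real value is not stated here.

```python
def evaluate_submission(sub, x_ref, log_prob_fn, purify_fn, class1_prob_fn,
                        threshold, linf_max=0.05, p_min=0.80):
    """Check the three targets; only the class probability sees the purified sample."""
    linf_dist = max(abs(s - r) for s, r in zip(sub, x_ref))
    logp = log_prob_fn(sub)                  # likelihood of the raw submission
    p1 = class1_prob_fn(purify_fn(sub))      # classifier runs after purification
    return {
        "linf_ok": linf_dist <= linf_max,
        "logp_ok": logp >= threshold,
        "class_ok": p1 >= p_min,
    }

# Toy stand-ins so the sketch runs; none of these come from the challenge.
toy_models = dict(
    log_prob_fn=lambda x: 1062.0,
    purify_fn=lambda x: [0.55 ** 3 * xi for xi in x],
    class1_prob_fn=lambda x: 0.825,
    threshold=1000.0,
)
report_ok = evaluate_submission(sub=[0.24, -0.06], x_ref=[0.2, -0.1], **toy_models)
report_far = evaluate_submission(sub=[0.30, -0.10], x_ref=[0.2, -0.1], **toy_models)
```

The second call moves one coordinate 0.1 away from the reference, so only its L-infinity check fails.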

The flag is a lovely nod to the adversarial-examples literature: it thanks Athalye, Carlini, and Wagner, and the Carlini & Wagner (C&W) attack specifically inspires the optimization-based approach used here. FGSM, the single-step baseline, comes from the seminal Explaining and Harnessing Adversarial Examples (Goodfellow et al.).

Key Takeaways

Be wary of any mismatch between what a defense checks and what the classifier sees. If the likelihood check runs on the submitted sample but classification runs on a purified version, an adversary can optimize through both paths at once.
Normalizing flows are differentiable end-to-end, making them a double-edged sword: they can purify, but they can also be used to craft adversarial inputs when the full pipeline is exposed.
Optimization-based attacks (à la C&W) are generally stronger than single-step gradient methods when you have access to the full model. Here, ~3000 steps of Adam could trade off three constraints at once, something a single FGSM step is not designed to do.

Flag

UMDCTF{id_like_to_thank_athalye_carlini_and_wagner_for_their_research}