WeChall - Training - Crypto - Digraphs

Posted on 2026-06-05 In ctf

Challenge

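Digraph substitution cipher: each plaintext character is encoded as a 2-character pair. The ciphertext keeps the original word spacing and contains no line breaks. The plaintext is standard English, with mixed case and punctuation.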

The ciphertext is about 290 characters long: 136 digraph pairs, 30 unique digraphs (26 letters + 4 punctuation marks: ! ? . :). The mapping and the password change with each session.
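As a minimal sketch of how these counts can be checked, the snippet below splits a ciphertext into words and 2-character pairs and tallies them. The name `split_digraphs` and the `sample` string are made up for illustration; the real ciphertext comes from the challenge page.

```python
from collections import Counter

def split_digraphs(ciphertext):
    """Split a space-separated ciphertext into words, each a list of 2-char pairs."""
    words = []
    for token in ciphertext.split():
        if len(token) % 2:
            raise ValueError(f"odd-length token: {token!r}")
        words.append([token[i:i + 2] for i in range(0, len(token), 2)])
    return words

# Made-up sample, NOT real challenge output.
sample = "abcdab efcd"
words = split_digraphs(sample)
counts = Counter(pair for word in words for pair in word)

# Number of words, total digraphs, unique digraphs.
print(len(words), sum(counts.values()), len(counts))  # 2 5 3
```

Running the same tally on a real session should give figures close to those quoted above; a token of odd length indicates that the ciphertext was copied incorrectly.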

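This is a homophonic cipher: the same letter may map to several digraphs (for example, cv and xl both map to c, and qn and hc both map to e). Frequency analysis alone therefore cannot pin down the mapping uniquely; word structure is needed to disambiguate.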
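To make the homophone idea concrete, here is a small sketch that groups digraphs by the character they decode to and reports every character with more than one digraph. The helper name `homophones` is hypothetical, and the `ab` entry is invented padding; only the cv/xl and qn/hc pairs come from the example above.

```python
from collections import defaultdict

def homophones(mapping):
    """Return {character: sorted digraphs} for characters with 2+ digraphs."""
    groups = defaultdict(list)
    for pair, ch in mapping.items():
        groups[ch].append(pair)
    return {ch: sorted(pairs) for ch, pairs in groups.items() if len(pairs) > 1}

# digraph -> plaintext character; 'ab' is a made-up entry.
mapping = {"cv": "c", "xl": "c", "qn": "e", "hc": "e", "ab": "t"}
print(homophones(mapping))
```

Keeping the mapping in the direction digraph → character is what makes homophones unproblematic: many keys may share one value, but each key still decodes unambiguously.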

Solution

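Step 1: Identify the punctuation digraphs

The ciphertext keeps word spacing, and punctuation appears at the end of words. The 22 words follow a fixed structure:

Word 0  "congratulations!"  → last digraph = '!'
Word 5  "successfully!"     → last digraph = '!' (should match word 0)
Word 10 "either."           → last digraph = '.'
Word 12 "it?"               → last digraph = '?'
Word 13 "well."             → last digraph = '.' (should match word 10)
Word 20 "solution:"         → last digraph = ':'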
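The table above can be turned into a small consistency check. The sketch below assumes the ciphertext has already been split into a list of digraph lists; the names `PUNCT_AT` and `punctuation_digraphs` are hypothetical, and the word indices are simply those listed above.

```python
# Word index -> expected trailing punctuation, taken from the list above.
PUNCT_AT = {0: '!', 5: '!', 10: '.', 12: '?', 13: '.', 20: ':'}

def punctuation_digraphs(words, punct_at=PUNCT_AT):
    """Map the last digraph of each listed word to its punctuation mark.

    Several digraphs may share a mark (homophones), but a single digraph
    must never be assigned two different marks.
    """
    mapping = {}
    for idx, mark in punct_at.items():
        if idx >= len(words):
            continue  # ciphertext shorter than expected; skip this entry
        last = words[idx][-1]
        if mapping.setdefault(last, mark) != mark:
            raise ValueError(
                f"digraph {last!r} maps to both {mapping[last]!r} and {mark!r}")
    return mapping

# Tiny made-up example with its own index table.
demo_words = [["aa", "zz"], ["bb", "yy"], ["cc", "zz"]]
print(punctuation_digraphs(demo_words, {0: '!', 1: '.', 2: '!'}))
```

A `ValueError` here usually means the word boundaries are off by one or the session's plaintext does not follow the expected template.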

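Step 2: Recover the letter mapping from word structure

Word 0, "congratulations!", is the key anchor: its 16 digraphs correspond to 16 characters, and the letter repetition pattern is fixed (a, t, o and n each occur twice). Aligning the two directly yields 12 character mappings (11 distinct letters plus '!').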
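A minimal sketch of this alignment step follows. The helper `align_crib` is a hypothetical name, and `toy_key` is an invented encoding used only to produce a 16-digraph test word; it is not the challenge's real key.

```python
def align_crib(digraphs, crib, mapping=None):
    """Pair each digraph with the crib character at the same position.

    Returns an updated copy of `mapping` (digraph -> character), or None if
    the lengths differ or one digraph would need two different characters.
    Two digraphs sharing a character is allowed (homophonic cipher).
    """
    if len(digraphs) != len(crib):
        return None
    result = dict(mapping or {})
    for pair, ch in zip(digraphs, crib):
        if result.setdefault(pair, ch) != ch:
            return None
    return result

# Toy encoding (made up): every character becomes itself followed by 'x'.
crib = "congratulations!"
toy_key = {ch: ch + "x" for ch in set(crib)}
word0 = [toy_key[ch] for ch in crib]

m = align_crib(word0, crib)
print(len(m))  # 12: eleven distinct letters plus '!'
```

Returning `None` on a conflict, rather than raising, makes it easy to try the same helper on later words with several candidate plaintexts.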

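Then cross-check with the short words:

Word 19 "as"           → 2 digraphs; its s is shared with the first letter of word 20
Word 20 "solution:"    → 9 digraphs, ending in ':'
Word 1  "?ou"          → "you" → 1st digraph = 'y'
Word 3  "t?is"         → "this" → 2nd digraph = 'h'
Word 6  "?as"          → "was" → 1st digraph = 'w'
Word 7  "not"          → directly confirms the n/o/t mappings
Word 8  "too"          → confirms the doubled 'o' pattern

Continue this way until all 30 digraphs are covered.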
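The cross-checking above can be sketched as two small helpers: one renders the partially decoded text with '?' for unknown digraphs, the other filters a list of candidate words against the current mapping. Both function names, the digraphs and the vocabulary are illustrative assumptions, not taken from a real session.

```python
def partial_decode(words, mapping, unknown="?"):
    """Decode digraph words, leaving `unknown` where a digraph is unmapped."""
    return ["".join(mapping.get(pair, unknown) for pair in word) for word in words]

def candidates(word, mapping, vocabulary):
    """Return vocabulary entries consistent with the partial mapping."""
    hits = []
    for cand in vocabulary:
        if len(cand) != len(word):
            continue
        trial = dict(mapping)
        # A digraph may gain a character, but may never change the one it has.
        if all(trial.setdefault(pair, ch) == ch for pair, ch in zip(word, cand)):
            hits.append(cand)
    return hits

# Made-up partial mapping and ciphertext word.
mapping = {"aa": "o", "bb": "u", "cc": "t", "dd": "i", "ee": "s"}
word1 = ["zz", "aa", "bb"]

print(partial_decode([word1], mapping))  # ['?ou']
print(candidates(word1, mapping, ["you", "was", "not", "too", "this"]))  # ['you']
```

When a word has exactly one candidate, its unknown digraphs can be added to the mapping; words with several candidates are best left until more of the mapping is known.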

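Step 3: Simulated annealing (fallback / verification)

Frequency analysis plus simulated annealing can also solve the cipher, but on its own it converges slowly (the homophonic cipher admits multiple equivalent mappings). It is more practical to derive part of the mapping from word structure first, then use simulated annealing to fill in the remaining unknown digraphs.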

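import math
import random

# Approximate English letter frequencies (most common letters only).
eng_freq = {'e':.127,'t':.091,'a':.082,'o':.075,'i':.070,'n':.067,
            's':.063,'h':.061,'r':.060,'d':.043,'l':.040}

# Words expected in the plaintext; an exact match earns a bonus.
common = {'the':5,'and':5,'was':5,'not':5,'too':5,'you':5,'this':5,
          'it':5,'as':5,'is':5,'congratulations':10,'decrypted':10,
          'successfully':10,'difficult':10,'keyword':10,'solution':10}

def decode(m, word):
    # Assumes `word` is a ciphertext token whose length is a multiple of 2.
    return ''.join(m[word[i:i + 2]] for i in range(0, len(word), 2))

def score_word(word):
    # Letter-frequency score plus a bonus for known words.
    clean = word.rstrip('!.,?:').lower()
    s = sum(eng_freq.get(c, 0) * 2 for c in clean)
    return s + common.get(clean, 0)

def solve_sa(digraphs, chars, words, n_iter=100000, n_restarts=20):
    # digraphs: list of unique ciphertext pairs; chars: candidate plaintext
    # characters (at least as many as digraphs); words: ciphertext tokens.
    best_mapping, best_score = None, float('-inf')
    for _ in range(n_restarts):
        m = dict(zip(digraphs, random.sample(chars, len(digraphs))))
        sc = sum(score_word(decode(m, w)) for w in words)
        t = 5.0
        for _ in range(n_iter):
            # Propose swapping the characters assigned to two digraphs.
            d1, d2 = random.sample(digraphs, 2)
            m[d1], m[d2] = m[d2], m[d1]
            ns = sum(score_word(decode(m, w)) for w in words)
            # Always accept improvements; accept worse moves with probability exp(delta / t).
            if ns > sc or random.random() < math.exp((ns - sc) / max(t, 0.01)):
                sc = ns
                if sc > best_score:
                    best_mapping, best_score = dict(m), sc
            else:
                m[d1], m[d2] = m[d2], m[d1]  # rejected: undo the swap
            t *= 0.99995  # geometric cooling
    return best_mapping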

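In practice, 20 restarts × 100k iterations take about 2 minutes and do not always converge to the correct plaintext. Word-structure deduction produces the result in seconds.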

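Step 4: Extract the password

The plaintext ends in the format: enter this keyword as solution: [PASSWORD]!

The password sits after solution: and before the !. Drop the trailing ! when submitting (it is plaintext punctuation, not part of the password).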
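A minimal sketch of the extraction, assuming the decoded plaintext follows the format above. The function name `extract_password` and the placeholder password are made up for illustration.

```python
import re

def extract_password(plaintext):
    """Return the token after 'solution:' without the trailing '!', or None."""
    match = re.search(r"solution:\s*(\S+?)!?\s*$", plaintext, re.IGNORECASE)
    return match.group(1) if match else None

# Placeholder plaintext; a real session produces a different password.
decoded = "enter this keyword as solution: abcdefgh!"
print(extract_password(decoded))  # abcdefgh
```

The lazy `\S+?` followed by an optional `!` is what strips the final exclamation mark while leaving the rest of the token untouched.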

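Notes:
- Homophonic cipher: one letter may map to several digraphs, so do not assume a one-to-one mapping.
- The password changes per session and is regenerated on every page visit.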