21 January 2026

From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models

💡 Proposes Symbiotic Watermarking, a framework that selectively applies logits-based or sampling-based watermarking depending on entropy values measured by two criteria

최민영


Review

Nickname | One-line review | Rating (0/5)
계란초밥 | I thought watermarking research was abstract (how is it evaluated, where would it be used?), but this paper seems to lay that out well. Still, does it really matter whether TPR is 1.0 or 0.98? At that level, not degrading the quality of the original text seems more important! | 3.2
맹구 | I think watermarking is becoming more and more important. Looking at this work alone, one might feel that watermarking research is nearing its end. The idea itself may not be strikingly new, but using entropy and so on makes it a concise, good method. | 4.0
국밥 | What this paper emphasizes seems to be reducing the trade-off by combining the logit-based and sampling-based approaches, and that idea is very good! But what stands out in the experiments is robustness. If they emphasized that, the watermark detection results would hardly matter, so the way it is written is a bit of a pity? | 4.3
피자 | The idea of choosing between logit-based and sampling-based watermarking to minimize damage to the original logits or output seems good. Watermarking attacks are mentioned; I wish there were proof of how well the method defends against them. | 4.0
햄버거 | Since watermarking ultimately leaves a signal in the generated text, I assumed attacks would easily destroy it, so it is surprising that detection holds up under attack. As an aside, AI videos are everywhere on social media these days; I wish watermarking or provenance were enforced strongly, for example at the platform level. | 3.8
치킨 | This was my first watermarking paper, so I really enjoyed the introduction. I hope more research like this comes out. Maybe in watermarking too, a company that gets in first, like OpenAI, will make a fortune? | 4
페브리즈 | I get that when a given entropy is low, a given watermarking method affects text quality less, but it is interesting that mixing them like this also improves watermarking performance. Since the method differs per token, it is hard to tell which watermark was chosen; is that itself why the watermarking performs well? | 3.6

TL;DR

💡 Proposes Symbiotic Watermarking, a framework that selectively applies logits-based or sampling-based watermarking depending on entropy values measured by two criteria

Summary

  • From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models, ACL’25
  • Author
  • Citation: 2

Introduction

Background

Exceptional Capabilities of LLMs

  • LLM์€ ์ฐฝ์ž‘์ด๋‚˜ ๊ธ€์“ฐ๊ธฐ ๋“ฑ ๋‹ค์–‘ํ•œ ๋ถ„์•ผ์—์„œ ์“ฐ์ด๊ณ  ์žˆ์œผ๋ฉฐ, ์ ‘๊ทผ์„ฑ์ด ๋†’์•„์ ธ ๋ˆ„๊ตฌ๋‚˜ ์‰ฝ๊ฒŒ AI ์ƒ์„ฑ ์ฝ˜ํ…์ธ ๋ฅผ ๋งŒ๋“ค๊ฑฐ๋‚˜ ์‚ฌ์šฉ์ด ๊ฐ€๋Šฅํ•จ
  • ํ•˜์ง€๋งŒ LLM ํ™•์‚ฐ์œผ๋กœ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์œ„ํ—˜๋„ ์ฆ๊ฐ€ํ•จ:
    • ์•…์„ฑ ์ฝ˜ํ…์ธ  ์ƒ์„ฑ
    • ์ง€์‹ ์žฌ์‚ฐ๊ถŒ ์นจํ•ด
    • ํ—ˆ์œ„์ •๋ณด ๋ฐ ์ฒ ์ฒ˜๊ฐ€ ๋ถˆ๋ถ„๋ช…ํ•œ ์ฝ˜ํ…์ธ  ๋ฌธ์ œ
  • ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด watermarking์ด ๋“ฑ์žฅํ•จ
    • LLM ์ƒ์„ฑ ์ฝ˜ํ…์ธ ์˜ traceability, authenticity, accountability๋ฅผ ๋ณด์žฅํ•˜๊ธฐ ์œ„ํ•œ ๊ธฐ์ˆ 

LLM Watermarking

  • ๋ชจ๋ธ์ด ์ƒ์„ฑํ•œ ํ…์ŠคํŠธ ์•ˆ์— ์‚ฌ๋žŒ์ด ๋ˆˆ์น˜์ฑ„๊ธฐ ์–ด๋ ค์šด โ€œํ†ต๊ณ„์  ํŒจํ„ดโ€์„ ์‹ฌ์–ด, ๋‚˜์ค‘์— ๊ทธ ํ…์ŠคํŠธ๊ฐ€ ํŠน์ • LLM์—์„œ ์ƒ์„ฑ๋˜์—ˆ๋Š”์ง€๋ฅผ ํŒ๋ณ„ํ•˜๊ธฐ ์œ„ํ•œ ๊ธฐ์ˆ 
    • ํ™•๋ฅ ์ ์œผ๋กœ ์ฆ๋ช…ํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•์ด๋ฉฐ, ์ถœ์ฒ˜๋ฅผ ์ฆ๋ช…ํ•˜๊ธฐ ์œ„ํ•จ
  • Watermarking ์•„์ด๋””์–ด
    • LLM์€ ๋‹ค์Œ token์„ ํ™•๋ฅ  ๋ถ„ํฌ๋กœ ์„ ํƒํ•˜๋Š”๋ฐ, ์›Œํ„ฐ๋งˆํ‚น์€ ์ด ๋ถ„ํฌ๋ฅผ ์•„์ฃผ ๋ฏธ์„ธํ•˜๊ฒŒ ์กฐ์ž‘ํ•˜์—ฌ ํŠน์ • ํ† ํฐ ์ง‘ํ•ฉ์ด ํ†ต๊ณ„์ ์œผ๋กœ ๋” ์ž์ฃผ ๋‚˜์˜ค๋„๋ก ํ•จ
      • ์˜๋ฏธ๋Š” ์œ ์ง€ํ•œ ์ฑ„ ํ†ต๊ณ„์  ํŽธํ–ฅ๋งŒ ๋ฏธ์„ธํ•˜๊ฒŒ ์‚ฝ์ž…

      โ†’ ์ด ํŒจํ„ด์ด ๋ˆ„์ ๋˜๋ฉด ๊ฒ€์ถœ์ด ๊ฐ€๋Šฅํ•ด์ง

    • Watermarking Detail
      1. Watermarking does not change the meaning of the sentence
        • LLM decoding is probabilistic sampling: even with the same prompt, the output differs every time
          A sentence about cats.
          → "Cats are lovely animals."
          → "Cats are very cute."
          → "Cats are friendly pets."
        • Watermarking only nudges the original output probabilities (the randomness) slightly in one direction
          Token | Original probability | After watermarking
          cat | 0.30 | 0.31
          pet | 0.27 | 0.28
          dog | 0.28 | 0.26
          animal | 0.15 | 0.15

        → The meaning of the sentence stays intact

      2. Watermark “detection” relies on accumulated statistics, not on a single token
        • e.g., coin flipping
          • Ordinary coin: heads 50%, tails 50%
          • Watermarked coin: heads 52%, tails 48%

          → A single toss reveals nothing, but over enough tosses the bias becomes statistically detectable (for a 52/48 coin, 1,000 tosses give an expected z-score of only about 1.3, while 10,000 tosses give about 4)

  • Prior work on watermarking
    • Logits-based watermarking
      • Inserts the watermark by slightly adjusting the logits of a subset of tokens in the token probability distribution so that a specific statistical pattern emerges (e.g., KGW, Unigram)
        • Splits the vocabulary of the next-token distribution into two sets (red/green) and slightly raises the logits of green tokens so that they are statistically selected more often
        • Logit-based example
          • Slightly change the probabilities so that certain words are sampled more often
            Token | Original probability | After watermarking
            cat | 0.30 | 0.31
            pet | 0.27 | 0.28
            dog | 0.28 | 0.26
            animal | 0.15 | 0.15
      • KGW procedure
        1. Hash the previous k tokens together with the watermark key ξ to seed a pseudorandom generator
        2. Use it to split the vocabulary into green / red lists
        3. Add a bias δ to the logits of green tokens

          → Green tokens are selected more often

    • Sampling-based watermarking
      • Keeps the probability distribution unchanged and instead manipulates the sampling rule that picks the token to produce the watermark signal (e.g., AAR, GumbelSoft)
        • e.g., exponential minimum sampling, tournament sampling, contrastive decoding
        • Sampling-based example
          • Keep the probabilities and change only the selection rule to create a pattern
          • Adjust the sampling rule, e.g., “if the random number r is at least 0.5, sample only from cat/pet”

            → The changed selection rule creates a pattern

            Token | Output probability (unchanged)
            cat | 0.30
            pet | 0.27
            dog | 0.28
            animal | 0.15
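To make the two families concrete, here is a minimal Python sketch under toy assumptions: random Gaussian logits stand in for an LLM, both the green list and the per-token random numbers are seeded from SHA-256 of the secret key and only the previous token (context width k = 1), and γ = 0.5, δ = 2.0 are illustrative values. The logits-based half follows the KGW recipe (hash → green list → +δ on green logits → ordinary sampling) and detects with a one-proportion z-test, the same test behind the coin analogy. The sampling-based half uses the exponential-minimum (Gumbel-max) rule argmax_i r_i^(1/p_i) associated with AAR, which leaves p untouched. All function names are hypothetical; this is neither the paper's code nor a library API.

```python
import hashlib
import math

import numpy as np


def keyed_rng(prev_token: int, key: int) -> np.random.Generator:
    """Hypothetical seeding: SHA-256 of (secret key, previous token) -> RNG, i.e. context width k = 1."""
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    return np.random.default_rng(int.from_bytes(digest[:8], "little"))


def softmax(logits: np.ndarray) -> np.ndarray:
    z = np.exp(logits - logits.max())
    return z / z.sum()


# Logits-based (KGW-style): bias the distribution, then sample as usual.
def green_mask(prev_token: int, key: int, vocab_size: int, gamma: float = 0.5) -> np.ndarray:
    perm = keyed_rng(prev_token, key).permutation(vocab_size)
    mask = np.zeros(vocab_size, dtype=bool)
    mask[perm[: int(gamma * vocab_size)]] = True  # a gamma fraction of the vocabulary is green
    return mask


def kgw_step(logits, prev_token, key, rng, gamma=0.5, delta=2.0) -> int:
    biased = logits + delta * green_mask(prev_token, key, len(logits), gamma)  # +delta on green logits
    return int(rng.choice(len(logits), p=softmax(biased)))


def kgw_z(tokens, key, vocab_size, gamma=0.5) -> float:
    """One-proportion z-test on the green-token count (the coin-flip test with p0 = gamma)."""
    hits = sum(bool(green_mask(a, key, vocab_size, gamma)[b]) for a, b in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


# Sampling-based (AAR / exponential-minimum style): keep p, make the choice key-dependent.
def aar_step(logits, prev_token, key) -> int:
    p = softmax(logits)
    r = keyed_rng(prev_token, key).random(len(p))
    return int(np.argmax(np.log(r) / p))  # argmax_i r_i ** (1 / p_i), computed in log space


def aar_score(tokens, key, vocab_size) -> float:
    """Sum of -log(1 - r) at the chosen tokens; about 1 per token for unwatermarked text."""
    return sum(-math.log(1.0 - keyed_rng(a, key).random(vocab_size)[b]) for a, b in zip(tokens, tokens[1:]))


V, KEY, T = 500, 42, 200
sim = np.random.default_rng(0)  # random Gaussian logits stand in for the LLM
kgw_tokens, aar_tokens = [0], [0]
for _ in range(T):
    kgw_tokens.append(kgw_step(sim.normal(size=V), kgw_tokens[-1], KEY, sim))
    aar_tokens.append(aar_step(sim.normal(size=V), aar_tokens[-1], KEY))
human = [int(x) for x in sim.integers(0, V, T + 1)]  # unwatermarked baseline
print("KGW z:", kgw_z(kgw_tokens, KEY, V), "| human:", kgw_z(human, KEY, V))
print("AAR score / T:", aar_score(aar_tokens, KEY, V) / T, "| human:", aar_score(human, KEY, V) / T)
```

When each r_i is uniform on (0, 1), argmax_i r_i^(1/p_i) selects token i with probability exactly p_i, so averaged over keys the sampling-based scheme reproduces the model's distribution; the pattern only becomes visible to a detector that knows the key and sees that the chosen tokens tend to have large r. In the KGW half, by contrast, the distribution itself is shifted toward green tokens.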

Motivation

Three limitations of existing watermarking:

  1. Existing watermarking does not hold up well in diverse, adversarial environments
    • Real-world settings involve attackers (paraphrasing, regeneration, etc.) and a wide range of conditions

    ⇒ Current watermarking has fundamental limitations

  2. Robustness ↔ text quality trade-off (Fig. 1)
    • Embedding the watermark strongly

      → lowers the fluency, naturalness, and diversity of the text

    • Prioritizing text quality

      → leaves the watermark vulnerable to attacks and easy to break

    ⇒ A stronger watermark makes the text awkward; more natural text makes the watermark weaker

  3. Security issue
    • In particular, some schemes such as the KGW family are vulnerable to watermark stealing attacks, in which an attacker reverse-engineers the rule through frequency analysis or “steals” the watermark to forge (spoof) it
      • KGW: a logits-based watermarking method

    ⇒ An attacker may learn or estimate the watermarking rule and produce text that appears to carry the watermark

  • Both logits-based and sampling-based methods have their own trade-offs, so there is still no single principle that can be called “the right answer”

    ⇒ Combine the two families!

Contribution

  • Proposes the Symbiotic Watermarking framework
    • Presents, in a unified way, three strategies (Series, Parallel, Hybrid) for combining logits-based and sampling-based watermarking
    • Entropy-based adaptive Hybrid strategy: Hybrid symbiotic watermarking uses Token Entropy and Semantic Entropy to automatically pick the best watermarking method for each token
  • Achieves SOTA performance compared with existing methods
    • Improves watermark detection, minimizes text quality degradation, and demonstrates robustness against several real-world attacks

Methods

  • ๊ฒฐํ•ฉ ๋ฐฉ์‹์— ๋”ฐ๋ผ ์„ธ๊ฐ€์ง€ ๋ฐฉ์‹์ด ์žˆ์Œ
    • Series
    • Parallel
    • Hybrid* (โ† ๋…ผ๋ฌธ์ด ์ฃผ์žฅํ•˜๋Š” ๋ฉ”์ธ ๋ฐฉ๋ฒ•)

1. Series Symbiotic Watermark

Applies logits-based watermarking and sampling-based watermarking one after the other
  • Embeds the watermark twice while generating a single token

  • Procedure
    1. The model produces the original logits $l_t$
    2. Apply the logits-based watermark $A_w$ → modifies the logit distribution
    3. Convert to a probability distribution with softmax
    4. Apply the sampling-based watermark $S_w$ → modifies the sampling rule
    5. Produce the final token $y_t$

  • Pro: the strongest watermark signal
  • Limitations
    • Because it changes both the probability distribution and the sampling rule, text quality can degrade
    • Output diversity for the same prompt decreases

2. Parallel Symbiotic Watermark

Inserts the logits-based and sampling-based watermarks independently, without interfering with each other
  • Each token receives only one of the two watermarks

  • Procedure
    • As the LM generates tokens, it alternates between the two methods by token position
      • Even positions: apply logits-based watermarking $A_w$
      • Odd positions: apply sampling-based watermarking $S_w$

  • Pro: the two watermarks never collide directly, which limits quality degradation
  • Limitation: it cannot pick the best method for each token's situation
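A minimal sketch of the Parallel layout, assuming (as described above) that even positions use $A_w$ followed by ordinary sampling and odd positions use $S_w$ on the unmodified distribution. `force_token_3` and the constant `S_w` are hypothetical toys chosen so the alternation is easy to see.

```python
from typing import Callable

import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    z = np.exp(logits - logits.max())
    return z / z.sum()


def parallel_step(
    t: int,
    logits: np.ndarray,
    a_w: Callable[[np.ndarray], np.ndarray],
    s_w: Callable[[np.ndarray], int],
    rng: np.random.Generator,
) -> int:
    """Parallel symbiotic step: exactly one watermark per token, chosen by position parity."""
    if t % 2 == 0:
        # Even position: logits-based watermark A_w, then ordinary multinomial sampling.
        probs = softmax(a_w(logits))
        return int(rng.choice(len(probs), p=probs))
    # Odd position: the distribution is left untouched and the sampling-based watermark S_w picks the token.
    return s_w(softmax(logits))


def force_token_3(logits: np.ndarray) -> np.ndarray:
    """Toy A_w (hypothetical) that makes token 3 overwhelmingly likely."""
    out = logits.astype(float)
    out[3] += 1000.0
    return out


rng = np.random.default_rng(0)
tokens = [parallel_step(t, np.zeros(10), force_token_3, lambda p: 7, rng) for t in range(6)]
print(tokens)  # [3, 7, 3, 7, 3, 7]
```

Under this sketch, a detector would score the even positions with the logits-based test and the odd positions with the sampling-based one; how the paper aggregates the two scores is not covered in this summary.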

3. Hybrid Symbiotic Watermark

An adaptive method that uses entropy to decide automatically whether to apply the logits-based or the sampling-based watermark
  • Hybrid uses two criteria
    1. Token Entropy (TE)

      “How confident is the model about the next token at this step?”

      • Quantifies whether the model has confidently settled on one next word or is torn between several candidates
      • Computation
        • With $p_t^i$ the probability of token $i$ at step $t$, compute the entropy $TE_t = -\sum_i p_t^i \log p_t^i$
      • Interpreting Token Entropy
        • Low Token Entropy
          The capital of France is ___
          => next-token candidates: Paris 0.85, Lyon 0.05, city 0.03, ...
          • The model is almost certain of “Paris” → Token Entropy ↓
          • Modifying the logits here can break the naturalness of the sentence

            → Avoid logits-based watermarking

        • High Token Entropy
          She felt very ___
          => next-token candidates: happy 0.25, sad 0.23, tired 0.22, ...
          • Several candidates have similar probabilities → Token Entropy ↑
          • The model is undecided, so whichever of them is chosen, the sentence stays natural

            → Logits-based watermarking causes little quality loss here

        ⇒ Apply logits-based watermarking only when Token Entropy is high

    2. Semantic Entropy (SE)

      “How semantically diverse are the top-k candidate tokens at the current step?”

      • Computation
        1. Extract the embeddings of the top-k tokens and cluster them by meaning with K-means
        2. Sum the probabilities of the tokens in each cluster
        3. Compute the entropy over these cluster probabilities
      • Interpreting Semantic Entropy
        • Low Semantic Entropy
          • The top candidates are semantically similar
            • e.g., {happy, glad, pleased, joyful}
          • Whichever one is chosen, the meaning is almost the same, so changing the sampling does not break the meaning of the sentence

            → Safe to apply the sampling-based watermark

        • High Semantic Entropy
          • The top candidates differ greatly in meaning
            • e.g., {happy, angry, dead, finished}
          • Changing the sampling can change the meaning substantially

            → The sampling-based watermark is risky

        ⇒ Apply the sampling-based watermark only when Semantic Entropy is low
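The two Hybrid criteria can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: TE is Shannon entropy in nats; for SE, the top-k probabilities are renormalised, tokens are clustered with a small hand-written K-means (deterministic farthest-point initialisation), and cluster probabilities are pooled before taking the entropy. The 2-D "embeddings", the split of the unlisted probability mass in the examples above, and the thresholds τ_TE = 1.0 and τ_SE = 0.5 are all hypothetical. `hybrid_choice` applies the two rules independently, so a token may receive both watermarks or neither; how the paper resolves those cases is not described in this summary.

```python
import numpy as np


def entropy(p) -> float:
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def token_entropy(probs) -> float:
    """TE: entropy of the next-token distribution."""
    return entropy(probs)


def kmeans_labels(x: np.ndarray, n_clusters: int, n_iter: int = 20) -> np.ndarray:
    """Plain Lloyd's K-means with a deterministic farthest-point initialisation."""
    centers = [x[0]]
    for _ in range(1, n_clusters):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels


def semantic_entropy(topk_probs, topk_embeddings, n_clusters: int = 2) -> float:
    """SE: cluster top-k token embeddings, pool probability per cluster, take the entropy."""
    p = np.asarray(topk_probs, dtype=float)
    p = p / p.sum()  # assumption: renormalise over the top-k candidates
    labels = kmeans_labels(np.asarray(topk_embeddings, dtype=float), n_clusters)
    return entropy(np.bincount(labels, weights=p, minlength=n_clusters))


def hybrid_choice(te: float, se: float, tau_te: float = 1.0, tau_se: float = 0.5):
    """(apply logits-based watermark?, apply sampling-based watermark?) per the two rules."""
    return te > tau_te, se < tau_se


# Hypothetical 2-D "embeddings": positive-emotion words near (1, 0), negative ones near (-1, 0).
EMB = {"happy": (1.0, 0.1), "glad": (0.9, 0.0), "pleased": (1.0, -0.1),
       "angry": (-1.0, 0.0), "furious": (-0.9, 0.1)}

low_te = token_entropy([0.85, 0.05, 0.03, 0.07])         # "Paris" case, tail lumped (assumption)
high_te = token_entropy([0.25, 0.23, 0.22, 0.15, 0.15])  # "She felt very ___", tail split (assumption)
low_se = semantic_entropy([0.4, 0.3, 0.2, 0.1], [EMB[w] for w in ("happy", "glad", "pleased", "angry")])
high_se = semantic_entropy([0.3, 0.2, 0.3, 0.2], [EMB[w] for w in ("happy", "glad", "angry", "furious")])
print(hybrid_choice(high_te, low_se), hybrid_choice(low_te, high_se))
```

With these toy numbers the "She felt very ___" distribution has TE ≈ 1.59 versus ≈ 0.58 for the "Paris" case, and the mostly positive candidate set has SE ≈ 0.33 versus ln 2 ≈ 0.69 for the evenly split one, so the first call returns (True, True) and the second (False, False).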

Experiment

Setting

  • Dataset: news-like C4 dataset, long-form OpenGen dataset
  • Watermark insertion
    1. Each original text is split into:
      • the last 200 tokens → kept as natural (human-written) text
      • everything before them → used as the prompt
      [prompt] + [200 tokens of natural text]
    2. Only the prompt is fed to the LLM, which generates 200 ± 30 new tokens
      • The SymMark watermarking strategies (Series / Parallel / Hybrid) are applied in this segment
  • Models
    • OPT family: OPT-6.7B, OPT-2.7B, OPT-1.3B
    • LLaMA family: LLaMA3-8B-Instruct, LLaMA2-7B-chat-hf
    • GPT family: GPT-J-6B
  • Baselines
    • Logits-based: KGW, Unigram, SWEET, EWD, DIP, Unbiased
    • Sampling-based: AAR, EXP, ITS, GumbelSoft, SynthID
  • Evaluation Metrics
    • Detectability: TPR, TNR, Best F1 Score, AUC
    • Robustness: ROC curves (and the area under them, AUROC), showing how TPR trades off against FPR as the detection threshold varies
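All of these metrics are computed from per-text detection scores (for example, a z-score) for watermarked and unwatermarked texts. A minimal sketch, assuming a detector that flags a text when its score is at or above a threshold; AUROC is obtained from the rank identity AUROC = P(watermarked score > human score) + ½·P(tie), which is the same area that scikit-learn's `roc_auc_score` reports. The helper names are hypothetical.

```python
import numpy as np


def tpr_tnr(pos_scores, neg_scores, threshold: float):
    """TPR on watermarked texts and TNR on human texts for a detector that flags score >= threshold."""
    pos, neg = np.asarray(pos_scores, float), np.asarray(neg_scores, float)
    return float(np.mean(pos >= threshold)), float(np.mean(neg < threshold))


def auroc(pos_scores, neg_scores) -> float:
    """Area under the ROC curve = P(score_pos > score_neg) + 0.5 * P(tie)."""
    pos = np.asarray(pos_scores, float)[:, None]
    neg = np.asarray(neg_scores, float)[None, :]
    return float(np.mean(pos > neg) + 0.5 * np.mean(pos == neg))


def best_f1(pos_scores, neg_scores) -> float:
    """Best F1 over all thresholds taken from the observed scores."""
    pos, neg = np.asarray(pos_scores, float), np.asarray(neg_scores, float)
    best = 0.0
    for th in np.unique(np.concatenate([pos, neg])):
        tp, fp, fn = np.sum(pos >= th), np.sum(neg >= th), np.sum(pos < th)
        best = max(best, float(2 * tp / (2 * tp + fp + fn)))
    return best


wm, human = [3.1, 4.5, 2.2, 5.0], [0.1, -0.8, 1.9, 2.5]  # hypothetical detector scores
print(tpr_tnr(wm, human, threshold=2.0), auroc(wm, human), best_f1(wm, human))
```

Sweeping the threshold in `tpr_tnr` traces the ROC curve point by point; a curve hugging the top-left corner corresponds to an AUROC close to 1.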

Results

Watermark Detection Results

  • How much does detection improve over existing watermarking?
  • Series: TPR = 1.000 (SOTA) on every dataset and model
    • However, because it constrains both the logits and the sampling, text quality degrades
  • Parallel: applying only one watermark per token still yields a sufficient detection signal

    → Double watermarking is not always necessary

  • Hybrid: consistently outperforms existing methods on every dataset and model

Text Quality

  • Evaluates how much SymMark watermarking degrades text quality
  • Setting
    • Tasks: experiments on 4 tasks
    • GM (Generation Metric): task performance score
    • DROP: rate of performance decrease caused by watermarking
    • Llama3-8B: reference performance without watermarking
  • Series: very high TPR/TNR (strong detection), but DROP remains large

    → Good detection, but large quality loss

  • Hybrid: the smallest performance loss on every task, while TPR/TNR stay very high

    → Best at preserving detection performance and task performance at the same time
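Assuming DROP is the relative decrease of GM with respect to the unwatermarked Llama3-8B score (the summary only calls it the rate of performance decrease), it reduces to one line; the function name and the example numbers are hypothetical.

```python
def drop_percent(gm_unwatermarked: float, gm_watermarked: float) -> float:
    """Relative drop (%) of the task metric GM caused by watermarking (assumed definition)."""
    if gm_unwatermarked == 0:
        raise ValueError("the unwatermarked GM must be non-zero")
    return (gm_unwatermarked - gm_watermarked) / gm_unwatermarked * 100.0


# Hypothetical scores: unwatermarked GM = 40.0, watermarked GM = 38.0
print(f"DROP = {drop_percent(40.0, 38.0):.1f}%")  # DROP = 5.0%
```

Normalizing by the unwatermarked score makes DROP comparable across tasks whose GM scales differ.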

Robustness to Real-world Attacks

  • ์›Œํ„ฐ๋งˆํ‚น์ด ์‹ค์ œ ํ™˜๊ฒฝ ๊ณต๊ฒฉ(ํŽธ์ง‘, ๋ณต์‚ฌ, ๋ฒˆ์—ญ, ํŒจ๋Ÿฌํ”„๋ ˆ์ด์ง•) ํ›„์—๋„ ํƒ์ง€๋˜๋Š”์ง€๋ฅผ ํ‰๊ฐ€
  • Setting
    • Attack: Editing, Copy-Paste, Back-Translation, Rephrasing
    • AUROC curve๋กœ ํ™•์ธํ•˜๊ณ ์ž ํ•จ; ๊ณก์„ ์ด ์™ผ์ชฝ ์œ„์— ๊ฐ€๊นŒ์šธ์ˆ˜๋ก ์ข‹๊ณ , AUC๊ฐ€ ํด์ˆ˜๋ก ๊ฐ•์ธํ•œ ์›Œํ„ฐ๋งˆํ‚น
  • ๊ธฐ์กด ๋ฐฉ๋ฒ•๋“ค์€ ๊ณต๊ฒฉ์— ๋”ฐ๋ผ ๊ธ‰๊ฒฉํžˆ ๋ฌด๋„ˆ์ง€๋Š” ์–‘์ƒ์„ ๋ณด์ž„
  • Hybrid (Ours-H): ๊ฑฐ์˜ ๋ชจ๋“  ๊ณต๊ฒฉ์—์„œ ์ตœ์ƒ์œ„ ์œ„์น˜
    • ๋Œ€๋ถ€๋ถ„ ๊ทธ๋ž˜ํ”„์—์„œ AUC 0.98~, ๊ณต๊ฒฉ ์ข…๋ฅ˜๊ฐ€ ๋‹ฌ๋ผ๋„ ์„ฑ๋Šฅ์ด ๊ฑฐ์˜ ์œ ์ง€๋จ

Categories

research