26 March 2026

SEAL: Steerable Reasoning Calibration of Large Language Models for Free



Review

Nickname | Strengths & Weaknesses & Suggestions | Rating (out of 5)
댓츠 노노
• Strength: analyzes the reasoning process in detail and proposes a method for clear and concise reasoning
• Weakness: the technical impact is weak
• Suggestion: add an analysis of why the best intervention layer differs from model to model
3.3
아이리스
• Strength: the idea is intuitive and good. I think the motivation is good too.
• Weakness: the transition into the methodology feels somewhat abrupt. Are computation speed and efficiency acceptable?
• Suggestion: would simply forcing tokens into the generation be worse? For example, something that interjects once in a while mid-generation without extra computation.
3.5
핸드크림
• Strength: mitigates the LRM over-reasoning problem by reducing every thought type except execution. The solution is both efficient and effective.
• Weakness: is reducing reflection/transition always effective? If so, shouldn't the LRM itself be modified? Is the causal claim that answers are wrong because of too much reflection/transition actually valid?
• Suggestion: experiments on harder benchmarks where more reflection/transition might be needed
3.2
3월
• Strength: the motivation for splitting model reasoning into minimal units and the visualizations of the experiments are well done
• Weakness: at inference time the steering vector S is always applied in the same way, but shouldn't the decision to keep or remove reflection differ by problem? For example, reflection is needed when checking for calculation errors, whereas for counting problems transition seems far more important than reflection.
• Suggestion: adaptive steering based on automatic classification of the problem type
3.4
에너지
• Strength: classifies the stages(?) of an LRM's reasoning into execution, reflection, and transition, and uses vector properties to improve the reasoning. The flow from per-pattern analysis to the vector computation is very intuitive and logical!
• Weakness: is vector arithmetic enough to adjust the latent space...?
• Suggestion: shifting the vector direction does give good results, but there could well be methods that control the space more precisely.
4.0
화이트노이즈
• Strength: pinpoints the LRM problems of redundant verification loops and reasoning detours well, so the motivation is convincing
• Weakness: side effects of reducing reflection
• Suggestion: follow-up work that dynamically decides how much reflection to use depending on the task
3.2
피즈치자
• Strength: defining the reasoning process (thought types, in this paper) to interpret "which reasoning is the problem" is a good perspective. How these types are defined could itself become a research criterion.
• Limitation: that said, it does feel strongly like a combination of several existing methods
• Suggestion: an additional conditional analysis by problem type or difficulty would be nice
4.0
제로콜라
• Strength: reducing unnecessary reflection and transition simply by adding a steering vector to the hidden state, with no additional training, seems both simple and effective.
• Weakness: the steering vector is computed from thoughts classified as execution / reflection / transition by keywords, but in practice a thought may belong to one of these stages without containing any keyword.
• Suggestion: add a way to automatically vary the steering strength by problem type or difficulty
3.5
창백카츄
• Strength: the method is training-free, which sets it apart from other papers with the same motivation. Showing that reasoning steps can be explicitly classified is also excellent.
• Weakness: it is unclear on what basis the length is adjusted appropriately
• Suggestion: measure problem difficulty via confidence or another method and steer based on that!
3.5

TL; DR

💡

Let's curb the tendency toward overly long and complex reasoning!

⇒ Classify the reasoning process into three types and analyze which of them should be reduced

Summary

  • Authors
  • Citations: 40

Background & Motivation

  • Strong reasoning ability of LLMs
    • Rapid progress, starting with Chain-of-Thought (CoT)
    • Large reasoning models (LRMs) that mimic human cognitive steps, such as o1 and R1, have been developed
  • However, LRMs have limitations
    • Cost issues, e.g., memory
    • Even after reaching the key reasoning needed for the correct answer fairly early, they keep generating unnecessary thoughts

      ⇒ They can fall into redundant verification loops or reasoning detours

      • What is a redundant verification loop?

        Even when an early solution has already produced the correct answer (about 92% of the time!), the model keeps reasoning, and the later solutions tend to re-check the earlier ones or repeat them in similar ways instead of contributing new reasoning strategies.

        • Reference: Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
      • What is a reasoning detour?

        Even when an early thought is heading in the right direction, the model does not follow it through and keeps switching to other strategies.

        • Reference: Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

    • Lengthy reasoning is not always necessary

** Main motivation

Can we identify and calibrate the flawed reasoning pathways in current LLMs?

Contributions (What they’ve revealed)

  • Analyze o1/R1-like LLMs, divide their reasoning into three thought types (execution / reflection / transition), and study these types in latent space
    • Recognizing Reasoning Patterns in LLMs
      • The model output O tends to be segmented by “\n\n” ⇒ each chunk is represented as a thought T_n
        • Thought sequence: O = (T_1, T_2, \ldots, T_N)
      • Each chunk is classified into one of three types
        • Execution: the model works through the problem step by step
        • Reflection: the model pauses to verify what it has done (e.g., "let me review this" / "let me check")
        • Transition: the model changes its line of reasoning and reinterprets the problem from a different perspective
      • Classification examples
      • Analysis results for DeepSeek-R1-Distill-Qwen-1.5B on the Math-500 task
        • The harder the problem, the more tokens are generated

          ⇒ Natural, if you compare it to the human thought process

        • At the same difficulty level, incorrect answers contain more tokens
          • This suggests that excessive reasoning steps hurt performance
          • In particular, the output grows longer mainly because reflection and transition thoughts increase

            ⇒ Efficiency & Effectiveness Issue
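A minimal Python sketch of the segmentation step, assuming thoughts are delimited by a literal "\n\n" in the decoded text. It uses whitespace-separated words as a rough proxy for length, whereas the paper counts tokenizer tokens; `split_thoughts` and `rough_length` are hypothetical helper names.

```python
def split_thoughts(output: str) -> list[str]:
    """Split a model output O into thoughts T_1..T_N on blank-line separators."""
    # Each "\n\n" closes one thought; drop empty chunks produced by
    # leading, trailing, or repeated separators.
    return [chunk.strip() for chunk in output.split("\n\n") if chunk.strip()]


def rough_length(thoughts: list[str]) -> int:
    """Approximate output length by counting whitespace-separated words."""
    return sum(len(t.split()) for t in thoughts)


example = "First, compute 2 + 3 = 5.\n\nWait, let me check that again.\n\nSo the answer is 5."
thoughts = split_thoughts(example)
print(len(thoughts), rough_length(thoughts))
```

Comparing such lengths between correct and incorrect outputs at the same difficulty level is the kind of statistic the analysis above reports.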

    • Analysis of the mechanisms behind each reasoning pattern
      • Characteristics in latent space
        • Why latent space? The tokens inside each thought are too diverse to identify characteristics from token embeddings and the like

          ⇒ Observe layer-wise representations instead!

      • How?
        1. Run reasoning with DeepSeek-R1-Distill-Qwen-1.5B on the Math-500 task
        2. From the outputs of step 1, collect the representation of each “\n\n” token at every layer i
        3. Project the representations from step 2 into 2D with t-distributed Stochastic Neighbor Embedding (t-SNE)
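The collection step (step 2 of the procedure above) can be sketched with NumPy. This assumes the hidden states are laid out like Hugging Face `transformers` output with `output_hidden_states=True`, stacked into one array of shape (num_layers + 1, seq_len, d_model) where index 0 is the embedding output. It also assumes "\n\n" is a single token with a known id; `separator_states` and the id 99 are hypothetical.

```python
import numpy as np


def separator_states(hidden_states: np.ndarray, token_ids: list[int],
                     sep_id: int, layer: int) -> np.ndarray:
    """Collect layer-`layer` representations at every separator-token position.

    hidden_states: (num_layers + 1, seq_len, d_model); index 0 is assumed to be
    the embedding output, so index i is the output of transformer block i.
    """
    positions = np.asarray(
        [t for t, tok in enumerate(token_ids) if tok == sep_id], dtype=int
    )
    return hidden_states[layer, positions, :]  # (num_separators, d_model)


# Toy example: 2 blocks (+ embeddings), 5 tokens, d_model = 2.
SEP_ID = 99  # hypothetical id of the "\n\n" token
hs = np.arange(3 * 5 * 2, dtype=float).reshape(3, 5, 2)
ids = [7, SEP_ID, 8, SEP_ID, 9]
X = separator_states(hs, ids, SEP_ID, layer=2)
print(X)
```

Stacking such rows across many outputs gives a matrix that can then be projected to 2D, for example with scikit-learn's `TSNE(n_components=2).fit_transform(X)`.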
  • Based on this analysis, propose SEAL (Steerable rEAsoning caLibration), a training-free strategy for improving the reasoning process
    💡

    Find a steering vector that can adjust the proportion of reflection & transition,

    and prevent unnecessary token generation!

    1. Extraction of the reasoning steering vector (the key idea!)
      1. Collecting Reasoning Processes
        1. Run each target model on 1,000 training examples from the MATH dataset to obtain reasoning processes
        2. Split the outputs from step 1 into thoughts and classify each as execution / reflection / transition based on keywords
          • A thought with no reflection or transition keyword is labeled execution
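A keyword-based classifier in the spirit of the step above might look like this. The keyword lists are hypothetical placeholders because these notes do not list the paper's keywords. Plain substring matching is an assumption, and so is giving transition priority when both kinds of keyword appear. The sketch also shows the blind spot one reviewer raised: a thought that reflects without any listed keyword falls through to execution.

```python
# Hypothetical keyword lists for illustration only.
REFLECTION_KEYWORDS = ("wait", "verify", "check")
TRANSITION_KEYWORDS = ("alternatively", "another approach", "another way")


def classify_thought(thought: str) -> str:
    """Label one thought as execution, reflection, or transition by keywords."""
    text = thought.lower()
    # Assumption: transition keywords win if both kinds are present.
    if any(k in text for k in TRANSITION_KEYWORDS):
        return "transition"
    if any(k in text for k in REFLECTION_KEYWORDS):
        return "reflection"
    # No reflection/transition keyword: fall back to execution.
    return "execution"


for t in ["So 2 + 3 = 5.",
          "Wait, let me check that again.",
          "Alternatively, use symmetry."]:
    print(classify_thought(t))
```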
      2. Calculating Steering Vector
        1. For each thought j, take the representation of its “\n\n” token at the i-th transformer block: H_i^j
        2. Average the representations within each reasoning category
        3. Compute the reasoning steering vector S

          i.e., S is chosen so that steered representations move toward the execution average and away from the reflection & transition average
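Written out, and assuming reflection and transition thoughts are pooled into a single set RT (the notes only say "the reflection & transition average"), the computation is \bar{H}_E = \frac{1}{|E|}\sum_{j \in E} H_i^j, \bar{H}_{RT} = \frac{1}{|RT|}\sum_{j \in RT} H_i^j, and S = \bar{H}_E - \bar{H}_{RT}. Below is a NumPy sketch of that difference of means; `steering_vector` is a hypothetical name.

```python
import numpy as np


def steering_vector(states: np.ndarray, labels: list[str]) -> np.ndarray:
    """S = mean(execution states) - mean(reflection + transition states).

    states: (num_thoughts, d_model) separator representations H_i^j from one layer i.
    labels: thought type of each row.
    """
    labels_arr = np.asarray(labels)
    exec_mask = labels_arr == "execution"
    rt_mask = np.isin(labels_arr, ["reflection", "transition"])  # pooled RT set (assumption)
    if not exec_mask.any() or not rt_mask.any():
        raise ValueError("need at least one execution and one reflection/transition thought")
    return states[exec_mask].mean(axis=0) - states[rt_mask].mean(axis=0)


states = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
labels = ["execution", "execution", "reflection", "transition"]
S = steering_vector(states, labels)
print(S)
```

With the 1,000 MATH training examples, `states` would stack the separator representations of all their thoughts at the chosen layer.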

    2. Decoding with Latent Space Intervention
      • At the end of every thought, i.e., on the representation of the “\n\n” token, apply:
        \tilde{H} = H + \alpha S
        • α (= 1): hyperparameter controlling the steering strength
      • The intervention layer differs per model and is chosen via ablation
        • Layer 20 for DeepSeek-R1-Distill-Qwen-1.5B & DeepSeek-R1-Distill-Qwen-7B
        • Layer 55 for QwQ-32B-Preview
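A NumPy sketch of the intervention on one layer's hidden states, assuming a (seq_len, d_model) array and a known single-token id for "\n\n". In a real model this would run inside something like a PyTorch forward hook on the chosen decoder layer (e.g., layer 20 for the 1.5B/7B models). That wiring is omitted here because it depends on the model implementation. `apply_steering` and the id 99 are hypothetical.

```python
import numpy as np


def apply_steering(hidden: np.ndarray, token_ids: list[int], sep_id: int,
                   S: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Return H~ = H + alpha * S at separator-token positions, H elsewhere.

    hidden: (seq_len, d_model) output of the chosen intervention layer.
    """
    steered = hidden.copy()                   # leave the caller's array untouched
    mask = np.asarray(token_ids) == sep_id    # True where the token is "\n\n"
    steered[mask] += alpha * S                # S broadcasts over the masked rows
    return steered


SEP_ID = 99                       # hypothetical "\n\n" token id
H = np.zeros((3, 2))
S = np.array([1.0, -1.0])
H_tilde = apply_steering(H, [5, SEP_ID, 6], SEP_ID, S, alpha=2.0)
print(H_tilde)
```

During incremental decoding only the newest position passes through the layer, so the same check applies to the token that was just fed in.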
  • Experiments on various LLMs and benchmarks demonstrate SEAL's effectiveness
    • Setting
      • LLMs: DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, QwQ-32B-Preview
      • Benchmarks: Math500, GSM8K, LiveCodeBench
        • Math500 Hard: the Math500 problems with difficulty level 4 or 5
      • Metrics: accuracy (Acc) and number of generated tokens (#Tokens)
      • Baseline: Logit Penalty (a training-free technique)

        TL;DR: an inference-time control method that artificially lowers the logits of thought-triggering tokens so that they are less likely to be generated
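For contrast, the Logit Penalty baseline acts on the output distribution rather than on hidden states. This is a minimal NumPy sketch that assumes a 1-D logits vector over the vocabulary. The trigger-token id and penalty value are hypothetical because the notes give neither.

```python
import numpy as np


def logit_penalty(logits: np.ndarray, trigger_ids: list[int], penalty: float) -> np.ndarray:
    """Return a copy of `logits` with thought-triggering tokens lowered by `penalty`."""
    adjusted = logits.copy()
    adjusted[trigger_ids] -= penalty
    return adjusted


def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - x.max())  # subtract the max for numerical stability
    return z / z.sum()


logits = np.array([2.0, 1.0, 3.0])   # toy 3-token vocabulary
WAIT_ID = 2                          # hypothetical id of a trigger token such as "Wait"
adjusted = logit_penalty(logits, [WAIT_ID], penalty=5.0)
print(softmax(logits)[WAIT_ID], softmax(adjusted)[WAIT_ID])
```

The penalty changes only which token is likely to be sampled next and leaves the hidden states untouched. That is the token-space vs. latent-space contrast the results below draw with SEAL.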

    • Main Results
      • Improves both Acc and #Tokens over the baseline
        • Results on Math500
        • Results of applying the vector obtained on Math500 to GSM8K and LiveCodeBench
      • Token-space adjustment (Logit Penalty) < latent-space calibration (SEAL)
        • Logit Penalty actually induces more reflection / transition
    • Quantitative Evaluation of Efficiency
      • Inference efficiency improves, not just quality
      • The extra cost of adding the steering vector to a hidden state is nearly negligible
      • In fact, responses get shorter, so overall response time decreases
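A back-of-envelope estimate of why the overhead is negligible. Every number here is an illustrative assumption: roughly 1.5B parameters, an assumed hidden size of 1536, the common ~2N FLOPs-per-token rule of thumb for a forward pass, and a hypothetical 50 tokens per thought. The steering itself is one vector addition per thought.

```python
# All values are assumptions for a rough estimate, not measurements.
params = 1.5e9                 # ~1.5B-parameter model
d_model = 1536                 # assumed hidden size
tokens_per_thought = 50        # hypothetical average thought length

forward_flops_per_thought = 2 * params * tokens_per_thought  # ~2N FLOPs per token
steer_flops_per_thought = d_model  # one add of a precomputed alpha * S

ratio = steer_flops_per_thought / forward_flops_per_thought
print(f"steering overhead ~ {ratio:.1e} of forward compute")
```

On this estimate the addition is many orders of magnitude below the model's own compute. The wall-clock savings therefore come from generating fewer tokens.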
    • Ablation Study
      • Ablation Study about the Steering Type
        • Suppressing reflection & transition matters most
      • Ablation Study about the Steering Layer
        • How the intervention layer is selected for each model
        • Middle-to-late layers have a larger effect than early layers
      • Ablation Study about the Steering Strength
        • How α is selected

Categories

CoT PROBING research