14 January 2026

Let LRMs Break Free from Overthinking via Self-Braking Tuning

💡 Stop unnecessary reasoning (overthinking) from inside the model!

Review

Nickname | One-line review | Rating (0-5)
찰나 | Looking at recent papers, my takeaway is that it is good to just run the experiment on an idea first. Personally I think I overthink far too much as well, and this paper made me reflect on that. This week's papers made me introspective, so I rambled on while writing, but looking at the finished result I can't tell what I was trying to say. LRMs probably go through something similar, and I think it is a problem that really needs solving. | 4.2
와사비꽃게랑 | Not just this paper: across LRM research in general, work that focuses on when to stop, rather than on making reasoning longer, seems to be steadily increasing. | 3.9
메가커피 | Since overthinking can actually degrade performance in some cases, I think this paper is much needed right now: it proposes a methodology for identifying overthinking and, at the same time, a way to prevent it. | 4
요리괴물 | It seems to aim in a similar direction to the s1 paper. In terms of flexibility, this paper, which allows user intervention, seems more effective, while s1 looks more effective as a fast and simple approach. Still, from the standpoint of applying the brakes adaptively, fine-tuning definitely seems like the right fit. | 4.1
새우깡 | Keeping tokens in the training data while stopping them from contributing to the loss is a trick I could borrow and try. The experiments say specialized models already have reasoning efficiency covered, which makes me curious how LRMs will advance reasoning from here. | 4.2
안성재 | Picking up a very trendy LRM failure case and building a scoring metric are both good, but the performance drop on tasks is a sensitive issue. I'll put this one on hold. | 3
스타벅스 | Similar to the previous paper in that it prevents unnecessary reasoning and overthinking. Running more is not always better; what matters is choosing well when to stop. | 4.1
고구마맛도리 | Overthinking is common in R1 models, and I used to think it was silly how they fall into fallback loops all on their own. This paper tackles the data and method for breaking out of that at the root (attributing it to the model's internals)! Personally I was surprised that the threshold is looser than I expected, and it was a fun read because it looks widely usable in practice! | 3.7

TL; DR

💡 Stop unnecessary reasoning (overthinking) from inside the model!

Summary

Motivation

  • Large Reasoning Models (LRMs) have become good at reasoning by thinking at length, but they tend to overthink: they keep re-checking or repeating steps even after the correct answer has already been reached
    • This kind of reasoning wastes a huge number of tokens, increases compute and latency, and can obscure an answer that was already reached
  • Existing solutions intervene from the outside, through token-count limits or external verifiers.
    • People stop thinking once they feel confident, so let the model also detect unnecessary reasoning intrinsically and stop!

Contribution

  • Proposes Self-Braking Tuning, a tuning framework that lets an LRM regulate its own reasoning length
    • Improves reasoning efficiency and response quality
  • Proposes a methodology for identifying overthinking patterns and a data construction strategy
    • The constructed dataset is specialized for addressing the overthinking problem

Methods

Reasoning trajectory analysis of R1-like LRMs

  • DeepSeek-R1 and the models distilled from it show a similar pattern
    • When solving a problem, they tend to generate multiple solutions
  • The authors divide these solutions into a Foundation Solution and Evolution Solutions
    • Foundation Solution: the early part of the solution, which forms the basis of the reasoning process
    • Evolution Solution: the later part of the solution, which revises or refines the foundation solution
      • Overthinking often shows up here

Identifying overthinking

  • Two metrics are proposed for detecting overthinking
    • Reasoning efficiency ratio: $\eta_s = \frac{FS}{TS}$
      • FS: number of steps taken to first reach the correct answer
      • TS: total number of steps
      • The closer $\eta_s$ is to 1, the more efficient the reasoning, meaning no overthinking!
    • Overthinking marker ratio: $\kappa_t = \frac{1}{TT}\sum_{i=1}^{TT}\mathbb{I}[w_i \in \mathcal{M}]$
      • To capture linguistic patterns, a set of markers associated with overthinking is defined
      • Marker set $\mathcal{M}$
      • The more markers there are among all $TT$ tokens, the closer $\kappa_t$ gets to 1, meaning overthinking!
  • The two metrics above are combined into an overthink score
    • $\text{Overthink Score} = \beta \times \kappa_t + (1-\beta) \times (1-\eta_s)$
    • $\beta$ is set to 0.1 (the position where the answer is derived is weighted more heavily than lexical cues)
    • $\eta_s$ near 1 means no overthinking, while $\kappa_t$ near 1 means overthinking, so the formula uses $1-\eta_s$
    • That is, a higher score means more overthinking
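The two ratios and the combined score can be sketched in a few lines of Python. This is only an illustration of the formulas above, not the authors' code: the marker set `ASSUMED_MARKERS` is hypothetical (the actual marker list is not reproduced in this post), tokens are assumed to be plain strings, and the function names are made up for the example.

```python
from typing import Collection, Sequence

# Hypothetical marker set, used only for illustration.
# The paper's actual marker list is not reproduced in this post.
ASSUMED_MARKERS = frozenset({"wait", "alternatively", "hmm", "recheck"})


def efficiency_ratio(first_correct_step: int, total_steps: int) -> float:
    """eta_s = FS / TS: step that first reaches the answer over total steps."""
    if not 1 <= first_correct_step <= total_steps:
        raise ValueError("expected 1 <= FS <= TS")
    return first_correct_step / total_steps


def marker_ratio(tokens: Sequence[str],
                 markers: Collection[str] = ASSUMED_MARKERS) -> float:
    """kappa_t: fraction of tokens that belong to the marker set."""
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok.lower() in markers)
    return hits / len(tokens)


def overthink_score(eta_s: float, kappa_t: float, beta: float = 0.1) -> float:
    """beta * kappa_t + (1 - beta) * (1 - eta_s); higher means more overthinking."""
    return beta * kappa_t + (1.0 - beta) * (1.0 - eta_s)


# Toy trajectory: answer first reached at step 2 of 8, 2 marker tokens out of 20.
eta = efficiency_ratio(2, 8)
kappa = marker_ratio(["wait"] * 2 + ["x"] * 18)
print(eta, kappa, overthink_score(eta, kappa))
```

For this toy trajectory, $\eta_s = 0.25$ and $\kappa_t = 0.1$, so the score is $0.1 \times 0.1 + 0.9 \times 0.75 \approx 0.685$. With $\beta = 0.1$, almost all of it comes from the $1-\eta_s$ term.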

์ ์‘ํ˜• ์ถ”๋ก  ๋ฐ์ดํ„ฐ ๊ตฌ์ถ•

  • ์ถ”๋ก  ๋Šฅ๋ ฅ์€ ๋ณด์กดํ•˜๋ฉด์„œ, overthinking์„ ์ข…๋ฃŒํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ํ•™์Šตํ•˜์ž!
  • ์ ์‘ํ˜• ์ถ”๋ก ์„ ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋ฐ์ดํ„ฐ์…‹์„ 2๊ฐ€์ง€ ์ „๋žต์œผ๋กœ ๊ตฌ์ถ•ํ•จ
    • Self-Braking Tuning Exact (SBT-E)
      • Ovethinking trajectory์—์„œ Foundation solution+Evolution solution ํ•˜๋‚˜์”ฉ๋งŒ ๊ฐ€์ ธ์˜ค๊ณ , ๋‚˜๋จธ์ง€ ์ถ”๋ก ์€ masking
        • masking๋œ ํ† ํฐ๋“ค์€ loss์— ํฌํ•จ ์•ˆ๋จ โ‡’ ์ค‘๋ณต๋˜๋Š” ์ถ”๊ฐ€ ์ถ”๋ก ์„ ๋ง‰์Œ
          • 2๋ฒˆ์งธ Evolustion solution์˜ ์ดˆ๋ฐ˜ ๋ถ€๋ถ„๋งŒ ๋งˆ์Šคํ‚น
        • ์ผ๊ด€์ ์ธ ์ถ”๋ก ์„ ๊ฐ–๋„๋ก ํ•จ
    • Self-Braking Tuning Dynamic (SBT-D)
      • ์ ์‘์ ์œผ๋กœ ๊ธธ์ด ์กฐ์ ˆ
      • Foundation solution์—์„œ ์‹œ์ž‘ํ•ด์„œ step ๋งˆ๋‹ค overthink score ๊ณ„์‚ฐ
      • overthink score๊ฐ€ ฯ„1\tau_1๏ปฟ์ผ๋•Œ๊นŒ์ง€ step ์ถ”๊ฐ€
      • ฯ„1\tau_1๏ปฟ<score<ฯ„2\tau_2๏ปฟ ์ธ step๋“ค์€ masking
      • ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ masking๋œ ํ† ํฐ๋“ค์€ loss์— ํฌํ•จ ์•ˆ๋จ
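A minimal sketch of the SBT-D split, under several assumptions: the per-step overthink scores are taken as precomputed inputs, the threshold values in the usage example are invented for illustration, a step whose score equals $\tau_1$ is treated as kept, and steps at or above $\tau_2$ are dropped (the notes above only say which steps are masked). The function name `split_sbt_d` is hypothetical.

```python
from typing import List, Sequence, Tuple


def split_sbt_d(steps: Sequence[str],
                scores: Sequence[float],
                tau1: float,
                tau2: float) -> Tuple[List[str], List[str]]:
    """Split a trajectory into (kept, masked) steps using two thresholds.

    kept   : steps seen while the score is still <= tau1 (trained on)
    masked : steps with tau1 < score < tau2 (stay in the data, no loss)
    Steps from the first score >= tau2 onward are dropped (assumption).
    """
    if len(steps) != len(scores):
        raise ValueError("steps and scores must have the same length")
    if not tau1 < tau2:
        raise ValueError("expected tau1 < tau2")

    kept: List[str] = []
    masked: List[str] = []
    for step, score in zip(steps, scores):
        if score >= tau2:
            break  # assumed: everything past tau2 is discarded
        if score <= tau1 and not masked:
            kept.append(step)
        else:
            # Once masking has started, later steps stay masked
            # even if the score dips back below tau1.
            masked.append(step)
    return kept, masked


# Illustrative values only; not the thresholds used in the paper.
steps = ["foundation", "evolution-1", "evolution-2", "evolution-3", "evolution-4"]
scores = [0.0, 0.1, 0.25, 0.28, 0.4]
kept, masked = split_sbt_d(steps, scores, tau1=0.2, tau2=0.3)
print(kept)
print(masked)
```

With these toy numbers the first two steps are kept for training, the next two fall in the masked band, and the last step is dropped.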

์ž๊ธฐ ์กฐ์ ˆ ๋ธŒ๋ ˆ์ดํ‚น ์ „๋žต

  • ์ƒ์„ฑ๋œ ์ถ”๋ก ์— ๋Œ€ํ•œ self-awareness๋ฅผ ๊ธฐ๋ฅด์ž
    • ์œ„์—์„œ ๋งŒ๋“  ๋ฐ์ดํ„ฐ์…‹ ๊ฐ•ํ™”ํ•˜๋Š” ๊ฒƒ์ž„!
  • SBT-E, D์—์„œ maskingํ•œ ํ† ํฐ๋“ค์€ loss์—๋Š” ํฌํ•จ์ด ์•ˆ๋˜์ง€๋งŒ ํ•™์Šต ๋ฐ์ดํ„ฐ์ƒ์—๋Š” ํฌํ•จ๋จ
    • ์ค‘๋ณต๋œ ์ถ”๋ก ์ด ๋ญ”์ง€๋Š” ์•Œ์•„์•ผ ํ•จ!
  • ๊ทธ๋ฆฌ๊ณ  training solution๊ณผ masked solution ์‚ฌ์ด์— ์ž์—ฐ์–ด์ ์ธ ๊ฐ€์ด๋“œ๋ฅผ ์ œ์‹œ
    • e.g. Wait, I've gotten the same answer multiple times, time to end the thinking.
    • ๋ช…์‹œ์ ์ธ ํžŒํŠธ๋ฅผ ์ค˜์„œ ๋ฉˆ์ถฐ์•ผ ํ•  ๋•Œ๋ฅผ ์•Œ๊ฒŒ ํ•จ
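The "present in the data but excluded from the loss" idea can be sketched as a per-token loss mask. The following is a toy illustration, not the authors' pipeline: whitespace splitting stands in for a real tokenizer, the helper names are hypothetical, and treating the braking cue itself as trainable is an assumption. The value -100 is the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`, which is why it is a common choice for masked labels.

```python
from typing import List, Sequence, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

BRAKE_CUE = ("Wait, I've gotten the same answer multiple times, "
             "time to end the thinking.")


def build_example(kept_steps: Sequence[str],
                  masked_steps: Sequence[str],
                  cue: str = BRAKE_CUE) -> Tuple[List[str], List[bool]]:
    """Concatenate kept steps, the braking cue, and masked steps.

    Returns (tokens, loss_mask); loss_mask[i] is True when token i
    should contribute to the loss. Assumption: the cue is trainable.
    """
    segments = [(s, True) for s in kept_steps]
    segments.append((cue, True))
    segments.extend((s, False) for s in masked_steps)

    tokens: List[str] = []
    loss_mask: List[bool] = []
    for text, trainable in segments:
        pieces = text.split()  # toy stand-in for a real tokenizer
        tokens.extend(pieces)
        loss_mask.extend([trainable] * len(pieces))
    return tokens, loss_mask


def to_labels(token_ids: Sequence[int], loss_mask: Sequence[bool]) -> List[int]:
    """Replace masked positions with IGNORE_INDEX so the loss skips them."""
    return [tid if keep else IGNORE_INDEX
            for tid, keep in zip(token_ids, loss_mask)]


tokens, mask = build_example(["x = 2", "check: x = 2"], ["let me verify again"])
labels = to_labels(list(range(len(tokens))), mask)
print(list(zip(tokens, labels)))
```

The masked tokens still sit in the input sequence, so the model conditions on an example of redundant reasoning, but only the kept steps and the cue produce gradient.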

Experiments

  • Experimental setting
    • LLM: Qwen2.5-Math-1.5B/7B-Instruct, Llama-3.2-1B, Llama-3.1-8B-Instruct
    • Benchmark: AIME 24, AIME 25, AMC23, MATH500, GSM8K
      • AIME is a US math competition, and AMC is the round that precedes it (comparable to KMO and KMC)
  • Main Results
    • Accuracy is maintained while token consumption decreases
    • General-purpose models (Llama): the larger the model, the more it overthinks, so the gain is larger
    • Specialized models (Qwen-Math): the larger the model, the more efficient it already is, so the gain is small
  • Overthinking thresholds Results
    • The method remains effective even with a lenient (low) overthinking threshold
  • Preserved reasoning and redundancy masking trade-off
    • How much of the trajectory to train on and how much to mask
    • Keeping 2 solutions and masking a few sentences works best
  • Masked redundant thinking ablation
    • Compares training with and without the masked segment
    • Without the masked (redundant) thinking, the model shows no inclination to stop
  • Hyperparameter $\beta$ experiment
  • Step-level vs. token-level overthinking detection
  • Natural language guidance vs. alternative approaches

Categories

research