CoT

26 March 2026

SEAL: Steerable Reasoning Calibration of Large Language Models for Free

COLM'25

💡 Let's mitigate the tendency toward overly long and complex reasoning! ⇒ Classify the reasoning process into three stages and analyze which of them should be reduced.

26 March 2026

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

COLM'25

💡 For mathematical reasoning tasks, let's implement RL indirectly and keep the solution simple. (= Let's solve math problems effectively in a reinforcement-learning form!)