Mathematical Reasoning

26 March 2026

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

COLM'25

💡 For mathematical reasoning tasks, implement RL indirectly so that problems can be solved simply. (In other words: solve math problems effectively in a reinforcement-learning style!)