Long Context Reasoning

์ด๋‘ํ˜ธ
26 March 2026

LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts

ICLR'26 Oral

💡 Goal: make a model reason well over long contexts (128K) using only short-context (16K) RL training.
How? ⇒ Run RL on KeyChain, a high-difficulty synthetic dataset that hides the question behind a chain of UUIDs. This training induces a plan–retrieve–reason–recheck thinking pattern, so small 7B/14B models can reach strong long-context reasoning performance.
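To make "hiding the question behind a UUID chain" concrete, here is a minimal sketch of how a KeyChain-style sample could be assembled. It is a hypothetical illustration, not the paper's data pipeline: the function names (`build_keychain_sample`, `follow_chain`), the chain length, the decoy chains, and the one-pair-per-line JSON format are all assumptions made for this example.

```python
import json
import random
import uuid


def make_uuid(rng: random.Random) -> str:
    """Return a UUID4-formatted string drawn from a seeded RNG (reproducible)."""
    return str(uuid.UUID(int=rng.getrandbits(128), version=4))


def build_keychain_sample(question, documents, decoy_questions,
                          chain_len=3, seed=0):
    """Hypothetical KeyChain-style builder (assumption, not the paper's code).

    Hides `question` behind `chain_len` (>= 1) UUID key-value hops and adds
    one decoy chain per entry of `decoy_questions` (decoys are an assumption).
    Returns (context, start_key, kv) where kv maps every key to its value.
    """
    rng = random.Random(seed)
    kv = {}

    def add_chain(final_value):
        keys = [make_uuid(rng) for _ in range(chain_len)]
        for key, next_key in zip(keys, keys[1:]):
            kv[key] = next_key          # each hop points to the next UUID
        kv[keys[-1]] = final_value      # the last hop reveals the payload
        return keys[0]

    start_key = add_chain(question)     # the real chain
    for decoy in decoy_questions:       # hypothetical distractor chains
        add_chain(decoy)

    # Scatter one-pair JSON lines among the documents in random order.
    lines = list(documents) + [json.dumps({k: v}) for k, v in kv.items()]
    rng.shuffle(lines)
    return "\n".join(lines), start_key, kv


def follow_chain(kv, start_key):
    """Reference resolver: hop through UUID keys until a non-key value."""
    value = kv[start_key]
    while value in kv:
        value = kv[value]
    return value


if __name__ == "__main__":
    context, start, kv = build_keychain_sample(
        "Which city hosted the event?",
        ["Document A text", "Document B text"],
        ["Who founded the company?"],
    )
    print(f"Start from key {start}")
    print(follow_chain(kv, start))  # prints the hidden question
```

In this sketch the model would only be given `context` and `start_key`; every hop is an opaque UUID, so keyword matching alone cannot surface the real question, and decoy chains punish following the wrong key. One plausible reading is that this is what pushes the policy toward planning the lookup, retrieving each hop, and rechecking before answering.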