Blog

14 January 2026

Advancing Expert Specialization for Better MoE

NIPS'25

💡 The Mixture-of-Experts training loss includes an objective term for routing efficiency across experts. However, this term has the side effect of hindering each expert's specialization. ⇒ Add an objective that promotes expert specialization without interfering with the routing-efficiency goal.

07 January 2026

What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers

NIPS'25

💡 An exploration of the abrupt learning phenomenon in Transformer training, where the loss plateaus in the early stage and then suddenly drops sharply.

07 January 2026

Superposition Yields Robust Neural Scaling

NIPS'25

💡 Superposition is what makes the scaling law work!