14 January 2026
Advancing Expert Specialization for Better MoE
NIPS'25
💡 The Mixture-of-Experts training loss includes an objective term for routing efficiency across experts. However, this term has the side effect of hindering each expert's specialization. → Add an objective that promotes expert specialization without interfering with the routing-efficiency goal.
07 January 2026
What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers
NIPS'25
💡 Investigates the abrupt learning phenomenon in Transformer training, where the loss plateaus in the early stage and then suddenly drops sharply.
07 January 2026
Superposition Yields Robust Neural Scaling
NIPS'25
💡 Superposition is what makes scaling laws work!