Blog

27 March 2026

How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability

ICLR'26 Oral

๐Ÿ’กํŠธ๋žœ์Šคํฌ๋จธ๋Š” ํ•™์Šต ์ดˆ๊ธฐ์— 3๊ฐ€์ง€ ๋ฐฉ์‹์˜ ํ†ต๊ณ„ ๊ตฌ์กฐ๋ฅผ ๊ฐ€์ค‘์น˜์— ์ง์ ‘ ๋ฐ˜์˜ํ•˜๋ฉฐ, ์ด๋“ค์˜ ์กฐํ•ฉ๋งŒ์œผ๋กœ ์˜๋ฏธ์  ๊ด€๊ณ„์™€ ์–ดํ…์…˜์ด ํ˜•์„ฑ๋จ

27 March 2026

Hallucination Begins Where Saliency Drops

ICLR'26 Oral

๐Ÿ’กHallucination์„ ์ค„์ด๊ธฐ ์œ„ํ•ด Attention map๋ง๊ณ ๋„ Saliency map์—์„œ gradient๊ฐ€ ์ค„์–ด๋“œ๋Š” ๋ถ€๋ถ„์„ ํ™•์ธํ•ด์•ผ ํ•œ๋‹ค!

27 March 2026

Fresh in Memory: Training-Order Recency Is Linearly Encoded in Language Model Activations

ICLR'26 Poster

๐Ÿ’ก์–ธ์–ด ๋ชจ๋ธ์€ โ€œ๋ฌด์—‡โ€ ์„ ๋ฐฐ์› ๋Š”์ง€์™€ โ€œ์–ธ์ œโ€ ๋ฐฐ์› ๋Š”์ง€์— ๋Œ€ํ•ด ์•Œ๊ณ ์žˆ๋‹ค.โ‡’ ๋‹ค์–‘ํ•œ ํ†ต์ œ ์‹คํ—˜์„ ํ†ตํ•ด ๊ฒ€์ฆํ•ด๋ณด์ž ! !