PROBING

27 March 2026

Shared Global and Local Geometry of Language Model Embeddings

COLM'25

💡 Language models in the same family have very similar token embedding structure, even when their embedding dimensions differ! As a result, a steering vector built in one model can be reused in another model with nothing more than a linear transformation! Example: after finding a vector that increases helpfulness in the 1B and 3B models, it can be carried over to the 8B model and used there directly!
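The transfer idea in the note above can be illustrated with a minimal sketch. It assumes the two models share a vocabulary and that the linear map is fit by ordinary least squares on their token embeddings. The paper's actual procedure may differ. All matrices here are synthetic stand-ins, and names such as `E_small`, `E_large`, `W`, and `v_small` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the token embedding matrices of two models
# that share a vocabulary (assumption: row i is the same token in both).
vocab_size, d_small, d_large = 500, 16, 32
E_small = rng.normal(size=(vocab_size, d_small))

# Toy assumption: the larger model's embeddings are a noisy linear image
# of the smaller model's embeddings.
A_true = rng.normal(size=(d_small, d_large))
E_large = E_small @ A_true + 0.01 * rng.normal(size=(vocab_size, d_large))

# Fit W by least squares so that E_small @ W approximates E_large.
W, *_ = np.linalg.lstsq(E_small, E_large, rcond=None)

# Hypothetical steering vector found in the small model's space.
v_small = rng.normal(size=d_small)

# Map it into the large model's space using only the linear transformation.
v_large = v_small @ W

# Sanity check against the known toy map (only possible with synthetic data).
target = v_small @ A_true
rel_err = np.linalg.norm(v_large - target) / np.linalg.norm(target)
print(W.shape, v_large.shape, rel_err < 0.05)
```

With real models, `E_small` and `E_large` would be the actual embedding matrices, there would be no `A_true` to compare against, and the quality of the transfer would have to be judged by the steered model's behavior.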

26 March 2026

SEAL: Steerable Reasoning Calibration of Large Language Models for Free

COLM'25

💡 Let's mitigate the tendency toward overly long and complicated reasoning! ⇒ Classify the reasoning process into three stages and analyze which of them should be cut back.

19 March 2026

How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence

COLM'25

💡 A mechanistic analysis of how the model's internal knowledge, truthfulness, safety, and confidence change after post-training!