27 March 2026

Layers at Similar Depths Generate Similar Activations Across LLM Architectures

💡 When different LLMs are compared, layers at similar relative depths show similar activation geometry. That is, layer representations change with depth inside each LLM, but the progression of that change is shared to some degree across architectures.

최민영


Review

Nickname | One-line review | Rating (out of 5)
눈물
• Strength: A study showing that different models do not necessarily mean different activations.
• Weakness: Personally, since LLMs are pretrained models, they were presumably trained on large-scale data during pre-training, so weren't they trained on similar data? Isn't that why even different models look alike at the corresponding relative layers? The prior-knowledge characteristics (data distribution) of the models used in the experiments should probably be taken into account. (The paper seems to presuppose that different models have different pretraining data distributions, but how would we know? That does not seem clear.)
• Improvement: A controlled experiment is needed to establish whether this is really a structural property (for example, experimenting with data that is completely new to every model?).
Rating: 3.4
피땀
• Strength: By running experiments over a great many models and parameter scales, it verifies that the finding generalizes.
• Weakness & improvement: An ablation study on what actually causes the similarity would have been welcome.
Rating: 3
thumbs-up
• Pros: Shows that activations are similar according to layer depth rather than model type. Verified through a variety of experiments and presented in an intuitive way.
• Cons & improvement: Why was this analysis done? There is little analysis of why the result arises. Isn't it simply because today's LLMs are trained on nearly the same data and, broadly speaking, share the same model architecture?
Rating: 2.5
웃으면서 보자
• Strengths and improvements: Quite similar to the paper below. Analyzing what can be shared between models and what differs seems important. If the approach so far has been distillation, could the next step be direct transfer? Since these are models, might a "brain transplant", which for humans can only be imagined because of ethical issues, be possible? Research in that direction would be interesting.
• Weakness: I doubt the architectures are really that different.
Rating: 3.7
독수리오형제
• Strength: Uses mutual k-NN and the affinity matrix to convey, visually and clearly, “which layer corresponds to which layer”.
• Weakness: Shows the phenomenon well, but the explanation of why such alignment arises is weak; there is no analytical side. Experiments with criteria other than cosine similarity also seem possible.
• Improvement: Add further analysis of the phenomenon and other metrics.
Rating: 3.6
팝콘
• Strength: Through experiments on translated text, shows that models share similar layer activations for structural information, not just for plain tokens.
• Weakness: That different models encode similar properties according to the relative position of a layer seems to be an already known fact.
• Improvement: Curious how the finding could be put to use.
Rating: 3.2
삐질
• Strength: Goes beyond layer analysis of a single LLM and performs layer alignment across different LLMs. Interpreting the layers by matching them in a matrix is a novel idea.
• Weakness: Focuses on the phenomenon of whether alignment occurs, with no theoretical explanation. Also, the nearest-neighbor measure chosen to examine structural relationships has difficulty reflecting the global structure of the embedding space.
• Improvement: Comparison experiments against methods that evaluate how similar the overall structure is are needed.
Rating: 3.3
파이어
• Strength: Analyzing and demonstrating that latent spaces are similar even when LLMs are trained under different conditions is a significant contribution.
• Weakness: The proof side is weak, and most of the paper is spent describing the phenomenon.
• Improvement: Rather than stopping at a description of the phenomenon, it should be backed by theoretical proof.
Rating: 4
초콜릿
• Strength: The very idea of comparing LLMs with different architectures layer by layer was interesting, and the diagonal pattern was immediately visible in the visualized affinity matrix, which made the claim intuitive.
• Weakness: Because it only counts how many top-k neighbors overlap, it may not capture the overall structure of the activation space. Wouldn't the results change with a different value of k?
• Improvement: How about running the same experiment with other representation similarity metrics?
Rating: 3.4
멍쿠림보
It is quite surprising that, despite residual connections, the shape of the space changes so much as deeper layers encode higher-level concepts. That different models resemble each other is a fairly plausible result. I am also curious how a model behaves when activations are swapped between similar layers of different models, or between layers at different depths of the same model!
Rating: 3.6

TL;DR

💡
  • When different LLMs are compared, layers at similar relative depths show similar activation geometry
  • That is, layer representations change with depth inside each LLM, but the progression of that change is shared to some degree across architectures

Summary

  • Author
  • Citation: 11

Introduction

Background

  • The goal is to analyze how similar the activation structures produced by LLMs of different architectures are
    How do the latent spaces of independently trained LLMs relate to one another?

    Are there universal properties shared across models?

  • When looking at representation similarity, activations can differ in trivial ways such as axis permutation or sign flip
    • Even if two models learn effectively the same representation, the order of some axes may be swapped or the sign of some axis may be defined the other way, so the raw vector coordinates can look quite different
      • Example of permutation / sign flip
        • e.g. 1: Treating the activation of some LLM layer as a vector, the vector representation of each sentence is as follows:
          • Sentence A = [1.2, -0.7, 3.1] | Sentence B = [1.1, -0.6, 3.0]
        • e.g. 2: The same map drawn once with north at the top, once with south at the top, and once with the x and y axes swapped

          → The coordinate values change, but which city is close to which city barely changes

    ⇒ Comparing raw vector coordinates directly can suggest that two representations are completely different when they are in fact the same structure expressed in a different coordinate system
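The point about permutations and sign flips can be checked numerically. The following is a minimal sketch using synthetic activations (the array sizes, the particular permutation, and the helper name `cosine_top_k` are assumptions made for illustration, not taken from the paper): it reorders and flips axes, then compares raw coordinates against cosine nearest-neighbor lists.

```python
import numpy as np

def cosine_top_k(acts, k):
    """Indices of the k nearest other rows of `acts` under cosine similarity."""
    unit = acts / np.linalg.norm(acts, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a point is never its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]

rng = np.random.default_rng(0)
acts = rng.normal(size=(50, 8))  # 50 hypothetical inputs, 8-dim activations

# Assumed "trivial" transformation: shift the axis order by one and flip every other sign.
perm = np.roll(np.arange(8), 1)
signs = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
acts_transformed = acts[:, perm] * signs

# Raw coordinates disagree, but the neighbor structure is preserved.
coords_equal = np.allclose(acts, acts_transformed)
neighbors_equal = np.array_equal(
    cosine_top_k(acts, k=5), cosine_top_k(acts_transformed, k=5)
)
print(coords_equal, neighbors_equal)
```

Because a signed permutation preserves dot products and norms, the neighbor lists agree even though a coordinate-wise comparison reports a mismatch. This is the motivation for comparing neighborhoods rather than coordinates.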

Motivation

  • Feeding the same input set $D$ into a model and collecting the activations of a given layer, we can find for each input $t$ the ‘other inputs whose activations are most similar to this one’, i.e. its nearest neighbors
    • Two phenomena/claims observed by the authors:
      • Claim 1: Even within the same model, layers at different depths form different nearest-neighbor relationships
      • Claim 2: Even across different models, layers at corresponding depths form similar nearest-neighbor relationships

⇒ That is, activations change with depth, but the progression of that change is itself shared across models!

So in this Paper…

  • To examine this hypothesis systematically, the paper builds a layer-by-layer affinity matrix that compares every layer pair of two different LLMs
    • Each cell is the similarity between the nearest-neighbor relationships induced by the two layers
  • It then checks whether diagonal structure appears in this affinity matrix
    • Strong diagonal → layers at similar relative depths are more similar to each other
    • Weak off-diagonal → layers at very different depths have low similarity

⇒ The aim is to analyze, at the level of a population of models rather than individual layers, how LLMs change their activation geometry along depth

Contribution

  • Compares layer-wise affinity matrices between LLMs of different architectures at scale, over 24 open-weight LLMs (1B–70B)
    • Defines a generalized diagonal so that depth-aligned patterns can be analyzed in rectangular affinity matrices as well
  • Measures activation similarity with mutual k-nearest neighbors (mutual k-NN) and shows diagonal structure consistently

Method

  • The similarity of every layer pair is organized into an affinity matrix, and a generalized diagonal is then defined so that depth correspondence can also be read off rectangular matrices.
Affinity Matrix Construction
  1. Defining representations / activations
    • Each transformer model $M$ is viewed as a set of embedding functions that map text to vectors
      • e.g., embedding function $f$: takes an input text and returns the last-token activation vector at layer 10 of $M_1$
    • The hidden state vector at the last token position is taken at the end of each layer (decoder module)
      • e.g.,
        • last-token hidden state of layer 1
        • last-token hidden state of layer 2
        • ...
        • last-token hidden state of layer L

    → So one input and one layer together yield one representation vector for that layer

  2. Nearest-neighbor relationship
    • For each input $x$ in dataset $D$, find the $k$ other inputs closest to $x$ in the vector space induced by embedding function $f$
      • Closeness is measured with cosine distance
  3. Representational similarity measure: Mutual k-NN
    • Given two embedding functions $f, g$, mutual k-NN is the fraction of overlap between the top-k neighbors chosen by $f$ and those chosen by $g$ for each input $x$, averaged over inputs

      → A measure of whether two layers have similar local geometry / neighborhood structure on the same inputs

      • “Two layers are similar” means that their mutual k-NN score is high

      • For each data point $x$, measure how much the top-k neighbors chosen by the two layers $f, g$ overlap, then average that fraction over the whole dataset
        • Large $M^{(D)}_k$: $f$ and $g$ have similar local geometry
        • Small $M^{(D)}_k$: the two layers pick different neighbors even for the same input
  4. Affinity Matrix
    • For each layer pair $(i, j)$ of two models $M_1$, $M_2$, compute the similarity and build a matrix of the form $A_{i,j} = s(f^{(i)}_{M_1}, f^{(j)}_{M_2})$
    • Setting
      • similarity measure: mutual k-NN (k=10)
      • dataset $D$: 2048 samples from OpenWebText

    ⇒ Each cell of the affinity matrix is the degree to which two layers induce similar nearest-neighbor relationships on the same inputs
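The construction steps above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' code: the function names are made up here, and random arrays stand in for real last-token hidden states, so the resulting scores carry no meaning beyond showing the mechanics. Ranking by highest cosine similarity is equivalent to ranking by lowest cosine distance.

```python
import numpy as np

def knn_indices(acts, k):
    """Top-k nearest other inputs for every row, by cosine similarity."""
    unit = acts / np.linalg.norm(acts, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude each input from its own neighbor list
    return np.argsort(-sim, axis=1)[:, :k]

def mutual_knn(nn_f, nn_g):
    """Average fraction of shared neighbors between two neighbor tables."""
    k = nn_f.shape[1]
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_f, nn_g)]
    return float(np.mean(overlaps))

def affinity_matrix(layers_m1, layers_m2, k=10):
    """A[i, j] = mutual k-NN between layer i of model 1 and layer j of model 2."""
    nn1 = [knn_indices(a, k) for a in layers_m1]
    nn2 = [knn_indices(a, k) for a in layers_m2]
    return np.array([[mutual_knn(f, g) for g in nn2] for f in nn1])

# Synthetic stand-in: each "layer" is an (inputs x hidden_dim) activation array.
# The two models may have different layer counts and hidden sizes.
rng = np.random.default_rng(0)
n_inputs = 200
layers_m1 = [rng.normal(size=(n_inputs, 16)) for _ in range(3)]
layers_m2 = [rng.normal(size=(n_inputs, 24)) for _ in range(4)]

A = affinity_matrix(layers_m1, layers_m2, k=10)
nn_same = knn_indices(layers_m1[0], 10)
self_score = mutual_knn(nn_same, nn_same)  # a layer compared with itself
print(A.shape, self_score)
```

Only the neighbor indices are compared, so the two models never need to share a hidden dimension. That is why a 3-layer, 16-dim stand-in can be scored against a 4-layer, 24-dim one.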

Generalized diagonal
  • Generalized diagonal
    • A generalized diagonal is introduced so that a diagonal can be defined for rectangular matrices as well
    • Ordinarily a diagonal is natural only for square matrices
      • e.g., for a 32-layer model vs a 32-layer model

        → the diagonal can be read as (1, 1), (2, 2), …, (i, i)

    • This paper, however, also compares models with different numbers of layers (e.g., a 32-layer model vs a 48-layer model), so it treats pairs with similar relative depth, rather than equal absolute layer index, as the diagonal
      • e.g., the layer at the 25% point of the smaller model and the layer at the 25% point of the larger model are at corresponding depths
    • The generalized diagonal is therefore regarded as a diagonal band / region containing the layer pairs with similar relative depth

      → i.e., a diagonal-like region rather than ‘exactly one line of diagonal’

    To confirm that the diagonal pattern in the affinity matrix is a real tendency rather than chance, the paper tests it with a naive t-test and a block bootstrap
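One way to make the band idea concrete is sketched below. It is an assumption-laden illustration rather than the paper's definition: relative depth is taken as `(i + 0.5) / n`, the band half-width `tol` is a hypothetical parameter, and the affinity matrix is synthetic. It builds a boolean mask for the diagonal-like region of a rectangular matrix and compares the mean similarity inside and outside it.

```python
import numpy as np

def generalized_diagonal_mask(n_rows, n_cols, tol=0.1):
    """Mask of layer pairs whose relative depths differ by at most `tol`.

    Assumptions (not from the paper): relative depth of layer i among n
    layers is (i + 0.5) / n, and `tol` is a hypothetical band half-width.
    """
    depth_rows = (np.arange(n_rows) + 0.5) / n_rows
    depth_cols = (np.arange(n_cols) + 0.5) / n_cols
    return np.abs(depth_rows[:, None] - depth_cols[None, :]) <= tol

def on_off_diagonal_means(A, tol=0.1):
    """Mean similarity inside and outside the generalized diagonal band."""
    mask = generalized_diagonal_mask(*A.shape, tol=tol)
    return float(A[mask].mean()), float(A[~mask].mean())

# Synthetic 32 x 48 affinity matrix whose similarity decays with the gap in
# relative depth; it stands in for a real mutual k-NN matrix.
n1, n2 = 32, 48
d1 = (np.arange(n1) + 0.5) / n1
d2 = (np.arange(n2) + 0.5) / n2
A = np.exp(-(((d1[:, None] - d2[None, :]) / 0.2) ** 2))

mask = generalized_diagonal_mask(n1, n2, tol=0.1)
on_mean, off_mean = on_off_diagonal_means(A, tol=0.1)
print(on_mean > off_mean)
```

A significance test such as the t-test or block bootstrap mentioned above would then be applied to the on-band and off-band values; that step is left out of this sketch.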

Experiment

Setting

  • Models
    • 24 open-weight LLMs
    • Size: 1B~70B parameters
    • Layers: 16~80 layers
  • Dataset
    • Main: 2048 texts randomly sampled from OpenWebText
      • A collection of natural, long-form web text
    • For the appendix / sensitivity analyses:
      • IMDB movie reviews, parallel English/German book translations, IFEval, MMLU, OPUS Books (English / German), Wikipedia featured article lead paragraphs, random alphanumeric strings
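As an illustration of the last control dataset in the list above, random alphanumeric strings are easy to generate. The sketch below assumes a fixed string length of 32 characters and a fixed seed, neither of which is specified in these notes; only the count of 2048 matches the setting described here.

```python
import random
import string

def random_alphanumeric_strings(n=2048, length=32, seed=0):
    """Generate n random strings of letters and digits.

    `length` and `seed` are hypothetical choices for illustration.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    return ["".join(rng.choices(alphabet, k=length)) for _ in range(n)]

samples = random_alphanumeric_strings()
print(len(samples), samples[0])
```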

Results

  • Main: Claim 1 and Claim 2
    • Compares layers at different depths within one model, and layers at corresponding depths across models (Claim 1 / Claim 2)
      • Claim 1: Even within the same model, layers at different depths form different nearest-neighbor relationships
      • Claim 2: Even across different models, layers at corresponding depths form similar nearest-neighbor relationships
    • Even within one model, activation geometry differs when depth differs, e.g., layer 10 vs layer 30
      • activation geometry: how inputs are positioned relative to one another in a layer's activation space
    • Even across models, layers at similar relative depth can have similar activation geometry

      ⇒ Geometry changes from layer to layer, but the order/progression of that change is shared across models

    • Llama-3.1-8B layer 10 vs Gemma-2-9B layer 20
      • For one specific text $t$ from OpenWebText, the top-10 nearest neighbors chosen by the corresponding layers of the two models are compared in a Venn diagram
        • Detail
          1. Taking text $t$ as the reference, extract the activations at layer 10 of Llama-3.1-8B and at layer 20 of Gemma-2-9B
          2. Within each layer's activation space, find the 10 texts closest to $t$, i.e. its top-10 nearest neighbors (cosine distance)

          → The larger the overlap, the more similarly the two layers see the local neighborhood around $t$

    • Llama-3.1-8B layer 30 vs Gemma-2-9B layer 40
      • For the same text $t$, the top-10 nearest neighbors are compared at later, mutually corresponding layers
      • Within one model the structure changes as depth changes, yet the changed structure again matches the corresponding depth of the other model
  • Main Result: Diagonal Structure in Layer Affinity Matrices
    • Layer-pair similarities of the 24 models on the 2048 OpenWebText samples are visualized as affinity matrices
    • Diagonal structure appears across many different combinations of LLMs

      → Layers at similar relative depths are more similar to each other

      ⇒ LLMs produce a progression of distinct activation geometries along depth, and that progression is largely shared across architectures

  • Analyzing the shape of the depth correspondence
    • Checks at which depth of the other model the most similar layer appears, and how strong a counterpart each depth has
      • Detail
        • (a) most similar depth between two models
          • x-axis: relative depth in model 1
          • y-axis: relative depth of the model 2 layer most similar to that layer
        • (b) max similarity to other model
          • x-axis: relative depth in model 1
          • y-axis: maximum similarity attainable with any layer of model 2 at that depth
    • (a) Each line represents one model pair $M_1, M_2$

      → Similar layers appear at similar (proportional) depths (e.g., the 30% point of total depth vs the 30% point)

    • (b) Looks at how high the similarity is once the best-matching counterpart in the other model has been found

      → Most lines do not collapse much across depth, and the models, though different, show fairly similar internal behavior

    • (c) The affinity matrices of all model pairs are averaged and brought to a common square shape

      → Layers at similar relative depths are more similar to each other

  • On-diagonal vs. off-diagonal mean similarity
    • Detail
      • (a) generalized diagonal: the blue band ‘region’
      • (b) Each point is one model pair
        • x-axis: mean similarity of layer pairs on the generalized diagonal
        • y-axis: mean similarity of layer pairs off the generalized diagonal
        • dashed line: x = y (the boundary where on-diagonal mean = off-diagonal mean)
    • The points cluster below the dashed line → layer pairs at similar relative depth have higher similarity on average
  • Cross-lingual Analysis: English vs. German
    • English text is fed to model 1 and its German translation, with the same content, to model 2, to examine cross-lingual nearest-neighbor preservation
    • Similarity is lower than when both models receive the same language, but a weak diagonal structure still remains

      ⇒ The diagonal structure is not merely a by-product of feeding in identical sentences; it is a more structural phenomenon that persists to some degree even when the language differs

  • Effect of Instruction Tuning on Activation Structure
    • Compares Gemma-2-9B base vs Gemma-2-9B IT, switching the input between OpenWebText and IFEval
      • OpenWebText: general natural-language web text
      • IFEval: a prompt dataset built to evaluate instruction-following ability
    • The aim is to see how the difference in activation structure between the base model and the instruction-tuned model depends on the input distribution
    • OpenWebText (a): strong diagonal structure is visible at all depths, even between base and IT
      → On general web text, base and IT still have a similar layer correspondence
    • IFEval (b): similarity drops considerably, especially in later layers
      → On inputs tailored to instruction following, the effect of fine-tuning shows up more strongly in late layers

      ⇒ The input distribution can affect the diagonal structure

  • Random alphanumeric strings
    • 2048 meaningless random strings are fed in and the affinity matrix is inspected
      • The aim is to see whether the diagonal structure really reflects a meaningful progression of representations or is a pattern that appears automatically for any input (a check of sensitivity to the input distribution)
    • Nearest-neighbor similarity between models is still present, but the diagonal structure disappears
    • Early / late layers show high agreement with each other even on random strings
      • When the authors examined that agreement, they found it relies heavily on surface features such as the last few characters
      • i.e., since random strings have no semantic structure, the models fall back on such easily captured surface cues instead of deeper semantic structure when choosing neighbors
        • e.g., shallow criteria like “the last two characters match” or “the shape looks similar”

    ⇒ The diagonal structure is therefore not something that “always appears for any model” but a phenomenon that depends on the input distribution
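The depth-correspondence curves described in the results above (most similar depth, and maximum similarity to the other model) can be read directly off an affinity matrix. The sketch below is a hedged illustration on a synthetic matrix; the function name and the relative-depth convention `(i + 0.5) / n` are assumptions, not the paper's definitions.

```python
import numpy as np

def depth_correspondence(A):
    """For each layer of model 1, locate its best counterpart in model 2.

    Returns (relative depth in model 1,
             relative depth of the most similar model 2 layer,
             similarity to that best-matching layer).
    Relative depth is taken as (index + 0.5) / n_layers, an assumed convention.
    """
    n1, n2 = A.shape
    rel_m1 = (np.arange(n1) + 0.5) / n1
    best_rel_m2 = (A.argmax(axis=1) + 0.5) / n2
    max_sim = A.max(axis=1)
    return rel_m1, best_rel_m2, max_sim

# Synthetic 32 x 48 affinity matrix with a depth-aligned ridge, standing in
# for a real mutual k-NN affinity matrix between two models.
n1, n2 = 32, 48
d1 = (np.arange(n1) + 0.5) / n1
d2 = (np.arange(n2) + 0.5) / n2
A = np.exp(-(((d1[:, None] - d2[None, :]) / 0.15) ** 2))

rel_m1, best_rel_m2, max_sim = depth_correspondence(A)
largest_gap = float(np.max(np.abs(rel_m1 - best_rel_m2)))
print(largest_gap)
```

On a matrix with diagonal structure, `best_rel_m2` tracks `rel_m1` closely, which corresponds to the near-proportional lines in panel (a). On an input distribution where the diagonal disappears, such as random strings, the same curve would be expected to wander.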

Categories

PROBING, research