07 January 2026

Layer by Layer: Uncovering Hidden Representations in Language Models


염규환


Review

Nickname — one-line review — rating (out of 5)

  • 마스킹테이프 (4.3): The analysis results in this paper seem genuinely useful. If different downstream tasks end up calling for different layers, and task performance changes depending on which layer an analysis uses, this is a very meaningful study. The finding that CoT tuning enriches intermediate-layer representations is also worth revisiting for its underlying cause.
  • 동까스 (4.4): I assumed the final layer's representation would naturally be the richest, so the experimental results are surprising. Given how consistently the finding shows up across most results, there seems to be no shortage of future research directions here.
  • 귤 (4.3): Most papers I have read so far use the last layer by default, and I think I also assumed that was optimal; this paper breaks that preconception. Which layer to use, matched to the characteristics of the task at hand, should be an important consideration.
  • 수면장애 (4.3): In the second semester of my master's I collected papers that use hidden states and tried to summarize "when should you use which hidden state?" I remember being puzzled that there was less of a pattern than expected and everyone just did as they pleased. Now I finally understand why. Plus, also experimenting with VLMs really boosts credibility!
  • 이어폰 (4): I have seen plenty of papers that use intermediate-layer representations on the premise that they are richer than the last layer's, and this paper actually proves it experimentally. The comparison with vision transformers is interesting, and beyond CoT I am curious whether training methods such as RL would also have an effect.
  • 사과 (4.6): Most XAI and representation studies explain reasoning or representations with respect to the last layer; by measuring the importance of intermediate-layer representations with explicit metrics, this paper raises the credibility of that practice.
  • 7일 (4.4): Validating with a shared mathematical metric across diverse experiments is the biggest contribution. Since MTEB tasks evaluate the quality of the text embeddings themselves, would the same trend hold for higher-level tasks such as QA and NLG? Probably not. In the end, perhaps intermediate layers should be emphasized when computing embeddings for intermediate flows (as in message passing), while the final layer matters most for the final fine-tuning task.

TL;DR

💡

Language models trained autoregressively have their richest representations in the intermediate layers!

Summary

  • Team: University of Kentucky (US), NYU, UCLA, Meta
  • Citations: 89

์—ฐ๊ตฌ ๋™๊ธฐ

  • LLM์€ ์ฃผ๋กœ ๋งˆ์ง€๋ง‰ layer์˜ ์ถœ๋ ฅ์„ downstream task์— ์‚ฌ์šฉ
    • โ€œ์–•์€ layer๋Š” ๋‹จ์ˆœํžˆ low-level ์ •๋ณด๋ฅผ ๋‹ด๋Š”๋‹คโ€๋Š” ์ผ๋ฐ˜์ ์ธ ๊ฐ€์ •์— ๊ธฐ๋ฐ˜ํ•จ
  • ์ €์ž๋“ค์€ ์ด ๊ฐ€์ •์— ์˜๋ฌธ์„ ์ œ๊ธฐ!
    • โ€œ๋งˆ์ง€๋ง‰ layer๊ฐ€ ํ•ญ์ƒ ์ตœ๊ณ ์˜ representation์„ ์ œ๊ณตํ•˜๋Š”๊ฐ€?โ€

โ†’ ์‹ค์ œ๋กœ ์ค‘๊ฐ„ layer๊ฐ€ ๋” ํ’๋ถ€ํ•œ ํ‘œํ˜„๋ ฅ์„ ๊ฐ€์ง€๊ณ  ์žˆ์„ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋‹ค์–‘ํ•œ task์—์„œ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ค„ ์ˆ˜ ์žˆ์Œ์„ ์‹ค์ฆ์ ์œผ๋กœ ํ™•์ธํ•ด๋ณด์ž!


Key Findings

  1. Experimentally demonstrates that intermediate layers consistently outperform the last layer
  2. Compares representational quality across model architectures (type, size) and training progress
  3. Shows that CoT fine-tuning enriches intermediate-layer representations

Metric Design

  • How can we evaluate the quality of a representation?
    1. How compressed is the representation?
    2. How robust is it to input perturbations/augmentations?
    3. How does it organize different inputs geometrically?
  • Matrix-Based Entropy: a shared mathematical lens
    • Compute entropy from the eigenvalues $\lambda_i$
      • $\lambda_i$: the amount of information contained along each eigenvector, i.e., each axis representing the data
      • $\mathbf{Z} \in \mathbb{R}^{n \times d}$: representation matrix ($n$ samples, $d$ dimensions)
      • $\mathbf{K} = \mathbf{Z}\mathbf{Z}^T$: Gram matrix (pairwise similarities between representations)
      • $\alpha$: smoothing exponent

      → Measures how many eigenvalues the input samples are concentrated on
      → Intuitively, if the eigenvalues are spread evenly, entropy is high

    • Insight 1 (information-compression view)
      • Only a few eigenvalues are large → information concentrates in a low-dimensional subspace, compressed onto a few axes → low entropy
      • Eigenvalues are evenly distributed → information is spread similarly across many axes → high entropy
    • Insight 2 (geometry view)
      • If token embeddings follow a smoothly connected path → curvature is low and entropy is high
      • If the path bends abruptly (i.e., the embedding direction of consecutive tokens changes sharply) → curvature is high and the eigenvalue distribution skews to one side → low entropy

      → Embedding curvature is thus also reflected in the eigenvalue distribution (= entropy)

    • Insight 3 (robustness to input perturbation/augmentation)
      • Strong invariance (= robust) → samples with the same meaning cluster stably in embedding space → entropy is preserved
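As a concrete illustration of the matrix-based entropy above, here is a minimal numpy sketch (my own naming, not the authors' code): it normalizes the Gram-matrix eigenvalues of a representation matrix Z into a spectrum and computes the α-order (Rényi-style) entropy, recovering the Shannon/von Neumann case as α → 1.

```python
import numpy as np

def matrix_based_entropy(Z: np.ndarray, alpha: float = 1.0) -> float:
    """Matrix-based alpha-order entropy of a representation matrix Z (n x d).

    Eigenvalues of the Gram matrix K = Z Z^T are normalized to sum to 1
    and treated like a probability distribution over principal axes.
    """
    K = Z @ Z.T
    eig = np.linalg.eigvalsh(K)          # eigenvalues of the Gram matrix
    eig = np.clip(eig, 0.0, None)        # clip tiny negatives from round-off
    p = eig / eig.sum()                  # normalized spectrum
    p = p[p > 0]
    if np.isclose(alpha, 1.0):           # alpha -> 1 recovers Shannon entropy
        return float(-(p * np.log(p)).sum())
    return float(np.log((p ** alpha).sum()) / (1.0 - alpha))

rng = np.random.default_rng(0)

# Spread-out representations: eigenvalues spread over many axes -> high entropy
Z_diverse = rng.standard_normal((64, 32))

# Near-duplicate representations: information collapses onto one axis -> low entropy
base = rng.standard_normal(32)
Z_collapsed = base + 0.01 * rng.standard_normal((64, 32))

print(matrix_based_entropy(Z_diverse))    # larger
print(matrix_based_entropy(Z_collapsed))  # smaller
```

A flat spectrum drives the entropy toward the log of the number of non-zero eigenvalues, while a collapsed spectrum drives it toward 0, matching Insight 1 above.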
  • The 7 representation-quality metrics
    • Information-theoretic
      • Prompt Entropy
        • How diversely are the token embeddings spread within a single prompt?
          • High entropy → diverse representations → less redundant, richer features
          • Low entropy → similar representations → information is compressed
      • Dataset Entropy
        • How diversely are the prompt embeddings spread across the whole dataset?
          • High entropy → representations of different prompts are well separated
          • Low entropy → representations look alike regardless of the input (possible information loss)
      • Effective Rank
        • How many dimensions does the representation space actually occupy?
          • Lower values → most information is compressed into a few dimensions
          • Higher values → information is spread evenly
    • Geometric
      • Curvature
        • Measures how sharply the direction changes between consecutive token vectors
          • High curvature → embedding directions turn abruptly → emphasizes local information
          • Low curvature → smooth trajectory → emphasizes global context
    • Augmentation-invariance
      • InfoNCE
        • Measures how close pairs of inputs with the same meaning are in embedding space
        • Lower InfoNCE loss → more robust
      • LiDAR
        • Measures how well embeddings cluster before and after augmentation
        • Higher LiDAR score → more robust
      • DiME
        • Measures how well augmented pairs align compared to random pairs
        • Higher → better aligned
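Of the metrics above, curvature admits a particularly simple sketch. The following is an assumed minimal formulation (not necessarily the paper's exact one): the average turning angle between consecutive token-difference vectors along one prompt's embedding trajectory.

```python
import numpy as np

def avg_curvature(token_embeddings: np.ndarray) -> float:
    """Mean turning angle (radians) between consecutive difference vectors.

    token_embeddings: (seq_len, d) trajectory of one prompt's token embeddings.
    Low values = smooth trajectory (global context); high = abrupt turns (local detail).
    """
    v = np.diff(token_embeddings, axis=0)              # consecutive difference vectors
    v = v / np.linalg.norm(v, axis=1, keepdims=True)   # unit directions
    cos = np.clip((v[:-1] * v[1:]).sum(axis=1), -1.0, 1.0)
    return float(np.arccos(cos).mean())

# A straight-line trajectory never changes direction ...
line = np.outer(np.arange(10), np.ones(4))
# ... while a random walk turns sharply at almost every step.
rng = np.random.default_rng(1)
walk = np.cumsum(rng.standard_normal((10, 4)), axis=0)

print(avg_curvature(line))   # ~0
print(avg_curvature(walk))   # noticeably larger
```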

Experiments

  • Downstream Task Performance
    An empirical test of "is the last layer always best?"
    • Models compared
      • Pythia: decoder-only Transformer
      • Mamba: state space model
      • BERT-base: encoder-only Transformer
    • Benchmark: MTEB (Massive Text Embedding Benchmark)
      • 32 tasks: span classification, semantic textual similarity, clustering, reranking…

    → On nearly every task, an intermediate layer outperforms the last layer

  • Metrics vs. Performance Correlation
    Tests whether the representation-quality metrics defined above actually relate to task performance (using the dCor metric)
    • dCor (distance correlation) $\in [0,1]$: measures dependence, both linear and non-linear, between two matrices (sets of representations) based on pairwise distances

    → Even without labels, we can find the layer with the best representation!
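Distance correlation itself is compact to compute. A minimal sketch from the standard definition (my own helper, not the paper's code): double-center the pairwise-distance matrices of the two representation sets, then normalize their covariance.

```python
import numpy as np

def dcor(X: np.ndarray, Y: np.ndarray) -> float:
    """Distance correlation in [0, 1] between sample matrices (n x dx), (n x dy).

    Captures both linear and non-linear dependence; 0 iff (asymptotically) independent.
    """
    def centered_dist(M):
        D = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=-1)  # pairwise distances
        return D - D.mean(0, keepdims=True) - D.mean(1, keepdims=True) + D.mean()
    A, B = centered_dist(X), centered_dist(Y)
    dcov2 = (A * B).mean()                       # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(max(dcov2, 0.0) / denom)) if denom > 0 else 0.0

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
Y_indep = rng.standard_normal((200, 3))
print(dcor(X, X ** 2))     # non-linear dependence -> clearly positive
print(dcor(X, Y_indep))    # independent -> near 0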

  • Architectural and Scale Differences
    Examines whether the pattern of representation quality differs across model architectures and sizes
    • Quality by architecture
      • BERT: maintains an even amount of information throughout → few bottlenecks
      • Pythia: a bottleneck appears in the intermediate layers
      • Mamba: changes smoothly (somewhere between BERT and Pythia)
    • Quality by model size
      • As the model grows (14M → 1B), the entropy valley becomes more pronounced, curvature decreases, and representations become more invariant!

      → Feature representations become more smoothly refined and more robust

    • Fine-tuning effects
      • Unsupervised objectives → entropy increases by producing diverse token representations
      • Regardless of the objective, fine-tuning itself improves invariance!
  • Impact of Training Progression
    Tracks how each layer's representation changes as training progresses
    • Early layers: representations stabilize regardless of training progress
    • Intermediate layers: entropy decreases as training progresses & LiDAR score hits its minimum
      • Curvature becomes smoother → the model starts reflecting global semantic structure as it learns!
  • Impact of CoT Finetuning
    Examines how entropy changes when CoT fine-tuning is applied
    • Fine-tuned models show higher entropy
      • The model tries to represent a problem in multiple steps, so the representations themselves become broader and more diverse
  • Extreme Input Conditions
    Checks how the model reacts to abnormal prompts

    (random text & repeated text)

    • Repetition → the model immediately detects meaningless repetition, and intermediate layers simplify or ignore the representation
    • Random text → early layers are sensitive to noise and increase representation diversity for random input → intermediate layers compress it away
    • The longer the prompt, the weaker the representation-compression effect becomes

  • Comparison to Vision Transformers
    Do vision models show a similar trend?
    • Autoregressive Image Model (AIM) (blue and orange in the figure above): predicts sequentially like GPT → like decoder-only Transformers such as Pythia, entropy is lowest in the intermediate layers
    • In contrast, models trained non-autoregressively, such as ViT (pink), show a steady curve (except BEiT)
      • Non-autoregressive models have less need to compress information in the middle!

    → Conclusion: the difference in training objective (autoregressive or not) drives the difference in representational quality!

Categories

research