27 March 2026

Inside-Out: Hidden Factual Knowledge in LLMs

💡 LLMs know it in their heads but can't put it all into words!

Review

Nickname | Strength, Weakness, Suggestion | Rating (0/5)
멍쿠림보 It is intuitively easy to see that a model knows more than it says, but defining a metric that can quantify an LLM's internal knowledge is a major contribution. That said, the very small number of internal/external functions used in the paper slightly lowers the completeness of the study. Even so, the finding that the models solve true/false (TF) questions better than generative ones was a fresh result. It would also be interesting to probe which values the knowledge that cannot be generated correlates with.
4
thumps-up • Strength: Shows that an LLM's internal knowledge really exists. Also reveals trends in internal knowledge by model family. From the idea through the experimental design, the work is logical and sound.
• Weakness & suggestion: Beyond judging a gold answer to be correct, can the model also say that a wrong answer is wrong when it is given one?
4.5
파이어 • Strength: The paper makes a large contribution by showing that, even where the LLM answers incorrectly, the knowledge is something the LLM actually has, and by using a logical experimental design.
• Weakness: Not enough explanation of why an LLM that actually knows more fails to express it.
• Suggestion: Evidence for why the LLM fails to express this knowledge, plus experiments on methods that could surface it, would be welcome additions.
3.9
웃으면서 보자 • Strength: That an LLM can know something and still answer wrongly or fail to answer is true of people as well, and it feels fairly self-evident. CoT is, in the end, also one way of drawing out and using this internal knowledge. The shift in thinking and the reframing of perspective are good. An enjoyable read, and the study I would most like to follow up on.
• Weakness and suggestion: The perspective is not entirely new. It is also unclear why the phenomenon occurs. For people, the cause is usually overthinking, thinking about something else, or not understanding the question. How about experiments grounded in hypotheses like these?
4.2
독수리오형제 • Strength: This kind of problem statement has come up many times, but the paper does a good job of defining external knowledge and internal knowledge separately and of presenting a technique that measures the hidden factual signal with hidden-state-based scoring.
• Weakness: So why can't the model bring this knowledge out? That remains an open question.
• Suggestion: If the cause were known, it could lead to methods that steer the model toward using its internal knowledge more efficiently. On the other hand, the knowledge may go unexpressed because the model has judged it to be relatively less necessary.
4.2
팝콘 • Strength: Clearly demonstrates, with experimental results, the existence of internal knowledge that had previously only been implicitly accepted.
• Weakness: The definition of K* is so strict that it may actually be a setting that gets in the way of the experiments.
• Suggestion: How could the model be made to bring out its internal knowledge better?
4.3
눈물 • Strength: Seems to overturn the preconception(?) that "the model answers poorly = the model knows nothing." The paper points out that the model fails to actually bring out (express, output) knowledge it already holds, and verifies internal > external.
• Weakness & suggestion: The paper argues that internal knowledge exists based on the external vs. internal results, but more external comparison settings would be better (additional comparison experiments with a variety of methods).
3.9
피땀 • Strength: Demonstrates through experiments that internal knowledge really exists, a point that had previously only been vaguely assumed.
• Weakness: Curious what the results would look like on a wider variety of LLMs and datasets.
• Suggestion: Looking forward to follow-up work that does not stop at discovering internal knowledge but actually draws it out.
4.1
초콜릿 • Strength: Shows with hidden-state-based experiments, and with actual numbers, that an LLM answering wrongly does not mean it does not know.
• Weakness: Shows that internal knowledge exists, but gives no analysis of why the model cannot bring out what it knows.
• Suggestion: Analyze how the gap between internal knowledge and external expression changes with model size.
4.0
삐질 • Strength: Analyzes, from the hidden-state perspective, that LLMs answer wrongly not because they lack knowledge but because of a problem in how that knowledge is expressed.
• Weakness: Probing feels like finding a particular direction in the hidden state and using it to tell the correct answer apart. Couldn't that be read not as eliciting knowledge the model already has, but simply as "the classifier discriminated well"?
• Suggestion: Extend the scope of internal knowledge to knowledge that can actually be brought out.
4.0

TL;DR

💡 LLMs know it in their heads but can't put it all into words!

Cited: 30

Summary

Motivation

  • What does it mean for an LLM to “know” a piece of knowledge?
    • Might an LLM know more than it expresses in its outputs?
      • If so, improving LLM inference could raise LLM performance
      • It would also help explain which knowledge is used during inference, which is not, and why

Contribution

  • Defines the hidden knowledge an LLM holds and proposes an experimental framework for it
    • Definition: Existence of an internal function that ranks answers more accurately than any external function
    • If some internal function (using hidden states) can rank answers better than every external function (using token generation probabilities), then hidden knowledge exists!
  • The definition of knowledge proposed in the paper
    • Is computable → useful for empirical research
    • Is not evaluated by answer accuracy on closed-book QA → focuses on intrinsic rather than surface-level knowledge
    • Measures externally expressed knowledge and internally encoded knowledge under one unified definition
  • Findings
    • Hidden knowledge is real!
      • The internal function found along the way also improves closed-book QA performance
    • Knowledge can be hidden more deeply than expected, in which case it is not reflected in the model's generations at all
      • (Note-taker's own speculation) After safety alignment, a model does not generate harmful knowledge even when it knows it

Study Design

  • First, define which knowledge to use and how to measure the knowledge inside the model!

Hidden Knowledge Definition

  • Focuses on knowledge that can be expressed as a triple
    • E.g. (“Empire State Building”, location, “NYC”)
    • For “Where is the Empire State Building located?”, both “NYC” and “New York City” are plausible answers!
  • Because answers can take many forms, “knowing” is defined relative to a scoring method, and the model's knowledge is defined on that basis!
  • Definition 1 (Knowledge of a Model w.r.t a Scoring Method)
    • Notation
      • $M$: the LLM
      • $(s, r, o)$: a fact (a piece of knowledge)
        • E.g. (“France”, capital, “Paris”)
      • $Q(s,r)$: the set of all paraphrases of the question that asks for $o$ given $(s, r)$
        • E.g. “What is the capital of France?”, “Which city is the capital of France?”
      • $\tilde{A}(o)$: the set of all plausible answers to $Q(s,r)$ (entities of the same type as $o$)
        • E.g. “Paris”, “The city of New York”
      • $A(o) \subseteq \tilde{A}(o)$: the set of paraphrases of the correct answer $o$
        • E.g. “Paris”, “The city of Paris”
      • $\Omega(s,r,o) := A(o)\times(\tilde{A}(o)\setminus A(o))$: the set of ordered pairs made of one correct answer and one plausible incorrect answer to $Q(s, r)$
        • E.g. (“Paris”, “London”), (“Paris city”, “NYC”)
    • The ability to score correct answers above plausible incorrect ones is expressed as $K \in [0, 1]$!
      • For a single question $q$, it is defined over the QA pairs $(q, a)$ as follows
        • $K_q(s,r,o;S_M)=\frac{1}{|\Omega(s,r,o)|}\sum_{(a,\tilde{a})\in \Omega(s,r,o)} I\big(S_M(q,a)>S_M(q,\tilde{a})\big)$
      • The degree to which model $M$ knows $(s, r, o)$ is defined as follows
        • $K(s,r,o;S_M)=\frac{1}{|Q(s,r)|}\sum_{q\in Q(s,r)} K_q(s,r,o;S_M)$
        • That is, the average of $K_q$ over every possible $q$ for one triple
      • $K^*$ is defined as follows for the case where the model truly and completely knows the fact
        • $K^*(s,r,o;S_M)= I\big(K(s,r,o;S_M)=1\big)$
        • This is the case where the correct answer gets the higher score in every correct/incorrect pair!
      • The closer $K$ is to 1, the more clearly the model knows the triple!
  • Definition 2 (Evidence of Hidden Knowledge)
    • Notation
      • $T_M$: an internal scoring function
      • $S_M^E$: the set of all external scoring functions
      • $D=\{(s_i,r_i,o_i)\}_{i=1}^{n}$: the dataset
      • $\Delta$: a margin that rules out random variation (to show that the internal function surpasses the external ones comfortably)
    • Hidden knowledge exists when an internal function that scores with internal information such as hidden states beats, by a comfortable margin, every scoring function that uses only external information!
      • $\frac{1}{n}\sum_{i=1}^{n} K(s_i,r_i,o_i;T_M) > \max_{S_M\in S_M^E}\left( \frac{1}{n}\sum_{i=1}^{n} K(s_i,r_i,o_i;S_M) \right)+\Delta$
  • Scoring function
    • An external function can use the probability of generating answer $a$ for input $q$, or present the pair $(q, a)$ and have the model verify it as true/false (TF)
    • An internal function can use a probing classifier over the hidden states
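The quantities in Definition 1 and Definition 2 can be sketched in a few lines of Python. This is a minimal illustration of the formulas as written in these notes, not the paper's code: the function names (`k_q`, `k`, `k_star`, `has_hidden_knowledge`) and all toy scores are made up for the example, and a scoring function is assumed to be any callable `score(q, a)` where a higher value means "more likely correct".

```python
from itertools import product

def k_q(score, q, correct, plausible):
    """K_q: fraction of (correct, incorrect) pairs in Omega ranked correctly for q."""
    correct = set(correct)
    incorrect = set(plausible) - correct            # A~(o) \ A(o)
    pairs = list(product(correct, incorrect))       # Omega(s, r, o)
    if not pairs:
        raise ValueError("need at least one correct and one incorrect answer")
    hits = sum(score(q, a) > score(q, a_tilde) for a, a_tilde in pairs)
    return hits / len(pairs)

def k(score, questions, correct, plausible):
    """K: mean of K_q over the question set Q(s, r)."""
    questions = list(questions)
    return sum(k_q(score, q, correct, plausible) for q in questions) / len(questions)

def k_star(score, questions, correct, plausible):
    """K*: 1 only if every pair is ranked correctly for every question."""
    return int(all(k_q(score, q, correct, plausible) == 1.0 for q in questions))

def has_hidden_knowledge(internal_mean_k, external_mean_ks, delta=0.05):
    """Definition 2: the internal scorer must beat the best external one by delta."""
    return internal_mean_k > max(external_mean_ks) + delta

# Toy scores (invented for illustration only).
questions = ["What is the capital of France?"]
correct = {"Paris", "The city of Paris"}
external = {"Paris": 0.40, "The city of Paris": 0.10, "London": 0.30, "NYC": 0.05}
internal = {"Paris": 0.95, "The city of Paris": 0.90, "London": 0.20, "NYC": 0.01}

def ext_score(q, a):
    return external[a]

def int_score(q, a):
    return internal[a]

k_ext = k(ext_score, questions, correct, external)  # 3 of 4 pairs ranked correctly
k_int = k(int_score, questions, correct, internal)  # all 4 pairs ranked correctly
```

With these toy numbers the external scorer misranks one pair (“The city of Paris” below “London”), so its K is 0.75 and its K* is 0, while the internal scorer reaches K = 1 and K* = 1. In the notes' setting the comparison in `has_hidden_knowledge` is made on averages over the whole dataset, not on a single triple.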

Experiment design

  • Knowledge dataset
    • EntityQuestions (QA pairs built from Wikidata triples)
    • Relations were chosen to be difficult but unambiguous
      • P26 (spouse), P176 (manufacturer), P264 (record label), P50 (author)
  • Settings for the components needed to measure the degree of knowledge
    • $S_M^E$
      • Scoring by answer generation probability
        • $P(a\mid q)=\prod_{i=1}^{n} P(a_i \mid q,a_{<i})$, where $n$ here is the number of tokens in $a$
        • Length-normalized version:
          • $P_{norm}(a\mid q)=\left(\prod_{i=1}^{n} P(a_i \mid q,a_{<i})\right)^{1/n} =\exp\left(\frac{1}{n}\sum_{i=1}^{n}\log P(a_i \mid q,a_{<i})\right)$
      • Scoring as a true/false (TF) question
        • $P(\text{``True''}\mid q,a)$
    • $\tilde{A}(o)$: to build the plausible answer set, the LLM is given $q$ and sampled 1000 times, and the ground truth from the dataset is added
    • $A(o)$: an LLM judge compares each answer the LLM generated with the ground truth $o$, and matching answers are added to the correct set
    • $Q(s,r)$: only the original question from the dataset is used
    • $\Delta$: 0.05
    • $T_M$: trained with a logistic regression objective to classify correct vs. incorrect answers from the LLM's hidden states
      • The probability of "correct" output by the classifier is used as the scoring function
      • Only the best-performing layer is used
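The scoring functions listed above can be sketched as follows. This is a hedged, standard-library-only illustration, not the paper's implementation: it assumes the per-token log-probabilities of an answer and a hidden-state vector per (q, a) pair have already been extracted from the model, and it swaps in a hand-written full-batch gradient descent for whatever logistic regression solver the authors used. All names (`sequence_prob`, `length_normalized_prob`, `train_probe`, `probe_score`) and all toy numbers are hypothetical.

```python
import math

def sequence_prob(token_logprobs):
    """P(a|q): product of per-token probabilities, accumulated in log space."""
    return math.exp(sum(token_logprobs))

def length_normalized_prob(token_logprobs):
    """P_norm(a|q): geometric mean of the per-token probabilities."""
    if not token_logprobs:
        raise ValueError("answer must contain at least one token")
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def probe_score(weights, bias, hidden_state):
    """T_M(q, a): the probe's probability that the answer is correct."""
    return sigmoid(sum(w * h for w, h in zip(weights, hidden_state)) + bias)

def train_probe(hidden_states, labels, lr=0.5, epochs=500):
    """Fit logistic regression by full-batch gradient descent on the log loss."""
    dim, n = len(hidden_states[0]), len(hidden_states)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for h, y in zip(hidden_states, labels):
            err = probe_score(weights, bias, h) - y   # gradient w.r.t. the logit
            grad_w = [g + err * x for g, x in zip(grad_w, h)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Toy inputs (invented): a 1-token answer vs. a 7-token answer.
short_answer = [math.log(0.5)]
long_answer = [math.log(0.9)] * 7

# Toy 2-dimensional "hidden states" with correct (1) / incorrect (0) labels.
toy_states = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.0, 1.0]]
toy_labels = [1, 1, 0, 0]
weights, bias = train_probe(toy_states, toy_labels)
```

The toy answers show why the length-normalized variant is listed separately: the raw product favors the short answer even though every token of the long answer is more probable, and the geometric mean reverses that ordering. Real hidden states have thousands of dimensions, so a library solver would be the practical choice for the probe.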

Experiment Results

  • Hidden knowledge exists!
    • How well internal knowledge is expressed differs from model to model
      • Llama expresses it well; Gemma does not
    • Among the external functions, P(True) works best

      → LLMs are more adept at true/false questions (verification) than at open-ended answers (generation)!

  • Even for a question the model gets wrong in all 1000 generations, it may in fact know the answer
    • Compares $K^*$ when scoring only the 1000 sampled answers (gray) vs. when the gold answer is added to the 1000 samples before scoring
    • With external scoring, the model fails to recognize the gold answer as correct even when it is provided
    • With internal scoring, the model recognizes the gold answer as correct once it is provided
    • Questions the model knows perfectly internally but cannot express externally make up about 7.2% of the total
  • Case Study
    • Q: Which company is Volvo B58 produced by?
      • The key here is to read Volvo as Volvo, and to know that the B58 is a Volvo bus, not the BMW engine
    • External scoring ranks the answers poorly, whereas internal probing separates them perfectly
  • Does the same hold for a large model (32B)?
    • In a small-scale experiment with a 32B model, knowledge was likewise more pronounced under internal probing
  • Selecting the answer with internal probing works better than voting
    • Greedy decoding performs worse than expected, and $P(a\mid q)$ does better than greedy
    • Internal probing gives the best performance
    • Even when the model cannot generate the gold answer itself, if the gold answer is provided, internal probing lets the model indicate that it is correct!
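The difference between voting and score-based selection can be sketched as below. This is an assumed, simplified setup rather than the paper's evaluation code: `samples` stands for answers sampled from the model, `probe_scores` stands in for the internal probe's output on each candidate, and every value is invented for illustration (loosely echoing the Volvo B58 case study).

```python
from collections import Counter

def select_by_voting(samples):
    """Majority vote: the most frequently sampled answer wins."""
    return Counter(samples).most_common(1)[0][0]

def select_by_score(samples, score):
    """Rerank the unique sampled answers with a scorer and keep the top one."""
    candidates = sorted(set(samples))   # sorted for deterministic tie-breaking
    return max(candidates, key=score)

# Toy samples and probe scores (invented for illustration only).
samples = ["BMW"] * 6 + ["Scania"] * 3 + ["Volvo"]
probe_scores = {"BMW": 0.2, "Scania": 0.1, "Volvo": 0.9}

voted = select_by_voting(samples)
reranked = select_by_score(samples, probe_scores.get)
```

In this toy case voting returns the frequent wrong answer while reranking picks the rarely sampled correct one. Reranking can only choose among answers that were actually sampled, which matches the point above: when the gold answer never appears in the samples, the scorer can recognize it only if it is supplied from outside.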

Categories

PROBING, research