21 January 2026

Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models

💡Defines logical preference consistency for LLMs and proposes a matching training-data augmentation method, improving both logical preference consistency and performance on logic-dependent tasks


Review

Nickname | One-line review | Rating (0/5)
계란초밥 (Egg Sushi) | A paper that tackles uncertainty and faithfulness with a simple, clear method! Exploiting the "logic" angle beyond plain negation could let us define preference consistency in more diverse ways. | 3.6
맹구 (Maeng-gu) | I think prediction consistency in LLMs matters. I kept recalling the claim that LLMs are not well suited to logic, and the keywords were so similar that it was fascinating. Logic seems like a good basis for augmenting and generating datasets. Worth keeping as a reference. | 3.7
국밥 (Gukbap) | The paper "Measuring the Inconsistency of Large Language Models in Ordinal Preference Formation" also considers three logical invariances, but this one clearly stands out in its data augmentation. It also makes sense that negation, being a new sentence structure, shows no performance difference unless it is trained on separately. | 3.9
햄버거 (Hamburger) | Judging LLM reliability by logical consistency rather than accuracy does seem more sound. Interesting that CoT does not always improve consistency, and that the reasoning process itself can actually destabilize judgments. | 3.8
피자 (Pizza) | I see the value of this work in checking whether an LLM's logical consistency holds up. Analyzing consistency through the logic graph and the order of items is what sets the paper apart; I hope follow-up work takes it further. | 3.9
치킨 (Chicken) | Personally I think consistency is very important for trusting LLMs, and measuring robustness through three properties is convincing. Showing experimentally that higher consistency also improves algorithm performance makes the paper more persuasive. | 4.1
페브리즈 (Febreze) | Being aligned to human preferences does not necessarily mean reasoning in a logically consistent way, so this work fills a gap in preference alignment well. From the definition of logical consistency to the explanatory figures, the paper is clean and clear and communicates well. | 4.1

TL;DR

💡

Defines logical preference consistency for LLMs and proposes a matching training-data augmentation method, improving both logical preference consistency and performance on logic-dependent tasks

Summary

Introduction

Motivation

  • Prediction consistency is a key factor in LLM reliability
    • This paper focuses specifically on the logical preference consistency of LLMs
    • Logical preference consistency is essential for structured reasoning and coherent decision-making

Contribution

  • Highlights the importance of logical preference consistency in LLMs
    • Mathematically defines three core consistency properties
  • Evaluates the logical preference consistency of recent LLMs and analyzes its correlation with model reliability
  • Proposes a method for refining and augmenting instruction-tuning data that improves logical preference consistency
  • Shows that improving an LLM's logical consistency improves performance on logic-dependent algorithmic tasks

Measuring Logical Consistency

  • How logical preference consistency is evaluated: measure the model's ability to predict logically consistent relations
    • The LLM compares an item pair and decides the relation between the two items
      $F(x_1, x_2) = r$
      • $(x_1, x_2)$: item pair
      • $F$: the function (the LLM) that compares an item pair and predicts the relation
      • $r$: a directional logical preference relation
        • $r_{ij}$: $x_i \succ x_j$, i.e., item $x_i$ is preferred over $x_j$
        • $r_{ji}$: $x_j \succ x_i$
  • Logical consistency over an item set $X$ is evaluated through the following properties
    • transitivity: consistency across comparisons made in different contexts (judgments on different pairs do not contradict one another)
    • commutativity: consistency when the order of the items is changed
    • negation invariance: consistency when the relation is negated

Measuring Transitivity

  • transitivity: $A \succ B$ and $B \succ C$ → $A \succ C$
    • $F(x_i, x_j) = r_{ij}$ and $F(x_j, x_k) = r_{jk}$ → $F(x_i, x_k) = r_{ik}$
    • Transitivity is reflected by whether the relational graph of item set $X$ contains a cycle
      A cycle indicates non-transitivity: $x_1 \to x_2$ and $x_2 \to x_3$, yet $x_3 \to x_1$
  • metric (takes values between 0 and 1)
    • $S_i^K$: a random subgraph of size $K$ sampled from the item set
    • $M$: total number of sampled subgraphs (at most 1,000)
    • Why the subgraph size $K$ is fixed: it determines how hard transitivity is to maintain
      • A set of $K$ items has $\binom{K}{2}$ item pairs, hence $2^{\binom{K}{2}}$ possible combinations of pairwise judgments, of which only $K!$ form a transitive ranking
      • $K$ is therefore held fixed when measuring the metric so that comparisons are fair
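The transitivity metric can be sketched in Python. This is a minimal illustration, not the authors' code. It assumes the score is the fraction of sampled size-$K$ subgraphs that contain no cycle, which is consistent with the 0 to 1 range and the $S_i^K$, $M$, $K$ definitions above. prefers is a hypothetical stand-in for querying the LLM. chance_transitive computes the $K!/2^{\binom{K}{2}}$ ratio from the last bullet.

```python
import itertools
import math
import random
from typing import Callable, Sequence

# Hypothetical judge interface (not a real API): prefers(i, j) is True when the
# model, shown item i first and item j second, says i is preferred (r_ij).
Judge = Callable[[int, int], bool]


def has_cycle(nodes: Sequence[int], prefers: Judge) -> bool:
    """True if the preference graph on `nodes` contains a directed cycle.

    Each unordered pair is queried once (in the order it appears in `nodes`),
    so the graph is a tournament. A tournament is acyclic exactly when its
    out-degrees are 0, 1, ..., k-1, i.e. when it encodes a strict ranking.
    """
    out_degree = {v: 0 for v in nodes}
    for a, b in itertools.combinations(nodes, 2):
        winner = a if prefers(a, b) else b
        out_degree[winner] += 1
    return sorted(out_degree.values()) != list(range(len(nodes)))


def transitivity_score(n_items: int, prefers: Judge, k: int = 3,
                       m: int = 1000, seed: int = 0) -> float:
    """Fraction of sampled size-k subgraphs that are acyclic (0 to 1)."""
    if n_items < k:
        raise ValueError("need at least k items")
    rng = random.Random(seed)
    if math.comb(n_items, k) <= m:
        # Few enough subsets: evaluate every one of them.
        subsets = list(itertools.combinations(range(n_items), k))
    else:
        # Otherwise draw m random size-k subsets (duplicates are possible).
        subsets = [rng.sample(range(n_items), k) for _ in range(m)]
    acyclic = sum(not has_cycle(s, prefers) for s in subsets)
    return acyclic / len(subsets)


def chance_transitive(k: int) -> float:
    """Probability that uniformly random judgments on k items form a ranking."""
    return math.factorial(k) / 2 ** math.comb(k, 2)
```

For $K = 3$, a judge that answers at random is still transitive 75% of the time. For $K = 5$ the chance drops to about 11.7%. This is why scores are only comparable at a fixed $K$.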

Measuring Commutativity

  • commutativity: whether the model's judgment stays consistent when the order of the items in the prompt is swapped
    • A: $x_i \succ x_j$
    • B: $x_i \prec x_j$
    • ⇒ In the figure, the red solid line marks a commutativity conflict
  • metric (takes values between 0 and 1)
    • The normalization term at the front averages over all pairwise combinations
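Here is a minimal sketch of the commutativity score. It assumes the score counts how often the preferred item stays the same when the two items swap positions in the prompt, taken over all $\binom{N}{2}$ unordered pairs. prefers is a hypothetical wrapper around the LLM call, not an API from the paper.

```python
import itertools
from typing import Callable

# Hypothetical judge (not a real API): prefers(a, b) is True when the model,
# shown a first and b second, says a is preferred.
Judge = Callable[[int, int], bool]


def commutativity_score(n_items: int, prefers: Judge) -> float:
    """Share of unordered pairs whose verdict survives swapping the prompt order."""
    pairs = list(itertools.combinations(range(n_items), 2))
    consistent = 0
    for i, j in pairs:
        # x_i wins when shown first ...
        i_wins_forward = prefers(i, j)
        # ... and x_i wins when shown second iff the model rejects x_j.
        i_wins_swapped = not prefers(j, i)
        consistent += i_wins_forward == i_wins_swapped
    # Normalization 1 / C(N, 2): average over all pairwise combinations.
    return consistent / len(pairs)
```

A judge that always picks whichever item is shown first (pure position bias) scores 0.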

Measuring Negation Invariance

  • negation invariance: whether the model's judgment stays consistent when the relational statement is negated or inverted
    • A and B are defined as for commutativity
    • ⇒ In the figure, the purple dashed line marks a negation conflict
  • metric (takes values between 0 and 1)
    • $\overline{F}$: the model's judgment when the negated relation is explicitly prompted
    • $\neg F$: the negation of the model's judgment on the original relation
    • The normalization term at the front averages over all pairwise permutations
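Negation invariance can be sketched the same way. The sketch assumes the score averages, over all $N(N-1)$ ordered pairs, whether $\overline{F}(x_i, x_j)$ equals $\neg F(x_i, x_j)$. Both judge functions are hypothetical. The negated prompt is assumed to ask whether the first item is "worse" than the second, and ties are assumed not to occur.

```python
import itertools
from typing import Callable

# Hypothetical judges wrapping LLM calls (not a real API):
#   prefers(a, b)         -> True if the model answers "a is better than b"
#   prefers_negated(a, b) -> True if, asked the negated question
#                            "is a worse than b?", the model answers yes
Judge = Callable[[int, int], bool]


def negation_invariance_score(n_items: int, prefers: Judge,
                              prefers_negated: Judge) -> float:
    """Share of ordered pairs where F-bar(x_i, x_j) equals not F(x_i, x_j)."""
    perms = list(itertools.permutations(range(n_items), 2))
    consistent = sum(
        prefers_negated(i, j) == (not prefers(i, j))  # F-bar == negation of F
        for i, j in perms
    )
    # Normalization 1 / (N(N-1)): average over all pairwise permutations.
    return consistent / len(perms)
```

Consider a model that ignores the word "worse" and answers the negated question as if it were the original one. Such a model scores 0.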

Evaluating Logical Consistency of LLMs

The consistency of LLM judgments is evaluated on three tasks

Evaluation Setup

  • Dataset
    • abstractive summarization evaluation (SummEval): preference judgments between summaries
    • document reranking (NovelEval): relevance judgments for documents retrieved in response to a query
    • temporal event ordering (CaTeRS): judgments of temporal and causal relations between events
  • Metric
    • logical consistency metrics: computed per instance; the mean over the test set is reported
    • human agreement rate (H.): pairwise accuracy of LLM judgments against human annotations
    • self-agreement: fraction of outputs, across multiple samples, that match the majority judgment (takes values between 0.5 and 1)
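The two reliability metrics are simple ratios. Here is a sketch that assumes verdicts are stored as labels such as "A" or "B". The function names are illustrative.

```python
from collections import Counter
from typing import Sequence


def self_agreement(samples: Sequence[str]) -> float:
    """Share of sampled verdicts that match the majority verdict.

    With two possible verdicts the majority covers at least half of the
    samples, so the value lies between 0.5 and 1.
    """
    majority_count = Counter(samples).most_common(1)[0][1]
    return majority_count / len(samples)


def human_agreement(model: Sequence[str], human: Sequence[str]) -> float:
    """Pairwise accuracy of the model's verdicts against human annotations (H.)."""
    if len(model) != len(human):
        raise ValueError("model and human verdict lists must have equal length")
    return sum(m == h for m, h in zip(model, human)) / len(human)
```

For example, four samples A, A, B, A give a self-agreement of 0.75.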

Results and Analysis

  • Recent LLMs such as Gemma2 9B and Phi3 medium show stronger consistency than earlier models
  • Strong consistency in one aspect does not guarantee strong consistency in the others
    • Mistral 7B: strong on transitivity but weak on the other consistency aspects
  • CoT reasoning does not improve consistency and in some cases lowers transitivity
    • Possibly because the extra CoT tokens blur the judgment criteria

Consistency and Reliability

  • Transitivity correlates strongly with self-agreement on all three datasets
    • Transitivity can therefore serve as a useful proxy for evaluating LLM robustness
  • Commutativity correlates strongly with human preference (agreement rate)
    • For each dataset, both metrics were computed using 10 comparison prompts paraphrased by GPT-4 Turbo
    • A likely reason: commutativity is tied to position bias, which prior work has shown to significantly affect alignment

Improve Logical Preference Consistency in LLMs via REPAIR

  • REPAIR (Ranking Estimation and Preference Augmentation through Information Refinement): a framework for mitigating LLM inconsistency
    1. Estimate a ranking from noisy preference data
    2. Generate additional conflict-free pairwise comparisons
    • ⇒ Strengthens logical preference coherence while preserving alignment with human preferences
    • → Makes LLMs more reliable as logical operators

Estimating Rankings from Noisy Pairwise Data

  • Estimating a ranking from noisy pairwise annotations: use each item's win-loss rate
    • Win-loss rate of an item:
      $$\frac{\#\text{wins} - \#\text{losses}}{\#\text{total comparisons}}$$
    1. Sort the items by win-loss rate
    2. Build a self-consistent set of pairwise comparisons from the sorted order
    • This set can be further augmented with comparisons phrased using the negated relation
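The ranking and augmentation steps can be sketched as follows. This illustrates the steps under stated assumptions and is not the REPAIR implementation. The assumptions are:
- Annotations arrive as (winner, loser) pairs.
- Ties in win-loss rate keep first-seen order.
- Both presentation orders of every pair are emitted.
- The negated question is phrased as "worse".
- The tuple layout (first_shown, second_shown, relation, answer) is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Hashable, Iterable


def win_loss_ranking(comparisons: Iterable[tuple[Hashable, Hashable]]) -> list:
    """Rank items best-first by (#wins - #losses) / #total comparisons.

    `comparisons` holds (winner, loser) pairs from noisy annotations.
    Equal rates keep the order in which items were first seen (an assumption).
    """
    wins: dict = defaultdict(int)
    losses: dict = defaultdict(int)
    first_seen: dict = {}
    for winner, loser in comparisons:
        for item in (winner, loser):
            first_seen.setdefault(item, len(first_seen))
        wins[winner] += 1
        losses[loser] += 1

    def rate(item) -> float:
        return (wins[item] - losses[item]) / (wins[item] + losses[item])

    # sorted() stays stable with reverse=True, so ties keep first-seen order.
    return sorted(first_seen, key=rate, reverse=True)


def repair_pairs(ranking: list, with_negation: bool = False) -> list[tuple]:
    """Conflict-free comparisons implied by `ranking` (hypothetical format).

    Each tuple is (first_shown, second_shown, relation, answer). Both
    presentation orders are emitted; "worse" questions are added on request.
    """
    data = []
    for better, worse in combinations(ranking, 2):  # earlier = better
        for a, b in ((better, worse), (worse, better)):
            data.append((a, b, "better", a == better))
            if with_negation:
                data.append((a, b, "worse", a == worse))
    return data
```

Take the annotations A≻B, B≻C, C≻A, A≻C, A≻B. They contain a cycle, yet the win-loss rates (A: 0.5, B and C: -1/3) yield the conflict-free ranking A, B, C. Every comparison generated from that ranking then agrees with the others.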

Experiments

  • Experimental Setup
    • dataset: Summarize From Feedback (annotations comparing the quality of two summaries)
      • → The proposed refinement and augmentation method is applied to improve both the consistency and the quantity of the pairwise comparisons
    • Instruction-tuning data options for Llama 3 8B Instruct
      • flipped or perturbed data
      • refined and augmented data (REPAIR-ed)
      • REPAIR-ed data plus negated-relation comparisons
  • Results and Findings
    • Training on REPAIR-ed data improves transitivity and commutativity, along with human preference alignment
    • Negation invariance improves only when training includes negated-relation comparisons

Impact of Logical Preference Consistency on Downstream Applications

Examines how an LLM's logical preference consistency affects logically grounded tasks
  • Method: uses the LLM-as-judge algorithm PairS (from prior work)
    • Algorithm performance is measured as the correlation between the LLM-produced ranking and human judgments
    • The algorithm depends heavily on logical properties
  • Results
    • Phi 3 mini has lower human agreement (H.) than GPT 3.5 turbo, yet achieves better algorithm performance thanks to its stronger transitivity
    • Commutativity correlates with the performance gain obtained from calibration
      • Llama 3 8B, which already has high commutativity before calibration, needs less calibration to reach good algorithm performance
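A generic merge sort driven by pairwise LLM probabilities can illustrate why a sorting-based judge depends on transitivity and commutativity. This is a stand-in and not the PairS code. It assumes that calibration means averaging the probability over both presentation orders. judge is a hypothetical function that returns the probability that the first-shown item wins. Algorithm performance is scored with Spearman correlation against a human ranking, one common choice of rank correlation.

```python
from typing import Callable, Sequence

# Hypothetical judge (not a real API): judge(a, b) is the model's probability
# that the item shown first (a) is better than the item shown second (b).
ProbJudge = Callable[[int, int], float]


def calibrated_prob(a: int, b: int, judge: ProbJudge) -> float:
    """P(a beats b) averaged over both presentation orders (assumed calibration)."""
    a_first = judge(a, b)
    a_second = 1.0 - judge(b, a)
    return (a_first + a_second) / 2


def merge_sort_rank(items: Sequence[int], judge: ProbJudge,
                    calibrate: bool = True) -> list[int]:
    """Order items best-first with merge sort driven by pairwise judgments.

    Merge sort never re-checks earlier comparisons, so the result is only
    meaningful if the judge's preferences are (close to) transitive.
    """
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort_rank(items[:mid], judge, calibrate)
    right = merge_sort_rank(items[mid:], judge, calibrate)
    merged: list[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        a, b = left[i], right[j]
        p = calibrated_prob(a, b, judge) if calibrate else judge(a, b)
        if p >= 0.5:
            merged.append(a)
            i += 1
        else:
            merged.append(b)
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def spearman(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman correlation of two tie-free orderings of the same >= 2 items."""
    n = len(rank_a)
    pos_b = {item: p for p, item in enumerate(rank_b)}
    d2 = sum((p - pos_b[item]) ** 2 for p, item in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Suppose the judge leans strongly toward whichever item comes first. The uncalibrated sort then simply returns the input order, while the calibrated sort recovers the underlying ranking. If the judge is already commutative, both orders give the same probability, so this kind of calibration changes nothing.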

Categories

research