Blog

21 January 2026

LLMs Encode Harmfulness and Refusal Separately

NeurIPS'25

💡LLMs encode an instruction's harmfulness and the decision to refuse it in different latent spaces!

최민영
21 January 2026

From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models

ACL'25

💡Proposes the Symbiotic Watermarking framework, which selectively applies logits-based and sampling-based watermarking according to the entropy values of two criteria

21 January 2026

Curriculum Debiasing: Toward Robust Parameter-Efficient Fine-Tuning Against Dataset Biases

ACL'25

💡When training with PEFT, models tend to overfit to biased examples (because they converge faster on biased examples) ⇒ let's mitigate this by presenting the training data in biased-to-unbiased order!