SAFETY

26 March 2026

Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models

COLM'25

💡 Refusal tokens make a model's refusals more fine-grained (performance ↑) and more flexible (adjustable at inference time)!