Quis custodiet reward models

2025-09-29

Tags: video, alignment, training, text, LLaMA, Gemma

Large language models are "aligned" using smaller, specially trained reward models. These reward models are often kept secret, and even the public ones are poorly studied. This paper opens the door to exploring reward models by asking them about their values.
