Memorization and generalization

Matthew Explains, matthewexplains.com


2026-01-19

How much arbitrary information, such as random bit strings, can a language model memorize during training? The paper discussed here suggests the answer is about 3.6 bits per parameter.
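To get a feel for what 3.6 bits per parameter implies, here is a minimal back-of-the-envelope sketch. The function name and the example model size are my own illustrative choices; only the 3.6 bits/parameter figure comes from the paper.

```python
def memorization_capacity_bytes(n_params: int, bits_per_param: float = 3.6) -> float:
    """Estimate total memorization capacity in bytes, assuming the
    paper's figure of roughly 3.6 bits stored per parameter."""
    return n_params * bits_per_param / 8  # 8 bits per byte

# Under this estimate, a hypothetical 1-billion-parameter model could
# memorize on the order of 450 MB of arbitrary (incompressible) data.
print(memorization_capacity_bytes(1_000_000_000))  # 450000000.0
```

Note this is an upper bound on raw memorization of random data, not a statement about how much natural-language text a model retains verbatim.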
