Memorization and generalization
2026-01-19
How much arbitrary information, such as random bits, can a language model memorize during training? This paper suggests the answer is about 3.6 bits per parameter.
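A back-of-the-envelope sketch of what that figure implies: multiplying the ~3.6 bits-per-parameter estimate by a model's parameter count gives a rough upper bound on how much arbitrary data it could store. The model sizes below are illustrative examples, not taken from the paper.

```python
# Rough capacity estimate under the paper's ~3.6 bits/parameter figure.
BITS_PER_PARAM = 3.6  # estimate reported by the paper

def memorization_capacity_bytes(n_params: float) -> float:
    """Estimated raw memorization capacity in bytes (bits / 8)."""
    return n_params * BITS_PER_PARAM / 8

# Illustrative (hypothetical) model sizes:
for name, n in [("125M", 125e6), ("1.3B", 1.3e9), ("7B", 7e9)]:
    mb = memorization_capacity_bytes(n) / 1e6
    print(f"{name}: ~{mb:.0f} MB of arbitrary data")
```

For instance, a 125M-parameter model would top out around 56 MB of random bits under this estimate, which is tiny compared with typical training-set sizes and motivates the paper's distinction between memorization and generalization.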