
chess-and-rl


Response to “Chess engines do weird stuff” (has quite a wild home page layout). AlphaZero, RL, and SPSA, Cosmo Bobak

I argue forcefully against the notion that the self-play loop is in some sense “not necessary”, or even “only necessary one time”. Distillation from a fixed oracle has a ceiling: the student can approach, but never exceed, the quality of the teacher’s data. To surpass that ceiling, you must search-amplify the new network, generating better data than the old oracle could, and distill again. This is precisely the self-play loop. The distance from random play to superhuman play is not crossed in one leap.
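To make the ceiling concrete, here is a minimal sketch on a toy game rather than chess. Everything in it is an assumption chosen for illustration: the game (one pile of stones, take 1 to 3, whoever takes the last stone wins), the value table standing in for the "network", the one-ply lookahead standing in for search, and the helper names `amplify`, `distill`, `fixed_oracle` and `self_play_loop`. Distillation is idealised as a perfect copy, which is the best case for the fixed-oracle student.

```python
# Toy game (an assumption for illustration): one pile of stones, each move
# removes 1 to 3 stones, and the player who takes the last stone wins.
MOVES = (1, 2, 3)
MAX_PILE = 12
UNKNOWN = 0.5  # the untrained "network" has no opinion about any position


def true_value(n):
    """Ground truth: the side to move loses exactly when n is a multiple of 4."""
    return 0.0 if n % 4 == 0 else 1.0


def child_value(values, n):
    """The search knows the rules: an empty pile is lost for the side to move."""
    return 0.0 if n == 0 else values[n]


def amplify(values):
    """One-ply lookahead on top of a value table: the search-amplified teacher."""
    improved = [0.0]  # pile 0 is terminal and lost for the side to move
    for n in range(1, MAX_PILE + 1):
        improved.append(
            max(1.0 - child_value(values, n - m) for m in MOVES if m <= n)
        )
    return improved


def distill(teacher_values):
    """Idealised distillation: the student reproduces its teacher exactly."""
    return list(teacher_values)


def solved_positions(values):
    """Count pile sizes whose value matches the ground truth."""
    return sum(values[n] == true_value(n) for n in range(MAX_PILE + 1))


def fixed_oracle(rounds):
    """Distill repeatedly from one frozen teacher."""
    oracle = amplify([UNKNOWN] * (MAX_PILE + 1))
    history = []
    for _ in range(rounds):
        student = distill(oracle)
        history.append(solved_positions(student))
    return history


def self_play_loop(rounds):
    """Amplify the current student, distill the result, and repeat."""
    student = [UNKNOWN] * (MAX_PILE + 1)
    history = []
    for _ in range(rounds):
        student = distill(amplify(student))
        history.append(solved_positions(student))
    return history


if __name__ == "__main__":
    print("fixed oracle:  ", fixed_oracle(6))
    print("self-play loop:", self_play_loop(6))
```

In this sketch the fixed-oracle student stays at 4 of 13 solved positions no matter how many rounds it trains, because its teacher never changes. The loop climbs to all 13 by the sixth round, since each round's search runs on top of a better student than the last. The toy exaggerates how clean the steps are, but the shape of the argument is the same.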

This post is licensed under CC BY 4.0 by the author.