chess-and-rl

Posted Feb 17, 2026

By Erik Makela


Response to "Chess engines do weird stuff" (has quite a wild home page layout), on AlphaZero, RL, and SPSA, by Cosmo Bobak.

I argue forcefully against the notion that the self-play loop is in some sense “not necessary”, or even “only necessary one time”. Distillation from a fixed oracle has a ceiling: the student can approach, but never exceed, the quality of the teacher’s data. To surpass that ceiling, you must search-amplify the new network so that it generates better data than the old oracle could, and then distill again. That repeated cycle of amplifying and distilling is precisely the self-play loop. The distance from random play to superhuman play is not crossed in one leap.
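A toy sketch of that argument, under loud assumptions that are not from the original post: the game is single-pile Nim (take 1 to 3 stones, taking the last stone wins), the "network" is a plain value table, "search" is one-ply negamax on top of that table, and "distillation" is a perfect copy of the search output. All function names are hypothetical. It is only meant to illustrate how a fixed oracle caps the student while re-amplifying each round keeps extending what the table knows.

```python
from typing import List

N = 20             # largest pile size in the toy game (assumption)
MOVES = (1, 2, 3)  # stones a player may take per turn (assumption)


def true_value(n: int) -> float:
    """Game-theoretic value for the player to move: multiples of 4 are lost."""
    return -1.0 if n % 4 == 0 else 1.0


def initial_net() -> List[float]:
    """An untrained 'network': it knows only the terminal position."""
    net = [0.0] * (N + 1)
    net[0] = -1.0  # empty pile: the player to move has already lost
    return net


def search(net: List[float], n: int) -> float:
    """One-ply negamax on top of `net`: the search-amplification step."""
    if n == 0:
        return -1.0
    return max(-net[n - m] for m in MOVES if m <= n)


def distill(teacher: List[float]) -> List[float]:
    """Idealised distillation: the student copies the teacher's search output."""
    return [search(teacher, n) for n in range(N + 1)]


def correct(net: List[float]) -> int:
    """Number of positions whose stored value equals the true value."""
    return sum(net[n] == true_value(n) for n in range(N + 1))


def fixed_oracle_student() -> List[float]:
    """Distill from a fixed oracle; repeating this always yields the same table."""
    return distill(initial_net())


def self_play_loop(rounds: int) -> List[int]:
    """Re-amplify and re-distill every round; return the correct count per round."""
    net = initial_net()
    history = []
    for _ in range(rounds):
        net = distill(net)  # search over the *current* net, then copy it
        history.append(correct(net))
    return history


if __name__ == "__main__":
    print("fixed oracle:", correct(fixed_oracle_student()), "of", N + 1)
    print("self-play loop:", self_play_loop(N))
```

In this toy, the fixed-oracle student only ever learns what one ply of search over the untrained table can see, no matter how many times it is distilled. The looped version pushes the horizon of correctly valued positions further each round until the whole table is right. Because distillation is assumed to be lossless here, the sketch illustrates the shape of the ceiling argument rather than demonstrating it for real networks.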

feed

quote

This post is licensed under CC BY 4.0 by the author.
