We cannot have a safe model that doesnāt interact with reality and doesnāt get feedback, because it wonāt be coupled with reality. But we canāt trust a model that interacts with reality unless weāre already been convinced its representations are valid and not likely to slip. This is not to conclude that the problem is unsolvable - it is just saying that the ways that model developers are coupling models to reality seems to assume weāve solved the problem that weāre trying to fix by doing so.
Cooperative AI fails unless its words reliably point to the world. Unfortunately, it seems like the way models are built is assuming the pro














