possibly the most on-brand post I've ever written?
Okay, I'll be honest: I don't care about the object-level issue here. What is interesting is how much this matters for the future.
The AI we get to play with now is, by the standards of most mature technologies and commercial products, wildly open to embarrassing use. I don't need to relitigate the early profanely racist chatbots, the sexual and romantic uses people have made of it, the instructions for destructive weapons, or the imitations of real-life people. Hell, I'm just amused every time the AI says "fuck" on its own.
Now, one of the reasons for this level of independence is the philosophical leanings of the coders creating it. But I think a much bigger reason is that LLMs have been very hard to control. They may only *rarely* go off script today and say incredibly disgusting things, but it's still possible, because "never do this" is a design that goes against their architecture.
Given that the LLM is going to embarrass its makers anyway, putting much effort into fencing it into respectable behavior is kind of a waste.
But we are in the early days, and we've seen this show before.
Once AI is widely adopted in commercial and government frameworks, every incentive will work to muzzle it. The AI my multinational company paid billions for cannot misgender me, or talk about the logical flaws in Christianity, or give sex and relationship advice. That kind of thing is a mild scandal now, but it will be much more of one once AI is no longer the Wild West frontier.
And our technology to stop LLMs from going out of bounds will continue to improve.
Which will mean what? AIs that don't curse and have all the realness of a Human Resources mission statement. And I promise you, that will make them useless to 50% of the people who would want to use them, because that's what shackling an information/communication system does.
(See this post about CrazyMeds.us by the old Slatestarcodex.)
The question to me is what happens after that. Will there be enough jailbroken AIs that anyone can access and get frank language from? Or will AI be a tool that can never really work on a mass basis, because the only mass-distributed versions are lobotomized?
















