Elon Musk did not create an AI trained on your fanfiction.
Hi, AI ethicist + fanfiction expert here. (This is one of those times where I feel uniquely qualified to comment on something…)
I'm seeing this weird game of telephone about the Sudowrite AI that I think started out pretty accurate, but now has become "Elon Musk created an AI that is stealing your fanfiction" (which frankly gives him far too much credit). I can probably say more about this, but here are a few things that I want to clarify for folks, which can be boiled down to "Elon Musk has nothing to do with this" and "this is nothing new":
Elon Musk is not involved in any way with Sudowrite, as far as I can tell. Sudowrite does, however, use GPT-3, the widely-used large language model created by OpenAI, which Elon Musk co-founded. He resigned in 2018, citing a conflict of interest due to Tesla's AI development. It wasn't until after he left that OpenAI went from being a non-profit to a capped for-profit. Elon Musk doesn't have anything to do with OpenAI currently (and in fact just cut off their access to Twitter data), though I can't find anything that confirms whether or not he might have shares in the company. I would also be shocked if Elon actually contributed anything but money to the development of GPT-3.
Based on Sudowrite's description on their FAQ, they are not collecting any training data themselves - they're just using GPT-3 paired with their own proprietary narrative model. And GPT-3 is trained on datasets like Common Crawl and WebText, which can simplistically be described as "scraping the whole internet." The same goes for OpenAI's DALL-E art generator. So it's not surprising that AO3 would be in that dataset, along with everything else (e.g., Tumblr posts, blogs, news articles, all the words people write online) that doesn't use technical means to prohibit scraping.
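For anyone wondering what "technical means to prohibit scraping" can look like: the most common one is a robots.txt file, which tells well-behaved crawlers what they may fetch. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example.org URLs are made up for illustration (this is not AO3's actual file), `may_fetch` is a hypothetical helper name, and `CCBot` is the user agent Common Crawl's crawler identifies itself with.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (NOT any real site's file).
# It blocks Common Crawl's crawler everywhere and everyone else from /private/.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed example URLs on a placeholder domain.
ccbot_allowed = may_fetch(ROBOTS_TXT, "CCBot", "https://example.org/works/123")
other_allowed = may_fetch(ROBOTS_TXT, "SomeOtherBot", "https://example.org/works/123")
other_private = may_fetch(ROBOTS_TXT, "SomeOtherBot", "https://example.org/private/drafts")

print(ccbot_allowed, other_allowed, other_private)
```

Keep in mind that robots.txt is a voluntary convention: it only works for crawlers that choose to honor it, and it does nothing about text that was collected before the rule was added.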
OpenAI does make money now, including from companies like Sudowrite paying for access to GPT-3. And Sudowrite itself is a paid service. So yes, someone is profiting from its use (though OpenAI is capped at no more than 100x return on investment) and I think that the conversations about art (whether visual or text) being used to train these models without consent of the artist are important conversations to be having.
I think it's possible that what OpenAI is doing is legal (i.e., not copyright infringement) for some of the same reasons that fanfiction is legal (or perhaps more accurately, for reasons that many for-profit remixes are found to be fair use), but I think whether it's ETHICAL is a completely different question, and I've seen a huge amount of disagreement on this.
But the last thing I will say is that this is nothing new. GPT-3 has been around for years and it's not even the first OpenAI product to have used content scraped from the web.