On AI Valuable Content | ./maxime.sh

Mark Zuckerberg talking about AI:

Look, we’re a big company. We pay for content when it’s valuable to people. We’re just not going to pay for content when it’s not valuable to people. I think that you’ll probably see a similar dynamic with AI, which my guess is that there are going to be certain partnerships that get made when content is really important and valuable. I’d guess that there are probably a lot of people who have a concern about the feel of it, like you’re saying. But then, when push comes to shove, if they demanded that we don’t use their content, then we just wouldn’t use their content. It’s not like that’s going to change the outcome of this stuff that much.

via

The tip of the iceberg when it comes to AI training data is bots constantly scraping websites and publishers working to block them.

Ultimately, as Mark Zuckerberg suggests, the data consumed by LLMs consists mainly of books (although some datasets may contain illegal copies), news articles, and other high-quality training material. Whether or not your blog is included in a dataset is not a major concern for the companies training these models.

Although many people still interact with ChatGPT as if it were a Google chatbot, it is important to note that the reasoning abilities of LLMs do not derive solely from blogs and smaller media sources, regardless of their quality.