The rise of “pay for scraping”: the new battle between AI platforms and companies

Imagine your blog or online community is full of valuable content, and an AI uses it without permission to answer questions… without giving you credit or paying you. That is what Reddit alleged in 2025 when it sued the startup Perplexity for scraping its content without a license. This is not an isolated case: more and more platforms are demanding compensation for the use of their data.

This phenomenon is known as “pay for scraping” and is changing the rules of the game: if your AI model uses my data, then we negotiate or you pay. Let’s see why this marks a before-and-after in the relationship between content creators and artificial intelligence. Sounds good, right?

What is "pay for scraping"?

For years, many AI companies have trained their models on publicly available Internet content without compensating the people who generated it. It was like going into an open forest and picking fruit for free. But now the owners of the forest (platforms, media, and communities) are asking for something in return.

This new approach seeks to establish licenses, fees, or limits on the automated use of data. The goal: to protect the value of human content and allow its creators to share in the benefits generated by AI.

Emblematic cases

As the value of data in training AI models grows, so do conflicts over its unauthorized use. Some platforms have decided to take strong action, setting precedents that could forever change how online content is accessed.

  • Reddit vs. Perplexity: Reddit accused the startup of mining its data at scale despite technical roadblocks. Reddit had previously closed multimillion-dollar deals with Google and OpenAI for access to its data through its API.
  • Stack Overflow also opted to monetize its knowledge archive and signed a partnership with Google to bring its content into Gemini for Google Cloud, the company’s AI assistant for developers.
  • Media outlets such as The New York Times have sued OpenAI and Microsoft for copyright infringement, and others have blocked bots like GPTBot or started charging for access.
  • Images and code are also in dispute: Getty Images sued Stability AI for using protected photos, and several developers question whether GitHub Copilot respects open-source licenses.
Image: Reddit vs. Perplexity (source: CNBC)
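
Blocking a bot like GPTBot, as some media have done, is usually expressed in a site's robots.txt file. Here is a minimal sketch, assuming a hypothetical robots.txt that shuts out GPTBot and lets every other crawler in. It uses Python's standard `urllib.robotparser` to check what a well-behaved crawler would conclude. The `example.com` URL and the `can_fetch` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (an assumption for this example):
# it blocks OpenAI's GPTBot crawler and allows every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/post"  # placeholder URL
    for agent in ("GPTBot", "Googlebot"):
        status = "allowed" if can_fetch(agent, page) else "blocked"
        print(f"{agent}: {status}")
```

Keep in mind that robots.txt is a voluntary convention: it tells crawlers what the site wants, but it cannot stop a scraper that chooses to ignore it. That gap is why disputes like the ones above end up in licensing deals or in court.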

The reverse case: what if scraping benefits you?

Not all scraping is a threat. In some cases, companies create content with the aim of being found by generative AI. This practice is known as GEO (Generative Engine Optimization): optimizing your content so that it is quoted or summarized in systems like ChatGPT or Gemini.

Thus, allowing access to certain AI systems can provide visibility, authority, or even traffic. Some brands are already structuring their content to be easily interpretable by language models, seeking to turn AI into an ally, not an enemy.
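
What might "structuring content" look like in practice? One common way to make a page machine-readable is schema.org markup in JSON-LD. The sketch below assumes that approach; the article does not say which techniques brands use, and whether a given generative engine actually reads this markup is also an assumption. The date and URL are placeholders, and `article_jsonld` and `script_tag` are hypothetical helper names.

```python
import json


def article_jsonld(headline: str, author: str, date_published: str, url: str) -> str:
    """Build a schema.org Article description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        "url": url,
    }
    # ensure_ascii=False keeps quotes and accents readable in the output.
    return json.dumps(data, indent=2, ensure_ascii=False)


def script_tag(jsonld: str) -> str:
    """Wrap the JSON-LD in the script tag that goes in the page's HTML."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


if __name__ == "__main__":
    markup = article_jsonld(
        headline="The rise of “pay for scraping”",
        author="Salvador Vilalta",
        date_published="2025-01-01",  # placeholder date, not the real one
        url="https://example.com/pay-for-scraping",  # placeholder URL
    )
    print(script_tag(markup))
```

The idea is simply to state who wrote what, and when, in a form a machine can parse without guessing from the page layout.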

Image: GEO vs. SEO (El Blog de Salvador Vilalta)

Consequences and future scenarios

The battle for data is not just about ownership: it also redefines the balance of power in the digital age. Below, we explore the main impacts of this paradigm shift for AI companies, creators and users alike.

1. New barriers to entry in AI: where before a startup could train models with public data, now it must pay or negotiate. This favors large players and makes it harder for new entrants.

2. Changes in the web ecosystem: the traditional relationship between platforms, search engines, and creators is being reconfigured. If an AI responds without sending traffic, the “I give you content, you give me visibility” agreement is broken.

3. Legal and ethical debates: Is it fair to use human content to train AI without permission? How far does “fair use” go? Initiatives such as the Really Simple Licensing (RSL) standard seek to establish clear rules for these uses.

Image: Copyright (El Blog de Salvador Vilalta)

Pay-for-scraping could mark the end of free access to the Internet’s most valuable data: as the cases above show, media outlets are already drawing red lines. But it also opens the door to a new balance, where human content is valued and AIs are fed data with consent.

The key will be to reach fair agreements: that AIs continue to learn from collective knowledge, but without erasing or ignoring those who create it.

What do you think? Should AIs pay for the data they use? Is scraping an opportunity or a threat to creators?

Leave me your comments and suggestions on this topic, or ideas for future articles on AI and digital marketing. I’d love to read them 😉

Have a good week!
