
Websites Block Tech Giants from Using Their Data to Train LLMs

In this post:

  • Websites are starting to block big tech from using their content to train AI, showing a shift in how the web operates. 
  • Google has launched Google-Extended, a tool that lets sites opt out, but its adoption lags behind similar tools. 
  • The balance between protecting content and remaining visible in AI searches is a big challenge.

Recently, a significant shift has been unfolding: top websites are starting to guard their content against tech giants like Google and OpenAI. This step changes the longstanding relationship between web publishers and search engines, and it has been prompted by the rise of artificial intelligence (AI) technologies.

Websites protect their content

Traditionally, websites have used a simple yet powerful tool known as `robots.txt` to manage how search engines crawl and index their content. This arrangement allowed websites to benefit from the traffic that search engines directed to them. Advanced AI models, however, have introduced new complexities to this relationship. Companies such as OpenAI and Google have been using vast amounts of online content to train their AI systems, and those systems can now answer user queries directly, reducing the need for users to visit the original websites and disrupting the flow of traffic from search engines to those sites.
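As a rough illustration, `robots.txt` is a plain-text file served at a site's root that lists per-crawler access rules. The sketch below covers only the traditional search-crawler case, and the path used is a placeholder:

```
# Sketch of a robots.txt file served at the site root (the path is a placeholder)

# Let Google's search crawler index everything
User-agent: Googlebot
Allow: /

# Any crawler without a more specific group above is kept out of /private/
User-agent: *
Disallow: /private/
```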

In response, Google has introduced a new robots.txt control called Google-Extended, which lets websites block the use of their content for training Google's AI models. It was rolled out in September last year and has since been adopted by around 10% of the top 1,000 websites, including high-profile names like The New York Times and CNN.
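Opting out works through the same `robots.txt` mechanism. A minimal sketch, assuming a site wants to opt out of AI training site-wide while leaving ordinary search crawling untouched (Google-Extended and GPTBot are the tokens published by Google and OpenAI, respectively):

```
# Tell Google not to use crawled content for its AI models
# (search indexing via Googlebot is unaffected)
User-agent: Google-Extended
Disallow: /

# Block OpenAI's GPTBot crawler
User-agent: GPTBot
Disallow: /
```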


Comparing adoption and looking ahead

While Google-Extended represents a step toward giving websites control over their content, its adoption rate trails that of other tools such as OpenAI's GPTBot. The hesitance may stem from worry over visibility in future AI-driven search results: websites that block access to their content risk being overlooked by AI models and missing out on being included in answers to relevant queries.

The case of The New York Times is particularly telling. The publication is engaged in a copyright dispute with OpenAI and has since taken a firm stance, using Google-Extended to block AI training access to its content.

Google's experimental Search Generative Experience (SGE) hints at a potential shift in how information is curated and presented to users, highlighting AI-generated content over traditional search results. The decisions made by tech companies and web publishers will shape the digital ecosystem and influence how information is accessed and consumed in the AI age.
