Google is changing its privacy policy - and you are now helping to train its AI

Luca Fontana
5.7.2023
Translation: machine translated

Google doesn't just want your personal data to train AI bots, it wants the entire public internet. To this end, Google has changed its privacy policy. This raises questions about copyright.

Google updated its privacy policy over the weekend. In it, the online giant reserves the right to use pretty much everything you've ever posted online to train its AI tools, at least as long as that data is publicly accessible. That includes, for example, the comments in our comment section.

This approach is not entirely new. Google's privacy policy already mentioned collecting "information" that is accessible "online or from other public sources". What is new, however, is that this data is no longer used only to train language models such as Google Translate. It is now also explicitly used "for the development" and "training" of AI models such as Bard and other cloud AI functions.

In other words, Google sees the entire public internet as part of its own AI playground.

New privacy policy - is Google even allowed to do this?

What is unusual about the new privacy policy is the scope of the data Google can use. It's not just about the data you send to Google when you use its services while logged in. Google is talking about "data from the entire public internet". Alarm bells are ringing among privacy advocates.

From a legal perspective, however, the situation remains murky. Google is not the only company using publicly accessible internet data to train its AI bots. In the case of competitor OpenAI, too, it is unclear who actually holds the rights to publicly accessible data and whether that data may be used to train ChatGPT. Courts in California are currently dealing with this question.

The private sector is putting up resistance

The private sector, however, does not seem willing to wait for clarity from the courts. Just under a month ago, the community discussion platform Reddit cut off third-party providers' access to the website unless they pay for it. This triggered a wave of outrage among Reddit users, who accused the platform of greed and called for a boycott lasting several days. My colleague Florian reported on this:

  • News + Trends

    Reddit protest campaign: what's behind it

    by Florian Bodoky

The real reason for Reddit's drastic measures, however, may be its central role in training chatbots such as Bard and ChatGPT. These were trained on data collected from Reddit without anyone paying for it. So far, only the companies behind the chatbots, such as Google and OpenAI, have benefited. No wonder Reddit now wants a slice of the cake too.

Twitter recently took the same line. Last weekend, Twitter owner Elon Musk limited the number of tweets that users are allowed to view per day. According to Musk, this is meant to curb extremely high levels of "data scraping" and "system manipulation". In other words: bots that collect data to train AI bots such as Bard and ChatGPT. However, most IT experts doubt that this is the real reason for limiting tweets. They see it more as a knee-jerk reaction to technical problems caused by Musk's mismanagement or incompetence - or both.

Cover photo: Luca Fontana


I'm an outdoorsy guy and enjoy sports that push me to the limit – now that’s what I call comfort zone! But I'm also about curling up in an armchair with books about ugly intrigue and sinister kingkillers. Being an avid cinema-goer, I’ve been known to rave about film scores for hours on end. I’ve always wanted to say: «I am Groot.» 
