LinkedIn Secretly Training Its AIs on User Data

No matter how morally or even legally murky AI data scraping is, companies just can’t seem to stop. The latest culprit is the job-finding social media platform LinkedIn.

BY KIT EATON @KITEATON

SEP 19, 2024

LinkedIn is trying all sorts of tricks to remain relevant and competitive with the cool kids' social media apps, like Instagram or TikTok, including adding games to its otherwise slightly staid work-related social experience. It has even embraced the snazziest tech du jour, AI, offering it as a tool to help users brush up their resumes.

It also looks like LinkedIn is copying its more fun-centric social media peers in another, much less amusing way. The site has quietly started scraping many of its users' data to train AI systems. Worse still, the system is opt-out, not opt-in, forcing users to take steps to protect their own data, and the site doesn't seem to have told anyone it was doing this. Is this an abuse of user trust? Can the site's AI be trusted not to "leak" users' personal info?

According to technology site 404 Media, LinkedIn users on Wednesday noticed a setting that "showed LinkedIn was using user data to improve its generative AI." News site TheStack, meanwhile, explains that LinkedIn's user terms and conditions were actually "quietly" updated about a week ago, with language that gave away its AI data-grabbing plans. "Where LinkedIn trains generative AI models, we seek to minimize personal data in the data sets used to train the models, including by using privacy-enhancing technologies to redact or remove personal data from the training dataset," the new conditions note. And while the privacy-centric bit is reassuring, the note means LinkedIn is absolutely using its user data to train its AIs.

Another piece of text in the update that may have data privacy advocates squirming explains that the "artificial intelligence models that LinkedIn uses to power generative AI features may be trained by LinkedIn or another provider." It adds that some of the models LinkedIn offers are provided by Microsoft's Azure OpenAI service.

This last bit makes sense, since Microsoft owns LinkedIn, but it also implies that third-party AI systems may be accessing users' LinkedIn data. LinkedIn tried to explain its actions to TheStack, with a spokesperson noting that the company was "making changes that give people using LinkedIn even more choice and control when it comes to how we use data to train our generative AI technology," and "People can choose to opt-out, but they come to LinkedIn to be found for jobs and networking and generative AI is part of how we are helping professionals with that change."

That's quite an assertive position. Apparently LinkedIn has simply decided its users' data is fair game for AI training, possibly taking a leaf out of the book of Microsoft's AI chief Mustafa Suleyman, who recently argued that most data on the open web can be considered "fair game" for use as AI training material.

Curiously, TheStack points out that LinkedIn isn't scraping every user's data: anyone who lives in the European Union, the wider European Economic Area, or Switzerland is exempt. Though LinkedIn hasn't explained why, it may well have to do with the zone's newly passed AI Act, as well as its long-held strict stance on user data privacy. As much as anything else, the fact that LinkedIn isn't scraping EU citizens' data suggests that someone at a leadership level is aware that this sort of bold AI data grab is morally murky, and technically illegal in some places.

But why should LinkedIn users be concerned by this? After all, the company already has lots of personal data on file from its 800 million individual members, along with detailed information on all of the companies that have set up a homepage there.

First, it's a question of data privacy. If LinkedIn made this change switched on by default, without asking users up front, can it be trusted not to share user data with other sites? This sort of controversy has, in one way or another, beset other AI companies that have grabbed user data recently, including Meta, Google, OpenAI, Apple, Nvidia, and Anthropic.

Second, generative AI systems are known to leak information. While LinkedIn promises to redact identifying personal information from the AI dataset, any slip-up in that process means your data, or your company's, could pop up at a later date when a different person uses the AI system. If that ever happens, LinkedIn would be due for a new, catchy nickname: LeakedOut.