In the latest sign of mounting tension between tech giants and content creators, Apple has found itself at the center of controversy over the data used to train its artificial intelligence models. According to a recent report by Wired, several high-profile websites have chosen to exclude their data from Apple’s AI training processes, a decision that sheds further light on the ongoing struggle over digital content rights in the age of artificial intelligence.
The list of major websites opting out of Apple’s AI training data pool is notable. It includes prominent social media platforms like Facebook and Instagram, along with influential media outlets such as The New York Times, The Financial Times, and The Atlantic. The Condé Nast network, which includes Wired itself, is also on the opt-out list. Craigslist, Tumblr, and Vox Media have likewise decided to withhold their content from Apple’s AI training.
This decision reflects a broader industry trend where content creators and publishers are increasingly protective of their intellectual property. The rationale behind these exclusions often revolves around concerns over how publishers’ data is used and monetized by tech companies without compensation or recognition.
The publishers’ stance toward Apple stands in contrast to how some of them treat other AI developers. For instance, several of the sites that have opted out of Apple’s AI training have signed agreements with OpenAI that permit it to use their content for training its models. Such arrangements reflect a different strategy, in which some content providers collaborate with AI developers rather than rejecting the use of their data outright.
But not every publisher is striking deals. The New York Times has taken a more adversarial stance. The publication is currently engaged in a lawsuit against OpenAI and its backer Microsoft (MSFT), accusing both of copyright infringement for allegedly using its content without permission. This legal battle is just one part of the complex legal and ethical issues surrounding the use of digital content in training AI models.
Apple has stated that its AI, known as Apple Intelligence, uses data from its web crawler, Applebot, which focuses on “licensed” and publicly available data. Apple also offers web publishers the option to opt out of their content being used for AI training. In addition to Applebot, Apple offers a secondary user agent, Applebot-Extended, which gives publishers further control over whether their content is used in training the company’s foundation models.
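For readers curious how such an opt-out looks in practice, here is a minimal sketch in Python. It assumes the opt-out is expressed through a site's robots.txt file, with a rule addressed to the Applebot-Extended user agent. The sample robots.txt, the example.com URL, and the helper name `allows_agent` are all hypothetical, and Python's standard-library parser only approximates how any real crawler interprets these rules.

```python
from urllib.robotparser import RobotFileParser


def allows_agent(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url.

    Hypothetical helper for illustration; it relies on the standard
    library's robots.txt parser, not on Apple's own matching logic.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: block only the AI-training user agent,
# while leaving the site open to every other crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""

SAMPLE_URL = "https://example.com/articles/1"  # placeholder URL

search_ok = allows_agent(SAMPLE_ROBOTS_TXT, "Applebot", SAMPLE_URL)
training_ok = allows_agent(SAMPLE_ROBOTS_TXT, "Applebot-Extended", SAMPLE_URL)

print(f"Applebot allowed: {search_ok}")
print(f"Applebot-Extended allowed: {training_ok}")
```

In this sketch the publisher stays visible to the regular Applebot crawler while signaling that its pages should not be used for model training, which is the kind of split control the opt-out is meant to offer.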
This approach by Apple seems to show a commitment to respecting the preferences of web publishers, but it also raises questions about the limitations of such opt-out mechanisms. While the option to exclude content is available, the effectiveness and transparency of this process can be subject to scrutiny.