
AI Crawlers Overload Wikimedia's Bandwidth, Threatening User Access
Wikimedia is facing a significant challenge: a massive surge in bandwidth usage driven by AI crawlers. The Wikimedia Foundation reports a 50% increase in bandwidth consumption since January 2024, driven not by human readers but by automated programs scraping data to train generative AI models. This influx threatens the accessibility and performance of Wikimedia's resources for regular users.
The AI Traffic Surge
Unlike human readers, who tend to cluster around popular and trending topics, AI crawlers systematically sweep a wide range of pages, including obscure ones. This strains Wikimedia's infrastructure: frequently viewed pages can be served from caches close to users, but rarely requested pages must be fetched from the core data center, which consumes more resources and drives up costs. Wikimedia reveals that a staggering 65% of its most resource-intensive traffic originates from these bots.
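The cost asymmetry is easy to see in a toy model. The sketch below is a minimal simulation, not Wikimedia's actual stack: the cache size, cost ratio, and traffic patterns are all invented for illustration. Human-like traffic concentrated on popular pages mostly hits the cache, while a crawler sweeping the long tail misses on nearly every request and must be served from the (expensive) core.

```python
# Minimal sketch (illustrative only; cache size, costs, and traffic
# patterns are assumptions for this example, not Wikimedia's numbers).
from collections import OrderedDict
import random

CACHE_SIZE = 1000            # hypothetical edge-cache capacity (pages)
HIT_COST, MISS_COST = 1, 20  # assumed relative cost: cache vs. core

def serve(requests, cache):
    """Serve requests through an LRU cache, returning total cost."""
    cost = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)        # cheap cache hit
            cost += HIT_COST
        else:
            cost += MISS_COST              # fetch from core data center
            cache[page] = True
            if len(cache) > CACHE_SIZE:
                cache.popitem(last=False)  # evict least recently used
    return cost

random.seed(0)
# Human readers cluster on a small set of trending pages.
humans = [random.randint(0, 500) for _ in range(10_000)]
# A crawler sweeps the long tail of rarely read pages.
crawler = list(range(1_000_000, 1_010_000))

print("human cost:  ", serve(humans, OrderedDict()))
print("crawler cost:", serve(crawler, OrderedDict()))
```

In this toy run the crawler's 10,000 unique requests all miss the cache, costing roughly ten times what the same volume of human traffic does; the real asymmetry depends on Wikimedia's actual cache topology, but the shape of the problem is the same.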
This bot-driven traffic can cause disruptions, potentially slowing page loads for human readers, especially during high-traffic events. The foundation's Site Reliability team works constantly to block or throttle these crawlers to maintain acceptable performance for human users.
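As a rough illustration of one common mitigation (not Wikimedia's actual rules; the bot pattern, rate, and burst values below are hypothetical), a token-bucket limiter keyed on suspicious User-Agent strings can throttle aggressive clients while leaving ordinary readers untouched. In practice many scrapers spoof browser user agents, which is part of why this remains an ongoing effort rather than a solved problem.

```python
# Hedged sketch of per-client token-bucket rate limiting keyed on the
# User-Agent header. All thresholds and patterns here are assumptions.
import re
import time
from collections import defaultdict

BOT_PATTERN = re.compile(r"(?i)\b(bot|crawler|spider|scraper)\b")
RATE = 5    # assumed tokens (requests) replenished per second
BURST = 10  # assumed maximum burst size

buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic()})

def allow(client_ip: str, user_agent: str) -> bool:
    """Return True if the request should be served, False to throttle."""
    if not BOT_PATTERN.search(user_agent):
        return True                       # human traffic passes through
    bucket = buckets[client_ip]
    now = time.monotonic()
    # Replenish tokens in proportion to elapsed time, capped at BURST.
    bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["last"]) * RATE)
    bucket["last"] = now
    if bucket["tokens"] >= 1:
        bucket["tokens"] -= 1
        return True
    return False                          # bucket empty: throttle the bot

# Example: a burst of 30 crawler requests from one address.
served = sum(allow("203.0.113.7", "ExampleBot/1.0 crawler") for _ in range(30))
print(f"served {served} of 30 crawler requests")  # roughly the burst size
```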
Attribution and Sustainability
Wikimedia emphasizes that the primary concern isn't only the bandwidth consumption but also the lack of proper attribution. When AI models reuse Wikimedia content without pointing readers back to the source, fewer people visit the sites, and as a non-profit relying on donations, Wikimedia depends on that flow of new users to sustain its community. The foundation stresses that while its content is free, the infrastructure required to deliver it is not.
Looking ahead, Wikimedia plans to establish sustainable ways for developers and reusers to access its content. With AI-related traffic showing no signs of slowing, striking a balance between open access and resource management is crucial to the foundation's future.
Source: Engadget