Introduction
Training today’s frontier AI models requires massive centralized data centers packed with hundreds of thousands of GPUs, high-speed interconnects, industrial cooling systems, and 24/7 uptime. Only a handful of well-funded companies – think OpenAI, Google, Anthropic, xAI, and Meta – can afford this.
Naturally, I'm skeptical when I hear that people are training frontier models with a decentralized network of consumer-grade devices. Yet, this idea has been gaining steam within crypto circles, attracting millions in funding.
So, I dug deeper to separate hype from reality. Here’s my take on why decentralized training might matter and whether it can realistically compete.
Why decentralization matters
Top AI labs train models using huge clusters of GPUs housed in large data centers, allowing them to process vast amounts of data in parallel. They're now even linking multiple nearby data centers with high-speed networks to create still larger pools of compute.
Decentralized training offers a fundamentally different approach. Rather than relying on centralized, proprietary facilities, it allows anyone, anywhere to contribute hardware – whether gaming laptops, desktop GPUs, or idle server capacity – to a global computing network that trains models over the standard internet.
But why pursue decentralized training when centralized setups already deliver impressive results? Below are the three main arguments I've come across:
Accessibility
From data center buildouts to energy consumption, the infrastructure required for centralized training costs billions of dollars, putting frontier model creation out of reach for most teams. Even renting compute from cloud providers like CoreWeave requires deep pockets and multi-year commitments.
While decentralization doesn't reduce these costs, it changes how they're financed. Instead of one actor paying billions upfront, a network of contributors can supply compute and be compensated over time, whether through usage fees or a share of future model upside. Crypto has already shown it can coordinate this kind of contribution at scale – Bitcoin and Ethereum mining marshaled enormous amounts of hardware from independent operators using little more than token incentives.
Censorship resistance
Open-weight models like Llama or Qwen exist today, but there’s no guarantee they'll remain open. Meta has said it won’t open-source all of its future models, and we've already seen other companies (like Elastic, Redis Labs, and HashiCorp) move from open-source to restrictive licenses, forcing forks and eroding trust. If your business relies on them, you're ultimately at the mercy of whoever funds and maintains the repo.
Decentralized training networks could offer a way to produce models that are more resilient by design. Since no single entity would control the entire computing stack, these models would be less vulnerable to sudden policy changes, geopolitical restrictions, or regulatory interference.
Scalability
Data centers can't scale indefinitely. Building out facilities with millions of GPUs would require massive amounts of land, energy, and capital. At some point, you hit physical limits.
Decentralized networks could tap vast numbers of underutilized GPUs globally, effectively forming a massive virtual supercomputer. Though likely not cheaper or more performant than centralized setups, it might be the only way to coordinate distributed training runs across millions of GPUs at once.
Technical challenges
Despite its potential, decentralized training faces tough technical hurdles related to communication, coordination, and verification.
Communication
Decentralized training relies, in part, on globally distributed consumer-grade GPUs communicating over the standard internet. A typical US broadband connection offers ~60 Mbps of upload bandwidth (~0.0075 GB/s). Compare that to modern data center GPUs communicating at up to 1,800 GB/s. That's roughly a 240,000x difference.
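For a quick sanity check on that gap, here's the arithmetic using the figures above (actual numbers vary by ISP and GPU generation):

```python
# Back-of-the-envelope bandwidth comparison, using the figures cited above.
home_upload_mbps = 60                            # typical US broadband upload (megabits/s)
home_upload_gb_s = home_upload_mbps / 8 / 1000   # -> 0.0075 GB/s
datacenter_gb_s = 1800                           # e.g. NVLink-class GPU-to-GPU bandwidth (GB/s)

ratio = datacenter_gb_s / home_upload_gb_s
print(f"Home upload: {home_upload_gb_s:.4f} GB/s")
print(f"Gap: ~{ratio:,.0f}x")                    # roughly 240,000x
```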
Teams like Nous Research and Pluralis Research have made impressive strides, achieving significant bandwidth reductions without sacrificing model quality, thanks to clever compression techniques. However, the largest decentralized pre-training run to date – 15B parameters over 100B tokens – still pales in comparison to GPT-4’s rumored 1.7T parameters and 13T tokens.
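To make "compression" slightly more concrete, here's a toy sketch of top-k gradient sparsification, one common way to shrink what each node has to transmit. It's a generic illustration, not the specific methods Nous or Pluralis use:

```python
import numpy as np

def topk_sparsify(grad: np.ndarray, k_frac: float = 0.01):
    """Keep only the largest-magnitude k% of gradient entries; send indices + values."""
    flat = grad.ravel()
    k = max(1, int(k_frac * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]        # positions of the top-k entries
    return idx, flat[idx], grad.shape

def densify(idx, values, shape):
    """Rebuild a (lossy) dense gradient on the receiving side."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)

grad = np.random.randn(1024, 1024).astype(np.float32)   # stand-in for a real gradient tensor
idx, vals, shape = topk_sparsify(grad, k_frac=0.01)
payload_mb = (idx.nbytes + vals.nbytes) / 1e6
print(f"dense: {grad.nbytes / 1e6:.1f} MB -> sparse payload: {payload_mb:.2f} MB")
print(np.allclose(grad, densify(idx, vals, shape)))     # False: compression is lossy
```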
So, while these teams have shown that decentralized training is feasible by cutting bandwidth requirements, they have yet to prove that these compression methods hold up at frontier scale.
Coordination
While communication is the most pressing technical bottleneck, it’s only part of the challenge. Decentralized training networks also have to deal with unreliable hardware, nodes dropping in and out, and flaky internet connections.
Systems like Gensyn’s SkipPipe get around this by skipping slow or disconnected nodes, significantly reducing training time without compromising results. Pluralis also developed NAG Adaptation, which lets nodes keep training under delay while correcting stale updates, preserving stability and convergence even when some nodes lag or drop.
These approaches show that effective coordination in decentralized training is solvable, but not free, as it introduces redundant computation and coordination overhead from things like reordering, syncing, and retries – costs that will likely pile up at scale.
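As a rough illustration of what skipping stragglers and correcting for staleness can look like, here's a toy aggregation loop. It's purely illustrative – the systems mentioned above are far more sophisticated – and the FakeWorker class is invented for the example:

```python
import random

class FakeWorker:
    """Stand-in for a remote node: sometimes slow or dropped, sometimes behind on the model version."""
    def poll(self, timeout_s):
        if random.random() < 0.2:                         # simulate a straggler / dropped connection
            return None, None
        staleness = random.randint(0, 5)                  # how many model versions behind this update is
        grad = [random.gauss(0, 1) for _ in range(4)]     # toy "gradient"
        return grad, staleness

def aggregate(workers, timeout_s=2.0, max_staleness=3):
    """Skip stragglers, discard hopelessly stale updates, and down-weight the rest."""
    weighted = []
    for w in workers:
        grad, staleness = w.poll(timeout_s)
        if grad is None or staleness > max_staleness:
            continue                                      # skip instead of stalling the whole step
        weighted.append((1.0 / (1.0 + staleness), grad))  # simple staleness discount
    if not weighted:
        return None
    total = sum(w for w, _ in weighted)
    dim = len(weighted[0][1])
    return [sum(w * g[i] for w, g in weighted) / total for i in range(dim)]

print(aggregate([FakeWorker() for _ in range(8)]))
```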
Verification
Open networks mean anyone, including malicious actors, can participate. Some might inject poisoned data; others might fake work to farm rewards. You need mechanisms to detect and punish bad behavior.
Prime Intellect, for example, addresses this by fingerprinting a model's internal computations during inference so that verifiers can later spot-check for cheating without re-running the full model. Gensyn, by contrast, uses an optimistic dispute system: jobs are run by at least two workers; if outputs diverge, a referee pinpoints the first divergent step and recomputes just that tiny operation to identify and penalize the dishonest party.
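The dispute mechanism is easiest to see with a simplified sketch: if both workers publish a hash of their state after every step, a referee can binary-search for the first step where the traces disagree and recompute only that one step. This is a toy version of the general idea, not Gensyn's actual protocol:

```python
import hashlib

def step(state, i):
    """Toy deterministic computation step (stand-in for one training/inference operation)."""
    return (state * 31 + i) % (2**61 - 1)

def run_with_trace(n_steps, cheat_at=None):
    """Execute the job and record a hash of the state after every step."""
    state, hashes = 0, []
    for i in range(n_steps):
        state = step(state, i)
        if i == cheat_at:
            state += 1                                    # a dishonest worker diverges here
        hashes.append(hashlib.sha256(str(state).encode()).hexdigest())
    return hashes

def first_divergence(a, b):
    """Binary search for the first step where two traces disagree.
    The referee then recomputes only that single step to settle the dispute."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] == b[mid]:
            lo = mid + 1
        else:
            hi = mid
    return lo

honest  = run_with_trace(100_000)
cheater = run_with_trace(100_000, cheat_at=12_345)
print(first_divergence(honest, cheater))                  # -> 12345, found in ~17 comparisons
```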
Like coordination, verification is solvable, but it comes with overhead related to redundant compute, additional latency, and protocol complexity.
While teams are making real progress on these technical hurdles, solutions to coordination and verification still introduce significant overhead, and it remains to be seen whether communication bottlenecks can be overcome at massive scale.
Cost
Even if these technical hurdles are overcome, cost competitiveness is another story.
Some suggest decentralization could eliminate data center overhead, especially cooling-related infrastructure (which makes up 30-40% of data center energy consumption). But data centers are optimized for efficiency – they use industrial cooling systems, optimized airflow, and tightly packed hardware layouts to minimize waste.
Consumer setups lack these efficiencies, resulting in higher costs. A GPU heating up a bedroom, for example, will likely prompt its owner to crank up the AC, which is wildly inefficient: for every unit of electricity the GPU uses to train a model, another full unit might be burned just to keep the room cool.
Not to mention that consumer-grade GPUs are far less efficient than server-grade ones. They deliver 20-50% fewer useful computations per watt, meaning they get less training done per unit of electricity consumed. And, unlike data centers that buy power at industrial rates, home users pay roughly twice as much for it.
So rather than driving costs down, decentralization could drive them up 3-5x, even before factoring in overhead from things like coordination and verification.
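Here's a rough back-of-the-envelope of how those penalties compound per unit of useful compute. The scenario ranges are illustrative assumptions based on the figures above, not measurements:

```python
# How the penalties above compound per unit of useful training compute.
# The ranges are illustrative assumptions, not measurements.
scenarios = {
    "optimistic":  {"cooling": 1.0, "efficiency": 1.25, "power_price": 1.5},
    "middling":    {"cooling": 1.5, "efficiency": 1.5,  "power_price": 1.75},
    "pessimistic": {"cooling": 2.0, "efficiency": 2.0,  "power_price": 2.0},
}

for name, f in scenarios.items():
    multiplier = f["cooling"] * f["efficiency"] * f["power_price"]
    print(f"{name:>11}: ~{multiplier:.1f}x cost per unit of training")

# optimistic: ~1.9x, middling: ~3.9x, pessimistic: ~8.0x -- the 3-5x estimate above sits
# in the middle of this range, before coordination and verification overhead is added.
```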
Demand
Decentralized training inherently faces higher latency and greater costs per unit of compute, likely leading to higher user fees and worse performance. That's a tough sell when users can already access freely available, powerful open-weight models like Llama, DeepSeek, or Qwen and run them cheaply with centralized hosting providers like Together AI. So the real question is: who would actually pay for decentralized models, and why?
One possibility is that certain users value decentralization enough to accept these trade-offs. For instance, decentralization offers censorship resistance, which could appeal to users in jurisdictions blocked by centralized APIs, or to applications that require uncensorable models and inference. That said, it's unclear how large this market really is.
Another possibility is that decentralized training sparks innovation by putting frontier-scale compute in the hands of talented people who otherwise couldn't access it. Today, most researchers and startups can rent small clusters, but only a handful of well-funded labs have the compute resources to run massive training jobs. Broader access might lead to breakthroughs in optimization algorithms, compression methods, or novel architectures, resulting in models that are genuinely better or cheaper in the long run.
But compute alone isn't enough. To build useful, competitive models, you also need massive amounts of high-quality data – and data is increasingly becoming a bottleneck in its own right.
So even if decentralized training makes frontier-scale compute more accessible, it's hard to see how these networks will consistently access the high-quality data needed to produce competitive models. Without solving this data problem, broader compute access alone is unlikely to lead to breakthrough models.
Monetization
Assuming decentralized training does produce valuable models, there’s still uncertainty around how compute providers would be compensated sustainably.
One proposed solution is Protocol Learning by Pluralis Research. The idea is to split model weights among workers so that no single entity controls the entire model. Inference revenue would then be split among workers in proportion to their training contribution: because every inference request requires access to each worker's model shard, contributors can share in the upside of the models they help train.
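A toy sketch helps make the incentive flow concrete: the model's layers are sharded across workers, inference has to pass through every shard, and fees are paid out in proportion to training contribution. The names and structure below are hypothetical, not Pluralis' actual design:

```python
from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    layers: list             # the model shard (here: a few callables) this worker holds
    training_share: float    # fraction of total training work contributed

def run_inference(workers, x):
    """Inference needs every shard: activations flow from worker to worker."""
    for w in workers:
        for layer in w.layers:
            x = layer(x)      # each worker only ever touches its own weights
    return x

def split_revenue(workers, fee):
    """Pay out inference fees in proportion to training contribution."""
    return {w.name: round(fee * w.training_share, 2) for w in workers}

# Tiny 'model': three affine layers, one per worker.
make_layer = lambda a, b: (lambda x: a * x + b)
workers = [
    Worker("alice", [make_layer(2.0, 1.0)], training_share=0.5),
    Worker("bob",   [make_layer(0.5, 0.0)], training_share=0.3),
    Worker("carol", [make_layer(1.0, 3.0)], training_share=0.2),
]

print(run_inference(workers, 4.0))          # 4 -> 9 -> 4.5 -> 7.5
print(split_revenue(workers, fee=10.0))     # {'alice': 5.0, 'bob': 3.0, 'carol': 2.0}
```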
However, Protocol Learning is still bottlenecked by communication, as splitting model weights across devices substantially increases communication overhead during training. And even if this technical hurdle is overcome, it's only sustainable if there’s meaningful demand for these models, which is not a given as we've already discussed.
Conclusion
Decentralized training is a genuinely interesting research direction, as I do think centralized training will likely hit scalability limits at some point, and decentralization could become the only feasible way to scale compute further. I also wouldn't want to live in a world where intelligence is owned and gatekept by a handful of companies, so I see the value of building open, censorship-resistant models.
But for decentralized training to be seriously competitive, we’d need to see major breakthroughs, especially in communication and cost efficiency. Without these, it’s difficult to see who would pay a premium for decentralized models over cheaper and faster alternatives.