Scale42 Thought Piece - DeepSeek
Jan 29, updated May 1
By Will Tasney, Thomas Beaton, Jamie Stewart
DeepSeek Disruption - Dispersion of AI Computing Demand
Introduction
DeepSeek’s innovative approach to AI training has disrupted the market; however, its stated efficiency and cost claims have likely been overstated for two reasons: first, to avoid suspicion of breaking US semiconductor sanctions, and second, to cause maximum damage when launched as a geopolitical grenade challenging US AI leadership. We examine the technology, its implications and its winners and losers, and conclude that it aligns well with the long-term outlook for the AI sector set out in our pitch deck.
Summary
Efficiency and cost gains made by DeepSeek have likely been overstated
The achievement by DeepSeek, and the resulting gain in efficiency, came from using PTX rather than NVIDIA’s CUDA as the programming language.
This is not practical for most AI companies to replicate.
Some reports indicate that DeepSeek has far more GPU resources than it has disclosed: up to 60,000 GPUs, as opposed to the 2,048 GPUs DeepSeek states.
The model may have taken two years to develop, sacrificing speed for lower cost.
The release adds more open-source AI IP to the market.
Scale42 believes that more competition and open-source AI software will accelerate the requirement for mid-size AI training facilities that we specialise in.
What is DeepSeek?
DeepSeek is an open-source Mixture-of-Experts (MoE) language model with 671 billion parameters, stated to have been trained on a specialized cluster of 2,048 NVIDIA H800 GPUs (an H100 variant developed by NVIDIA so as not to violate US sanctions).
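The MoE design matters here: a router activates only a few expert sub-networks per token, so most of the 671 billion parameters sit idle on any given forward pass, which cuts the compute needed per token. A minimal sketch of top-k expert routing (illustrative Python only, not DeepSeek’s actual code; the gate scores below are made up):

```python
def top_k_experts(gate_scores, k=2):
    """Return indices of the k highest-scoring experts for one token.

    In an MoE layer, a learned router produces a score per expert and
    the token is processed only by the top-k experts, not all of them.
    """
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    return ranked[:k]

# Hypothetical router output for one token across 8 experts:
gate_scores = [0.1, 0.7, 0.05, 0.9, 0.2, 0.3, 0.15, 0.4]
chosen = top_k_experts(gate_scores)
print(chosen)  # → [3, 1]: only 2 of 8 experts run for this token
```

With 2 of 8 experts active, only a quarter of the expert parameters are exercised per token; production MoE models use many more experts, making the saving larger still.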
It is claimed that in just two months with a small GPU stack, they trained an AI that is comparable or better than similar models developed by industry leaders such as OpenAI and Meta.
It is estimated that the computing power needed to reach a product of this level is 10x what DeepSeek states it used, which would make DeepSeek 10x more efficient than competitors.
DeepSeek has innovated by using PTX (Parallel Thread Execution) programming instead of NVIDIA’s standard CUDA language, used by the vast majority of the AI industry.
This is significant because NVIDIA and competing hardware providers (such as AMD) have produced GPUs with similar hardware specifications. Despite this NVIDIA has accumulated a majority market share for AI GPU sales, principally achieved due to NVIDIA’s proprietary CUDA programming language which has become the industry standard for AI developers.
Because CUDA became the industry standard, AMD and others have built software bridges enabling their GPUs to be used for AI training with minimal amendments to CUDA code. The market has to date rejected this, continuing to purchase and use NVIDIA hardware despite the significantly higher cost of doing so. DeepSeek has entirely abandoned CUDA and built its AI tool using PTX, a feat few others have the technical capability to match.
DeepSeek is a spinout from the Chinese quantitative hedge fund ‘High-Flyer’, which in turn is reported to have purchased an estimated 10,000 - 60,000 NVIDIA GPUs. Whilst the initial investment in the chips was made in order to build AI trading programs, the limited success of the resulting trading algorithms meant the firm turned its attention to developing AI tools, which it has released as open-source software, i.e. with no intellectual property rights attached. The breakthrough in programming optimization has helped DeepSeek outperform conventional methods. It seems that US sanctions, which have made the latest GPUs a scarce resource in China, may have fostered a level of discipline and resource efficiency in programming akin to that of the late USSR, where resource-constrained computer scientists became some of the best programmers of their generation globally.
DeepSeek’s atomic impact
The low cost, short time and minimal resources stated as being required to train DeepSeek have underpinned the market’s reaction. By claiming to have built for just $6m what competitors have sunk billions into, and then releasing the IP for free, DeepSeek undermined the sunk investments and balance sheets of IP holders like OpenAI. By doing more with less, DeepSeek also undermined hardware providers such as NVIDIA and the US electricity companies banking on AI data centres being located next to their sites.
However, the true cost of DeepSeek is not known and the $6m figure could be an exercise in creative accounting. As a Chinese company DeepSeek has not legally been permitted to acquire advanced NVIDIA chips since October 2022. DeepSeek’s stated stock of 10,000 A100s may be under-reporting, with media sources reporting that they could actually have 50,000 or more. The under-reporting of resources, and corresponding over-reporting of performance may in part be a way of avoiding the long arm of US sanctions.
Even taking DeepSeek at face value, 10,000 NVIDIA A100/H800s cost an estimated $300m; 50,000 would cost $1.5bn. Other costs associated with LLM training include energy, operations, AI engineers and data. Across all of these, China comes in cheaper than its US peers, but not sufficiently so to support the $6m figure.
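The hardware arithmetic is worth making explicit. A back-of-envelope sketch, using only the unit price implied by the text’s own figures (~$30k per GPU; hardware cost only, excluding energy, operations, engineers and data):

```python
# Implied unit price from the text: $300m for 10,000 A100/H800s.
GPU_UNIT_COST = 300_000_000 / 10_000  # ~ $30k per GPU

def fleet_cost(n_gpus):
    """Hardware-only cost of a GPU fleet at the implied unit price."""
    return n_gpus * GPU_UNIT_COST

print(f"10,000 GPUs: ${fleet_cost(10_000) / 1e6:.0f}m")   # prints "10,000 GPUs: $300m"
print(f"50,000 GPUs: ${fleet_cost(50_000) / 1e9:.1f}bn")  # prints "50,000 GPUs: $1.5bn"
```

Even at the lower bound, the hardware alone is fifty times the $6m training figure, before any running costs are counted.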
DeepSeek’s announcement and release just days after the US announced its $500bn Stargate programme has led some to believe that it was co-opted by the CCP to undermine US confidence in the AI race. This seems highly likely to be true; however, further fears about its potential to be spyware seem unlikely given its open-source nature.
Blast radius casualty analysis…
Ultimately the biggest losers here are the companies that have pursued a model of renting out their AI models. At the front of these are OpenAI’s ChatGPT, Meta’s Llama, Google’s Gemini and Anthropic/Amazon’s Claude, among others. These players would likely share Steve Ballmer’s 2001 sentiments towards open-source Linux, which he famously described as ‘a cancer’; but that parallel did eventually see Microsoft win out.
If AI IP was in the DeepSeek blast zone, it was chipmakers that found themselves nearby suffering from significant radiation with shares down over 10%.

The impact of cheaper AI tools should mean more adoption as end-user demand goes up. We agree with Stacy Rasgon, semiconductor analyst at Bernstein, who has followed the logic through brilliantly here:
If we acknowledge that DeepSeek may have reduced costs of achieving equivalent model performance by, say, 10x, we also note that current model cost trajectories are increasing by about that much every year anyway (the infamous “scaling laws . . .”) which can’t continue forever. In that context, we NEED innovations like this . . . as semi analysts we are firm believers in the Jevons paradox (i.e. that efficiency gains generate a net increase in demand), and believe any new compute capacity unlocked is . . . likely to get absorbed due to usage and demand.
This assessment correctly identifies that financial markets continuously price in the exponential improvements expected in the sector; small deviations from that trajectory are hard to price. Moore’s Law and the Jevons paradox have both held for some time, and improvements ultimately lead to better performance and more adoption over time.
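The Jevons-paradox argument can be illustrated with toy numbers (a hypothetical constant-elasticity demand model, not a forecast; the elasticity value is an assumption for illustration only):

```python
def demand_after_cost_drop(base_demand, cost_reduction, elasticity):
    """Demand after unit cost falls by `cost_reduction`x, under a
    constant-elasticity demand curve (a standard textbook toy model)."""
    return base_demand * cost_reduction ** elasticity

# Assumed scenario: compute becomes 10x cheaper, demand elasticity 1.5.
usage = demand_after_cost_drop(1.0, 10, 1.5)  # ~31.6x more usage
spend = usage / 10                            # ~3.2x more total spend
print(f"usage: {usage:.1f}x, total spend: {spend:.1f}x")
```

Whenever the elasticity exceeds 1, cheaper compute raises total spend rather than lowering it, which is the mechanism Rasgon describes for capacity being absorbed.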
Continuing to explore outwards from the impact zone, we find electricity companies. For this we hand over to the FT’s Robert Armstrong:
Consider yesterday’s vicious sell-off in electricity companies near data centre hotspots. You might read that as the market saying demand for AI will not be very price elastic: data centres won’t need as much power because demand for AI services won’t surge as it gets cheaper. But that’s not necessarily the message. Maybe the market is saying the electricity demand won’t be where we thought, that is, near a few huge proprietary data centres run by giant US tech companies. Instead, AI will be run on smaller data centres all over the place. It is interesting, in this context, that while Nvidia recovered sharply yesterday, the hardest hit utilities (Constellation, Vistra and NRG) did not.
Figure 1. Share prices rebased
We agree with this view. Scale42’s own experience has been that capital market participants have been searching for ever-larger MW sites on which to place data centres, set to consume ever greater amounts of energy for large singular computing clusters. Let’s not forget that the definition of a hyperscale data centre is set at the now lowly bar of 10MW, and that our team’s previous HPC data centre, Hydrokraft, at 30MW was at the time Europe’s most powerful.
DeepSeek has put a question mark over the necessity of sites requiring hundreds of MW, which suffer diseconomies of scale. Whilst these will still have a place, representing a brute-force approach to staying at the forefront of the AI race, the ability of smaller training clusters to compete is a positive development for Scale42.
Our sites are located first and foremost for access to cheap, clean energy and will be brought to market to meet real world demand.
DeepSeek Supports Scale42’s Long Term View on Incumbent Disruption
In Scale42’s pitch, we set out five areas where we predicted industry disruption. DeepSeek fits into several of these themes:
Nvidia hardware -> Competing hardware providers. Whilst DeepSeek was trained on A100s/H800s, it did not use NVIDIA’s standard programming language, CUDA. This language has become synonymous with AI development and has underpinned chip-buying decisions by AI developers. Potentially better-performing chips become viable if programmers retrain to develop AI tools without CUDA.
Generic LLM & AI Tools -> Optimised Enterprise Specific AI Tools. We see DeepSeek as a catalyst for open source based tools to be built without rent-seeking from the established leaders.
Few Cutting Edge Providers -> Proliferation of Bespoke Providers. DeepSeek is an example of the market widening; we expect to see an explosion of new competitors building on its foundations.
We are glad not to have to make any edits to this presentation slide following the release of DeepSeek! And happy to reiterate here its conclusions:
AI will entwine every level of the global economy as enterprises use their data to build custom AI tools
Enterprise AI will require a deeper market for mid-sized AI infrastructure
Technology less centralised on NVIDIA
More open-source tools lower the cost of AI tech and increase end demand for infrastructure
Scale42’s approach to next phase of AI innovation:
Mid-scale training assets
Cost leader
Chip-agnostic