Google DeepMind CEO Demis Hassabis Questions OpenAI's Early Move into ChatGPT Ads
Demis Hassabis expresses surprise at OpenAI's decision to test ads in ChatGPT, highlighting concerns over user trust and the role of assistants in AI monetization.

Source URL: https://github.com/deepseek-ai/DeepSeek-V3 / https://arxiv.org/abs/2412.19437
The artificial intelligence landscape has just witnessed a seismic shift with the release of DeepSeek-V3, a groundbreaking open-source model that challenges the dominance of industry giants like OpenAI and Anthropic. In a field where "bigger is better" usually equates to "more expensive," DeepSeek-V3 has shattered conventional wisdom by achieving state-of-the-art performance comparable to GPT-4o and Claude 3.5 Sonnet—all while being trained on a remarkably modest budget of approximately $5.5 million.
This release is not merely another model drop; it is a technical and economic statement. By utilizing a highly optimized Mixture-of-Experts (MoE) architecture, DeepSeek-V3 demonstrates that smart engineering can rival brute-force compute. For developers and enterprises, this signals a potential end to the era of prohibitively expensive frontier models, democratizing access to top-tier intelligence.
At the core of DeepSeek-V3’s success is its sophisticated architecture, which balances massive parameter counts with extreme inference efficiency. While the model boasts a total of 671 billion parameters, it utilizes a sparse MoE design that activates only 37 billion parameters per token. This allows it to retain the vast knowledge base of a super-sized model while maintaining the speed and cost profile of a much smaller one.
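As a rough illustration of what sparse activation buys, the sketch below works through the two parameter counts quoted above. The "2 FLOPs per active parameter per token" figure is a common rule of thumb used here as an assumption, not a number from the technical report.

```python
# Parameter counts quoted in the article.
TOTAL_PARAMS = 671e9   # all experts plus shared layers
ACTIVE_PARAMS = 37e9   # parameters actually used for any single token

# Share of the model that participates in processing one token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Assumption: forward-pass cost is roughly 2 FLOPs per active parameter per token.
flops_per_token_sparse = 2 * ACTIVE_PARAMS
flops_per_token_if_dense = 2 * TOTAL_PARAMS
compute_saving = flops_per_token_if_dense / flops_per_token_sparse

print(f"active fraction per token: {active_fraction:.1%}")
print(f"compute saving vs. a dense model of the same size: {compute_saving:.1f}x")
```

The saving applies to compute per token; the full set of weights still has to be stored wherever the model is served.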
One of the critical bottlenecks in serving Large Language Models (LLMs) is the Key-Value (KV) cache memory usage during inference. DeepSeek-V3 employs Multi-head Latent Attention (MLA), a novel mechanism that significantly compresses the KV cache. This innovation enables efficient processing of long contexts (up to 128k tokens) and allows for larger batch sizes during deployment, directly translating to lower inference costs.
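To make the cache saving concrete, here is a minimal back-of-the-envelope sketch comparing per-token cache size for plain multi-head attention against a latent cache of the kind MLA uses. The dimensions (61 layers, 128 heads of size 128, a 512-dimensional latent plus a 64-dimensional positional key) are assumptions about the model's configuration that do not appear in this article, and the multi-head baseline is hypothetical rather than something DeepSeek shipped.

```python
# Assumed model dimensions (not stated in the article; treat as illustrative).
LAYERS = 61
HEADS = 128
HEAD_DIM = 128
LATENT_DIM = 512      # compressed key/value latent cached per token per layer
ROPE_DIM = 64         # small positional key cached alongside the latent
BYTES_PER_ELEMENT = 2 # assuming a 16-bit cache
CONTEXT_TOKENS = 128 * 1024

def mha_cache_elements_per_token(layers: int, heads: int, head_dim: int) -> int:
    """Plain multi-head attention caches one key and one value vector per head per layer."""
    return 2 * heads * head_dim * layers

def mla_cache_elements_per_token(layers: int, latent_dim: int, rope_dim: int) -> int:
    """A latent cache stores one compressed vector plus a positional key per layer."""
    return (latent_dim + rope_dim) * layers

mha = mha_cache_elements_per_token(LAYERS, HEADS, HEAD_DIM)
mla = mla_cache_elements_per_token(LAYERS, LATENT_DIM, ROPE_DIM)
ratio = mha / mla

def cache_gib(elements_per_token: int) -> float:
    """Cache size in GiB for one sequence at the full context length."""
    return elements_per_token * BYTES_PER_ELEMENT * CONTEXT_TOKENS / 2**30

print(f"elements per token: MHA={mha:,}  latent={mla:,}  ratio={ratio:.1f}x")
print(f"cache at 128k tokens: MHA={cache_gib(mha):.1f} GiB  latent={cache_gib(mla):.1f} GiB")
```

Under these assumptions the cache shrinks by well over an order of magnitude, which is what frees memory for longer contexts and larger batches.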
Traditional MoE models often struggle with "expert collapse," where only a few experts are utilized, or require complex auxiliary losses to force load balancing, which can degrade performance. DeepSeek-V3 introduces an auxiliary-loss-free load balancing strategy. By dynamically adjusting bias terms during training, the model ensures that its 256 routed experts are utilized evenly without the performance penalties associated with traditional methods.
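The bias-adjustment idea can be sketched in a few lines of plain Python. This is a toy simulation under stated assumptions, not DeepSeek's implementation: the expert count, the update step `gamma`, the synthetic scores, and all function names are hypothetical. The key point it illustrates is that the bias influences which experts are selected, and is nudged down for overloaded experts and up for underloaded ones after each batch.

```python
import random

def route_top_k(scores, bias, k):
    """Select k experts by score + bias. The bias affects selection only."""
    ranked = sorted(range(len(scores)), key=lambda e: scores[e] + bias[e], reverse=True)
    return ranked[:k]

def update_bias(bias, load, gamma):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - gamma)
        elif n < mean_load:
            new_bias.append(b + gamma)
        else:
            new_bias.append(b)
    return new_bias

def simulate(num_experts=8, k=2, tokens=512, steps=200, gamma=0.01, seed=0):
    """Route synthetic tokens for several steps and record per-expert load."""
    rng = random.Random(seed)
    # Fixed popularity offsets make unbiased routing deliberately imbalanced.
    skew = [0.5 * e / num_experts for e in range(num_experts)]
    bias = [0.0] * num_experts
    history = []
    for _step in range(steps):
        load = [0] * num_experts
        for _token in range(tokens):
            scores = [rng.random() + skew[e] for e in range(num_experts)]
            for e in route_top_k(scores, bias, k):
                load[e] += 1
        history.append(load)
        bias = update_bias(bias, load, gamma)
    return history, bias

def spread(load):
    """Gap between the busiest and the idlest expert."""
    return max(load) - min(load)

history, final_bias = simulate()
print("load spread at first step:", spread(history[0]))
print("load spread at last step: ", spread(history[-1]))
```

Because no balancing term is added to the training loss, the main objective is left untouched; only the routing decision is steered.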
To understand the magnitude of this release, one must look at the numbers. DeepSeek-V3 does not just compete; it trades blows with the most powerful closed-source models currently available.
Key Performance Indicators:
| Metric | DeepSeek-V3 | GPT-4o | Claude 3.5 Sonnet |
|---|---|---|---|
| Architecture | Mixture-of-Experts (MoE) | Dense (Est.) | Dense/MoE Hybrid (Est.) |
| Total Parameters | 671B | Unknown (1T+ Est.) | Unknown |
| Active Params/Token | 37B | Unknown | Unknown |
| MMLU (Knowledge) | 88.5 | 88.7 | 88.7 |
| MMLU-Pro | 75.9 | 72.6 | 76.1 |
| HumanEval (Coding) | 92.6% | 90.2% | 92.0% |
| MATH-500 | 90.2% | 76.6% | 71.1% |
| Training Cost | ~$5.5 Million | ~$100 Million+ | Unknown (High) |
Note: Benchmark scores are based on reported figures in the DeepSeek-V3 technical report and open leaderboards.
As illustrated above, DeepSeek-V3 outperforms or matches its competitors in critical domains such as coding (HumanEval) and mathematics (MATH-500), areas previously dominated by closed-source systems.
Perhaps the most shocking revelation from the DeepSeek-V3 technical report is its training efficiency. The model was trained on a cluster of 2,048 NVIDIA H800 GPUs, with pre-training completed in just under two months. Total compute for the full run, including context extension and post-training, was approximately 2.788 million H800 GPU hours.
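At an assumed rental price of $2 per GPU hour for H800s, the total training cost comes to roughly $5.576 million. The technical report notes that this figure covers only the official training run and excludes the cost of prior research and ablation experiments. Even so, it stands in stark contrast to models of the caliber of Llama 3.1 405B or GPT-4o, whose training is estimated to cost tens, if not hundreds, of millions of dollars.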
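The headline figure is simple arithmetic on the numbers quoted above, reproduced here as a short sketch. The wall-clock estimate assumes every GPU in the cluster runs continuously for the whole job, which is a simplification.

```python
# Figures quoted in the article.
GPU_HOURS = 2_788_000          # total H800 GPU hours
NUM_GPUS = 2_048               # cluster size
PRICE_PER_GPU_HOUR_USD = 2.00  # assumed rental price

def training_cost_usd(gpu_hours: float, price_per_gpu_hour: float) -> float:
    """Rental cost of the run at a flat hourly price."""
    return gpu_hours * price_per_gpu_hour

def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Elapsed days if all GPUs are busy for the entire run (simplifying assumption)."""
    return gpu_hours / num_gpus / 24

cost = training_cost_usd(GPU_HOURS, PRICE_PER_GPU_HOUR_USD)
days = wall_clock_days(GPU_HOURS, NUM_GPUS)

print(f"estimated cost: ${cost:,.0f}")
print(f"estimated wall-clock time: {days:.1f} days")
```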
The efficiency of the architecture trickles down directly to the end-user. DeepSeek has priced its API aggressively, undercutting major US-based providers by a significant margin.
API Pricing Comparison (Per Million Tokens):
| Model | Input Price | Output Price | Cache Hit Price |
|---|---|---|---|
| DeepSeek-V3 | $0.27 | $1.10 | $0.07 |
| GPT-4o | $2.50 | $10.00 | $1.25 |
| Claude 3.5 Sonnet | $3.00 | $15.00 | $0.30 |
For developers building high-volume applications, DeepSeek-V3 offers a cost reduction of roughly 9x on both input tokens ($2.50 vs. $0.27) and output tokens ($10.00 vs. $1.10) compared to GPT-4o. This pricing structure effectively commoditizes "frontier-level" intelligence, making advanced AI agents and data processing pipelines economically viable for startups and individual developers.
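The ratios can be checked directly from the pricing table above. The sketch below also estimates the bill for a hypothetical workload. The `workload_cost` helper and its `cache_hit_rate` parameter are illustrative names, and the model assumes that cache-hit pricing applies to the cached share of input tokens.

```python
# Prices in USD per million tokens, copied from the table above.
PRICES = {
    "DeepSeek-V3": {"input": 0.27, "output": 1.10, "cache_hit": 0.07},
    "GPT-4o": {"input": 2.50, "output": 10.00, "cache_hit": 1.25},
    "Claude 3.5 Sonnet": {"input": 3.00, "output": 15.00, "cache_hit": 0.30},
}

def price_ratio(baseline: str, model: str, kind: str) -> float:
    """How many times more expensive `baseline` is than `model` for one price type."""
    return PRICES[baseline][kind] / PRICES[model][kind]

def workload_cost(model: str, input_tokens: int, output_tokens: int,
                  cache_hit_rate: float = 0.0) -> float:
    """Estimated USD cost of a workload; cached input is billed at the cache-hit price."""
    p = PRICES[model]
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    total = fresh * p["input"] + cached * p["cache_hit"] + output_tokens * p["output"]
    return total / 1_000_000

for kind in ("input", "output", "cache_hit"):
    print(f"GPT-4o vs DeepSeek-V3, {kind}: {price_ratio('GPT-4o', 'DeepSeek-V3', kind):.2f}x")

# Hypothetical workload: 50M input tokens (half served from cache) and 10M output tokens.
for model in PRICES:
    print(f"{model}: ${workload_cost(model, 50_000_000, 10_000_000, cache_hit_rate=0.5):,.2f}")
```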
The release of DeepSeek-V3 forces a re-evaluation of the current AI competitive landscape.
For a long time, the "moat" protecting companies like OpenAI and Google was the sheer capital expenditure required to train state-of-the-art models. DeepSeek has demonstrated that algorithmic and engineering innovation (better architecture, better training scheduling, FP8 mixed-precision training) can yield comparable results at a fraction of the cost. If a $5.5 million model can rival a $100 million model, the barrier to entry for creating top-tier AI is rapidly crumbling.
While Llama 3.1 was a major milestone for open weights, DeepSeek-V3 pushes the envelope further by proving that open models can deliver SOTA performance at a much lower compute cost per token, since only 37B parameters are active for any given token. Serving the model still requires enough memory to hold all 671B parameters, so the gain is in throughput and cost rather than in hardware footprint. This strengthens the open-source ecosystem, providing a viable alternative to closed-garden ecosystems.
DeepSeek-V3 is more than just a technological achievement; it is a market disruptor. By combining the sophisticated DeepSeekMoE architecture with FP8 training and MLA, the team has delivered a model that is high-performance, cost-effective, and open.
For the AI community, the message is clear: the future of AI development may not belong solely to those with the deepest pockets, but to those with the smartest engineering. As we move into 2025, the pressure is now on Western tech giants to justify their massive training budgets in the face of such efficient competition.
Creati.ai will continue to monitor the development of DeepSeek and the community's response to these new open weights.