What Innovations Has DeepSeek Introduced?


Artificial intelligence is in the midst of an innovation explosion, and sustained innovation is what keeps any player at the center stage of its development.

DeepSeek surged in popularity just before the Spring Festival and has maintained its momentum since. The platform is paving the way for open-source models, delivering remarkable performance while keeping training and operating costs extremely low. This development has ignited hope among AI practitioners, who are now eager to apply AI across a wide range of sectors.

However, alongside the exhilarating news, mixed narratives are circulating: claims that DeepSeek has revolutionized the course of AI development and that its capabilities surpass those of industry leaders like OpenAI, or conversely, that DeepSeek is merely overhyped, its models essentially distilled from OpenAI's work.

To discern the truth behind these assertions, I have scrutinized a wealth of resources and consulted with experts, arriving at an initial picture of what innovations DeepSeek has introduced and whether it can sustain them.

The conclusion regarding DeepSeek's innovation is as follows: it employs a more efficient model architecture and training framework, representing significant engineering advancements but not a groundbreaking revolution. While it hasn't shifted the trajectory of the AI field, it has certainly accelerated development considerably.

To arrive at this conclusion, an understanding of the developmental trajectory of AI technology is essential.

A Brief History of AI

Artificial intelligence originated in the 1940s and has evolved over nearly 80 years, with British computer scientist Alan Turing recognized as its foundational figure.


The Turing Award, named in his honor, serves as the Nobel Prize of computer science.

Today, the dominant technology within the AI industry is large models, particularly in the realm of generative AI, which includes the generation of semantics, speech, images, and video. Models such as DeepSeek, OpenAI's GPT series, and others like Doubao, Kimi, Tongyi Qianwen, and Wenxin Yiyan are all part of this large model family.

The theoretical foundation of large models is the neural network, a concept intended to mimic how the human brain works. Although neural networks were introduced alongside AI itself, they did not gain wide significance until the mid-1980s, thanks to improvements in multi-layer perceptron models and backpropagation algorithms. Notable contributors include Geoffrey Hinton, a dual citizen of the UK and Canada who recently won a Nobel Prize in Physics.

Neural networks later evolved into deep learning theories, with prominent figures like Hinton, Yann LeCun of France, and Jürgen Schmidhuber of Germany proposing or refining significant model architectures such as Deep Belief Networks (DBN, 2006), Convolutional Neural Networks (CNN, 1998), and Recurrent Neural Networks (RNN, 1997), allowing for deep learning based on multi-layer networks.

However, this was still the era of small models: parameter counts for DBNs and RNNs typically ranged from thousands to millions, while CNNs topped out at a few hundred million. Consequently, they could only perform specialized tasks; Google's AlphaGo, for instance, built on a CNN architecture, could defeat top human Go players such as Ke Jie and Lee Sedol, yet it could do nothing but play Go.

In 2014, Google's DeepMind team introduced the concept of the "attention mechanism." By the end of that year, University of Montreal professor Yoshua Bengio and his doctoral students had published a more comprehensive paper on it. This was a significant advance in neural network theory, greatly enhancing modeling capability and computing efficiency and enabling complex tasks to be handled at scale.

Yoshua Bengio, Yann LeCun, and Geoffrey Hinton collectively received the Turing Award in 2019.

In 2017, Google introduced the Transformer architecture, built entirely on the attention mechanism, heralding the large model era. Most mainstream large models today, including DeepSeek, use this architecture.


Reinforcement learning (RL) and the mixture-of-experts (MoE) model also provide crucial support for large models; both sets of ideas were introduced in the 1990s and later applied by Google in product development during the 2010s.

It's worth clarifying a common misconception: MoE is not an alternative architecture to the Transformer; rather, it is a method for optimizing the Transformer architecture.
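
To make that relationship concrete, here is a minimal PyTorch sketch (the dimensions, expert count, and top-2 routing are illustrative assumptions, not DeepSeek's actual configuration) of how a sparse MoE layer can stand in for the dense feed-forward block inside a Transformer layer: a router scores the experts for each token and only the top few are actually run, so most parameters sit idle on any given input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """A sparse mixture-of-experts layer standing in for the dense feed-forward
    block of a Transformer layer (all sizes here are illustrative)."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                            # x: (batch, seq, d_model)
        scores = self.router(x)                      # (batch, seq, n_experts)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)      # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[..., slot] == e       # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(2, 16, 512)
print(MoEFeedForward()(tokens).shape)                # torch.Size([2, 16, 512])
```

Note that the attention sublayers are untouched; only the feed-forward sublayer is swapped out, which is why MoE is best thought of as an optimization of the Transformer rather than a rival architecture.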

Today, mainstream large models boast parameter counts reaching into the trillions; DeepSeek V3 has 671 billion. Models this large demand extraordinary computational power, making Nvidia's GPUs the backbone of that requirement. Nvidia's near-monopoly on AI chips has not only propelled it to become one of the world's most valuable companies but has also posed challenges for Chinese AI companies.

While Google led the way into the large model era, the darling of recent years has been OpenAI, established in 2015. Its diverse lineup of large models has consistently set industry benchmarks, with a host of competitors seeking to emulate its success. This shows that even seemingly unassailable giants in the AI landscape are not without challengers. Although the concept of AI has been around for 80 years, real acceleration only began in the past decade, and the current explosion is more recent still; newcomers therefore always have an opportunity to make their mark. DeepSeek, established in July 2023, and its parent company High-Flyer (HuanFang), founded in February 2016, are both younger than OpenAI.


AI is a field driven by youthful innovation.

The ultimate goal of the AI industry is artificial general intelligence (AGI): a system capable of autonomous thought, learning, and problem-solving, a goal embraced by both Sam Altman and Liang Wenfeng. Both have bet on large models, the main direction of the field today.

Predictions of how long it will take to reach AGI along the large-model trajectory vary: optimists say 3-5 years, pessimists 5-10. In essence, the broad expectation is that AGI could be realized by around 2035.

The competition around large models is paramount because they serve as the upstream basis for AI applications across sectors, much as the human brain's cognitive capability defines how well a person learns, works, and lives.

However, large models are not the only conceivable path to AGI. Just as the "deep learning plus large models" paradigm displaced the "rule-based expert system" paradigm of AI's first decades, it is conceivable that this paradigm will itself be upended by future innovations, though it remains unclear who those disruptors might be.

What Innovations Has DeepSeek Introduced?

Today, as DeepSeek positions itself as a challenger, has it truly overtaken OpenAI? Not quite. While DeepSeek has exceeded OpenAI’s capabilities in certain aspects, overall, OpenAI remains ahead.

To begin with, OpenAI's foundational model, GPT-4o, was released in May 2024, whereas DeepSeek unveiled V3 on December 26, 2024. The Stanford Center for Research on Foundation Models maintains a global ranking of foundation models; the latest ranking, announced on January 10 of this year, covers six metrics.

Overall, DeepSeek V3 scored 4.835, taking the top spot, while GPT-4o (the May version) scored 4.567, placing sixth. The second through fifth positions are held by American models, with Claude 3.5 Sonnet in second at 4.819; it was created by Anthropic, founded in February 2021.

Reasoning models represent a new direction for large models: they mimic human thinking patterns, which is central to the end goal of AGI, an AI that can independently think, learn, and solve problems.

On September 12, 2024, OpenAI launched the world's first reasoning model, O1, which demonstrated astounding capabilities on mathematical, programming, and scientific problems. However, OpenAI maintains a closed-source policy and has not disclosed the model's technical principles or details. Replicating O1 consequently became a worldwide chase among AI practitioners.

Just four months later, on January 20 of this year, DeepSeek released the world's second reasoning model, R1—a straightforward name derived from "Reasoning." Evaluation results indicated that DeepSeek-R1 is comparable to OpenAI-O1. However, OpenAI released an upgraded O3 model on December 20, 2024, significantly outperforming O1. As of now, there are no direct comparison evaluations between R1 and O3.

Multimodality is another key trend in large models, enabling the generation of semantics (including code), speech, images, and video, with video generation requiring the most computational resources.

DeepSeek introduced its first multimodal model, Janus, in October 2024, followed by an upgraded version, Janus-Pro-7B, on January 28 of this year; it performs exceptionally well in image generation tests, though its video generation capabilities remain uncertain. GPT-4 is a multimodal model but does not produce video; OpenAI instead has a dedicated video generation model, Sora.

Reducing model size while improving efficiency and cutting computational resource consumption is another critical trend in the industry. The mixture-of-experts design aims at exactly this goal, and reasoning models likewise help rein in the astonishing resource demands of general-purpose large models. In this respect DeepSeek clearly outperforms OpenAI: recent discussion has highlighted that DeepSeek's training costs are roughly one-tenth of OpenAI's and its operating costs about one-thirtieth. This cost-effectiveness stems from engineering innovation that is not confined to isolated points but spread densely across every layer of the stack.

Here are three notable examples:

★ Model Architecture: The greatly optimized combination of Transformer + MoE.

As discussed earlier, these technologies were pioneered and first deployed by Google; when DeepSeek designed its model on top of them, however, it made substantial optimizations and introduced a novel multi-head latent attention (MLA) mechanism, dramatically lowering compute and memory consumption.
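
As a rough illustration of the idea behind MLA, the PyTorch sketch below (dimensions hypothetical, and details such as DeepSeek's decoupled rotary-position keys omitted) compresses the input into a small shared latent vector and reconstructs keys and values from it, so that during generation only the compact latent would need to be cached rather than full per-head keys and values.

```python
import torch
import torch.nn as nn

class SimplifiedLatentAttention(nn.Module):
    """Toy illustration of the low-rank idea behind MLA: keys and values are
    reconstructed from a small shared latent, so only the latent (not full
    per-head K/V) would need to be cached during generation. DeepSeek's real
    MLA adds decoupled rotary-position keys and other details omitted here."""

    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compress hidden state; this is what gets cached
        self.k_up = nn.Linear(d_latent, d_model)     # rebuild keys from the latent
        self.v_up = nn.Linear(d_latent, d_model)     # rebuild values from the latent
        self.out_proj = nn.Linear(d_model, d_model)

    def _split(self, t, b, s):
        return t.view(b, s, self.n_heads, self.d_head).transpose(1, 2)

    def forward(self, x):                            # x: (batch, seq, d_model)
        b, s, _ = x.shape
        latent = self.kv_down(x)                     # (batch, seq, d_latent), far smaller than K and V
        q = self._split(self.q_proj(x), b, s)
        k = self._split(self.k_up(latent), b, s)
        v = self._split(self.v_up(latent), b, s)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        return self.out_proj((attn @ v).transpose(1, 2).reshape(b, s, -1))

tokens = torch.randn(2, 16, 512)
print(SimplifiedLatentAttention()(tokens).shape)     # torch.Size([2, 16, 512])
```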

★ Model Training: FP8 Mixed Precision Training Framework.

Traditionally, large model training has used the 32-bit floating-point format (FP32) for computation and storage, which ensures precision but at the cost of slower speed and greater storage use.

Striking a balance between computational cost and precision has long been a challenge for the industry. In 2022, Nvidia, Arm, and Intel jointly proposed an 8-bit floating-point format (FP8), but with abundant compute available to American companies the idea was never pushed very far. DeepSeek, by contrast, built an FP8 mixed-precision training framework that dynamically selects FP8 or FP32 according to the task and the characteristics of the data, improving training speed by 50% while cutting memory usage by 40%.
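
DeepSeek's FP8 framework itself is custom engineering, but the underlying mixed-precision pattern can be sketched with stock PyTorch. The toy training step below uses bfloat16 autocast as a stand-in for the low-precision format (FP8 kernels and DeepSeek's fine-grained scaling are not shown): the heavy matrix multiplications run in the cheap format while the master weights and optimizer state stay in FP32.

```python
import torch
import torch.nn as nn

# Mixed-precision training step, sketched with stock PyTorch. bfloat16 autocast
# stands in for the low-precision format; DeepSeek's actual framework uses FP8
# with its own kernels and scaling, which are not shown here.

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)     # FP32 weights and optimizer state
inputs, targets = torch.randn(8, 1024), torch.randn(8, 1024)

for step in range(3):
    optimizer.zero_grad()
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = nn.functional.mse_loss(model(inputs), targets)  # forward matmuls in low precision
    loss.backward()                                            # gradients land on the FP32 weights
    optimizer.step()
    print(f"step {step}: loss {loss.item():.4f}")
```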

★ Algorithm Development: New Reinforcement Learning Algorithm GRPO.

Reinforcement learning enables computers to learn to complete tasks on their own, without explicit human programming instructions, and is an essential method on the road to AGI. It was initially spearheaded by Google, most notably in the training of AlphaGo; OpenAI later introduced the influential algorithms TRPO (Trust Region Policy Optimization) and PPO (Proximal Policy Optimization) in 2015 and 2017, respectively. DeepSeek has now advanced the field further with a new reinforcement learning algorithm, GRPO (Group Relative Policy Optimization), which significantly lowers computational cost while improving training efficiency.
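
The core trick in GRPO can be shown in a few lines. In the sketch below (PyTorch; the rewards and log-probabilities are made-up toy values), the advantage of each sampled answer is computed relative to the mean and standard deviation of its own group of answers for the same prompt, which removes PPO's separately trained value-function critic; the loss is then the familiar PPO-style clipped objective, with the KL-penalty term omitted for brevity.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: each sampled answer is scored against the mean
    and standard deviation of the group of answers drawn for the same prompt,
    so no separately trained value-function critic is needed.
    rewards: (n_prompts, group_size)."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)

def clipped_policy_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO-style clipped surrogate objective, here driven by the group-relative
    advantages above (the KL penalty used in practice is omitted for brevity)."""
    ratio = torch.exp(logp_new - logp_old)
    return -torch.min(ratio * advantages,
                      torch.clamp(ratio, 1 - eps, 1 + eps) * advantages).mean()

# Toy numbers: 2 prompts, 4 sampled answers each, one scalar reward per answer.
rewards = torch.tensor([[1.0, 0.0, 0.5, 0.0],
                        [0.2, 0.9, 0.4, 0.7]])
advantages = grpo_advantages(rewards)
logp_old = torch.randn(2, 4)
logp_new = logp_old + 0.05 * torch.randn(2, 4)
print(clipped_policy_loss(logp_new, logp_old, advantages))
```

Dropping the critic is where much of the compute saving comes from: there is no second model of comparable size to train and run alongside the policy.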

At this point it should be clear that the claim "DeepSeek merely distilled OpenAI's models" does not hold up. Still, can DeepSeek be regarded as a disruptive innovation, one that originated something entirely new?

Evidently, it isn't.

Disruptive innovation refers to breakthroughs that open entirely new avenues or substantially alter existing ones. The invention of the automobile, for example, disrupted transportation and made the horse-drawn carriage obsolete; similarly, smartphones replaced traditional mobile phones and changed the trajectory of telecommunications.

Sacks emphasizes model performance, but the greater significance lies in the cost-performance ratio: training costs at one-tenth and usage costs at one-thirtieth, which puts cutting-edge AI within reach of everyday consumers. Recently, companies across many industries have eagerly integrated DeepSeek's large models into their applications, embracing AI like never before.

Nonetheless, a note of caution: large model technology advances swiftly, and we should not be overly optimistic about interim results. Because large models sit upstream in the AI ecosystem, the quality of foundation models ultimately dictates the effectiveness of AI applications in every field.

Can DeepSeek Sustain Innovation?

Prompted by DeepSeek's emergence, Sam Altman disclosed OpenAI's development plans on February 13: GPT-4.5 would launch within weeks, followed by GPT-5 within a few months. GPT-5 will integrate the capabilities of the reasoning model O3, becoming a multimodal system spanning semantics, speech, images, search, and in-depth research.

As Altman describes it, users will no longer need to choose among a myriad of models; GPT-5 will handle everything, achieving what he calls "magically unified intelligence." If realized, GPT-5 would be another step toward AGI.

From a user's perspective, a single model that addresses all needs would undoubtedly be an advantage, much as the modern smartphone has supplanted the multiple devices once required for daily tasks.

However, the resources such a comprehensive model requires are immense. The computational power of an iPhone 16 far surpasses that of earlier phones, yet using one costs less than the once-common Nokia 8210 did; the hope is that a similar miracle can happen in AI.

Moreover, the US has many outstanding AI companies besides OpenAI, with only small gaps between their capabilities. In the Stanford ranking discussed earlier, a mere 0.335 separates first place from tenth, an average difference of less than 0.06 per metric. Rankings are useful references, but they do not map strictly onto real capability. DeepSeek's challenges therefore come not only from OpenAI but also from formidable competitors such as Anthropic, Google, Meta, and xAI.

On February 18, xAI unveiled Grok-3, which Musk claims is "the strongest AI on Earth." The model was trained on more than 100,000 H100 chips, pushing the scaling law (the more compute and data invested, the better the model performs) to its limit. Yet it also exposed the diminishing marginal returns of scaling.

Additionally, China does not lack capable AI companies that challenge DeepSeek.

Even so, I remain confident in Liang Wenfeng and the DeepSeek team. His rare interviews reveal a character that combines idealism with grounded realism and sharp business acumen. He clearly understands the technology, but he may not be a genius inventor himself; rather, he resembles a technology-oriented entrepreneur in the mold of Steve Jobs or Elon Musk, able to gather technical talent and turn it into remarkable products.

In an exclusive interview with "The Undercurrent," Liang Wenfeng said: "Our core technical roles are filled primarily by recent graduates or people with a couple of years of experience.

Our hiring standards have always centered on passion and curiosity. We ensure values are aligned at recruitment, and then rely on a strong company culture to keep everyone moving in step."

"It is crucial to take part in the global tide of innovation. For more than thirty years of the IT wave, we have largely been absent from genuine technological innovation. Most Chinese companies have grown accustomed to following rather than innovating. The real gap between China's AI and America's is the gap between originality and imitation. If that does not change, China will always be a follower."

"Innovation is fundamentally a matter of belief. Why is Silicon Valley so rich in innovative spirit? Because people there dare to tackle the hardest problems. The most attractive thing for top-tier talent is solving the world's most formidable problems."

Jobs famously said that the people who are crazy enough to think they can change the world are the ones who do. I see reflections of that statement in Liang Wenfeng.

Nevertheless, we must remain wary of overly optimistic assumptions about China's capacity to surpass the US in AI. DeepSeek has not changed the fundamentals of computational power, algorithms, and data that define the trajectory of large model development. Many of DeepSeek's innovations were born of confronting chip constraints: Nvidia's H100, for instance, has a communication bandwidth of 900 GB/s, while the export-compliant H800 offers only 400 GB/s, and DeepSeek had to train its models on the H800.

Lately, I have reviewed various opinions from both sides of the Pacific regarding DeepSeek.

The phrase "necessity is the mother of invention," which originated in ancient Greece, has been echoed by many industry leaders. But it also poses a thought-provoking dilemma: DeepSeek reached parity with OpenAI's product because its algorithmic edge compensated for its computational deficit. Should OpenAI discover equally proficient algorithms while continuing to wield superior chips, will the gap between large models widen once more?

On the other hand, while DeepSeek can adapt to domestically produced chips, their performance gap means the computational shortfall cannot be resolved in the short term, unless the story of electric vehicles overtaking fuel-powered cars repeats itself, with quantum chips leapfrogging silicon-based processors to deliver a transformative jump.

Ultimately, there is a tragic irony in even having to think this way: technological innovation should benefit all of humankind, yet it is being distorted by geopolitics. All the more reason, then, to celebrate DeepSeek's unwavering commitment to open source.