AI Chips News

Huawei claims better AI training method than DeepSeek using own Ascend chips

Huawei’s progress in AI model architecture could prove significant, as the company seeks to reduce its reliance on US technologies

Researchers working on Huawei Technologies’ large language model (LLM) Pangu claimed they have improved on DeepSeek’s original approach to training artificial intelligence (AI) by leveraging the US-sanctioned company’s proprietary hardware.

A paper – published last week by Huawei’s Pangu team, which comprises 22 core contributors and 56 additional researchers – introduced the concept of Mixture of Grouped Experts (MoGE). It is an upgraded version of the Mixture of Experts (MoE) technique that has been instrumental in DeepSeek’s cost-effective AI models.

While MoE offers low execution costs relative to a model’s large parameter count, along with enhanced learning capacity, it often results in inefficiencies, according to the paper. This is because the so-called experts are activated unevenly, which can hinder performance when the model runs on multiple devices in parallel.
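To make the imbalance concrete, here is a minimal sketch of conventional top-k expert routing, written for illustration rather than taken from the Huawei paper. The function names, the random "popularity" bias given to each expert, and the layout of four experts per device are all assumptions.

```python
import random

def top_k_route(scores, k):
    """Return the indices of the k highest-scoring experts for one token."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def device_loads(num_tokens, num_experts, num_devices, k, seed=0):
    """Count how many expert activations land on each device."""
    rng = random.Random(seed)
    experts_per_device = num_experts // num_devices
    # Assumption: some experts are simply more popular than others.
    bias = [rng.gauss(0.0, 1.0) for _ in range(num_experts)]
    loads = [0] * num_devices
    for _ in range(num_tokens):
        # Hypothetical router scores: fixed bias plus per-token noise.
        scores = [b + rng.gauss(0.0, 1.0) for b in bias]
        for expert in top_k_route(scores, k):
            loads[expert // experts_per_device] += 1
    return loads

loads = device_loads(num_tokens=1000, num_experts=16, num_devices=4, k=4)
print("activations per device:", loads)
print("busiest / idlest ratio:", max(loads) / max(min(loads), 1))
```

Nothing in plain top-k selection forces the per-device counts to match, so in a parallel run the busiest device can end up setting the pace for the rest.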

In contrast, the improved MoGE “groups the experts during selection and better balances the expert workload”, the researchers said.

In AI training, “experts” refer to specialised sub-models or components within a larger model, each designed to handle specific tasks or types of data. This allows the overall system to take advantage of diverse expertise to enhance performance.
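The article does not spell out how the grouping works, so the following is only a rough sketch under two stated assumptions: each group of experts lives on one device, and every token picks the same number of experts from every group. All names here are hypothetical.

```python
import random

def grouped_route(scores, num_groups, k_per_group):
    """Pick the top k_per_group experts inside each contiguous group."""
    group_size = len(scores) // num_groups
    chosen = []
    for g in range(num_groups):
        members = range(g * group_size, (g + 1) * group_size)
        best = sorted(members, key=lambda i: scores[i], reverse=True)
        chosen.extend(best[:k_per_group])
    return chosen

def group_loads(num_tokens, num_experts, num_groups, k_per_group, seed=0):
    """Count expert activations per group (assumed: one group per device)."""
    rng = random.Random(seed)
    group_size = num_experts // num_groups
    # Assumption: experts differ in popularity, as they might in practice.
    bias = [rng.gauss(0.0, 1.0) for _ in range(num_experts)]
    counts = [0] * num_groups
    for _ in range(num_tokens):
        scores = [b + rng.gauss(0.0, 1.0) for b in bias]
        for expert in grouped_route(scores, num_groups, k_per_group):
            counts[expert // group_size] += 1
    return counts

grouped_loads = group_loads(num_tokens=1000, num_experts=16,
                            num_groups=4, k_per_group=1)
print(grouped_loads)  # [1000, 1000, 1000, 1000]
```

Because each token is required to draw from every group, the per-group totals are equal by construction, however popular individual experts may be.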

The advancement comes at a crucial time, as Chinese AI companies are focused on enhancing model training and inference efficiency through algorithmic improvements and a synergy of hardware and software, despite US restrictions on the export of advanced AI chips like those from Nvidia.

Researchers at Huawei tested the new architecture on the company’s Ascend neural processing units (NPUs), which are designed to accelerate AI tasks, and found that MoGE “leads to better expert load balancing and more efficient execution for both model training and inference”.

Compared to models like DeepSeek-V3, Alibaba Group Holding’s Qwen2.5-72B and Meta Platforms’ Llama-405B, Pangu achieved state-of-the-art performance on most general English benchmarks and all Chinese benchmarks, and showed higher efficiency in long-context training, according to the paper.

Alibaba owns the South China Morning Post.

Pangu also excelled in general language-comprehension tasks, particularly in reasoning tasks, the Huawei researchers said.

Huawei’s progress in AI model architecture could prove significant, as the Shenzhen-based company seeks to reduce its reliance on US technologies amid ongoing sanctions. Its Ascend chips are considered domestic alternatives to some Nvidia processors.

Pangu Ultra, an LLM with 135 billion parameters that is optimised for NPUs, highlights the effectiveness of Huawei’s architectural and systemic optimisations while showcasing the capabilities of its NPUs.

According to Huawei, the training process includes three main stages: pre-training, long context extension and post-training. This involves pre-training on 13.2 trillion tokens and long context extension using 8,192 Ascend chips.

Researchers said the model and system would soon be available to Huawei’s commercial customers.
