PANews reported on February 26 that DeepSeek released DeepGEMM on the third day of its OpenSourceWeek: a CUDA library for FP8 GEMM that supports both dense matrix multiplication and mixture-of-experts (MoE) computation, used to optimize training and inference of its V3/R1 models.

DeepGEMM key features:

• Ultra-high performance: over 1,350 FP8 TFLOPS on Hopper GPUs

• Minimal dependencies: no heavy external dependencies, with code kept as clean and simple as a tutorial

• JIT compilation: no pre-compilation needed; kernels are compiled and optimized automatically at runtime

• Concise: the core code is only about 300 lines, yet it outperforms expert-tuned kernels for most matrix sizes

• Supports a dense layout and two MoE layouts (a usage sketch follows this list)
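
For context, a dense FP8 GEMM call through DeepGEMM's Python interface looks roughly like the sketch below. The function name gemm_fp8_fp8_bf16_nt, the (tensor, scale) tuple arguments, and the per-128-element scaling-factor shapes follow the project's public repository, but they should be read as illustrative assumptions rather than an authoritative API reference, since exact signatures may change between versions.

```python
# Minimal sketch of a dense FP8 GEMM with DeepGEMM.
# Assumptions: Hopper GPU, PyTorch with FP8 (E4M3) support, and the
# gemm_fp8_fp8_bf16_nt entry point as described in the DeepGEMM repo.
import torch
import deep_gemm

m, k, n = 4096, 7168, 4096  # example problem size (divisible by 128)

# FP8 (E4M3) inputs with float32 scaling factors, BF16 output.
lhs = torch.randn(m, k, device="cuda", dtype=torch.bfloat16).to(torch.float8_e4m3fn)
rhs = torch.randn(n, k, device="cuda", dtype=torch.bfloat16).to(torch.float8_e4m3fn)
lhs_scales = torch.ones(m, k // 128, device="cuda", dtype=torch.float32)        # 1x128 per-token groups
rhs_scales = torch.ones(n // 128, k // 128, device="cuda", dtype=torch.float32)  # 128x128 blocks
out = torch.empty(m, n, device="cuda", dtype=torch.bfloat16)

# "nt": lhs is row-major, rhs is treated as transposed, per the library's naming.
deep_gemm.gemm_fp8_fp8_bf16_nt((lhs, lhs_scales), (rhs, rhs_scales), out)
```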