> **Note:** GSPO is an experimental feature. The API and behavior may change in future releases.
## Overview
GSPO (Group Sequence Policy Optimization) was introduced by the Qwen team to train state-of-the-art models, including Qwen3-235B-A22B-Instruct-2507. It can improve training stability and efficiency for Mixture-of-Experts (MoE) models, and may have limited or no impact on dense models.

## Key Benefits
- Stable Training: Resolves the stability challenges that affect RL training of large MoE models, keeping the training process stable
- Efficient Scaling: Achieves higher training efficiency and continues improving with increased computational resources
- Infrastructure-Friendly: More tolerant of precision discrepancies, eliminating the need for complex strategies like “Routing Replay”
## How It Works
GSPO’s core innovation is its sequence-level optimization objective. Instead of weighting updates by individual token likelihood ratios, GSPO defines importance ratios based on the likelihood of the whole sequence, with length normalization to reduce variance. The algorithm optimizes:

$$
J_{\mathrm{GSPO}}(\theta) = \mathbb{E}\left[\frac{1}{G}\sum_{i=1}^{G}\min\!\left(s_i(\theta)\,\hat{A}_i,\ \mathrm{clip}\!\left(s_i(\theta),\,1-\varepsilon,\,1+\varepsilon\right)\hat{A}_i\right)\right]
$$

where the sequence-level importance ratio $s_i(\theta)$ is defined as:

$$
s_i(\theta) = \left(\frac{\pi_\theta(y_i \mid x)}{\pi_{\theta_{\mathrm{old}}}(y_i \mid x)}\right)^{1/|y_i|} = \exp\!\left(\frac{1}{|y_i|}\sum_{t=1}^{|y_i|}\log\frac{\pi_\theta(y_{i,t} \mid x,\,y_{i,<t})}{\pi_{\theta_{\mathrm{old}}}(y_{i,t} \mid x,\,y_{i,<t})}\right)
$$
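To make the mechanics concrete, here is a minimal Python sketch of the length-normalized sequence ratio and the clipped group objective. The function names and the per-token log-probability inputs are illustrative only, not ART's API:

```python
import math

def gspo_importance_ratio(logps_new, logps_old):
    """Sequence-level importance ratio s_i(theta) with length normalization.

    logps_new / logps_old: per-token log-probabilities of one sampled
    response under the current and old policies.
    """
    n = len(logps_new)
    # exp( (1/|y_i|) * sum_t [log pi_new(t) - log pi_old(t)] )
    return math.exp(sum(a - b for a, b in zip(logps_new, logps_old)) / n)

def gspo_objective(group_logps_new, group_logps_old, advantages, eps=0.2):
    """Clipped sequence-level surrogate, averaged over a group of G responses."""
    total = 0.0
    for lp_new, lp_old, adv in zip(group_logps_new, group_logps_old, advantages):
        s = gspo_importance_ratio(lp_new, lp_old)
        clipped = min(max(s, 1.0 - eps), 1.0 + eps)  # clip(s, 1-eps, 1+eps)
        total += min(s * adv, clipped * adv)
    return total / len(advantages)
```

Because the ratio is a single scalar per sequence, every token in a response is kept or clipped together, which is what makes the update tolerant of token-level precision discrepancies.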
## Configuration
GSPO can be configured using the `importance_sampling_level` parameter when training with ART:
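As a hedged sketch: the parameter name comes from this page, but the surrounding model setup and the exact `art.TrainConfig` surface are assumptions that may differ across ART releases:

```python
import art

model = art.TrainableModel(
    name="agent-001",            # illustrative names
    project="gspo-demo",
    base_model="Qwen/Qwen2.5-7B-Instruct",
)

# "sequence" enables GSPO's sequence-level importance sampling;
# "token" is the GRPO-style token-level default.
config = art.TrainConfig(importance_sampling_level="sequence")

# trajectory_groups: your collected rollouts (hypothetical variable)
await model.train(trajectory_groups, config=config)
```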
## Technical Details
For a deeper understanding of GSPO’s technical foundations and a comparative analysis with other RL algorithms, see the original research paper.

## Limitations
- As an experimental feature, GSPO may have limited compatibility with some model architectures
- Performance characteristics may vary depending on model size and dataset
- API is subject to change in future releases