Alibaba Qwen
GSPO: Towards Scalable Reinforcement Learning for Language Models
qwenlm.github.io | Releases, Infra & hardware, Research | 1 min read
Introduction
Reinforcement Learning (RL) has emerged as a pivotal paradigm for scaling language models and enhancing their deep reasoning and problem-solving capabilities. To scale RL, the foremost prerequisite is maintaining stable and robust training dynamics. However, we observe that existing RL algorithms (such as GRPO) exhibit severe instability issues during long training and lead to irreversible model collapse, hindering further performance improvements with increased comput