GSPO: Towards Scalable Reinforcement Learning for Language Models

qwenlm.github.io ReleasesInfra & hardwareResearch 1 min read

PAPER DISCORD Introduction Reinforcement Learning (RL) has emerged as a pivotal paradigm for scaling language models and enhancing their deep reasoning and problem-solving capabilities. To scale RL, the foremost prerequisite is maintaining stable and robust training dynamics. However, we observe that existing RL algorithms (such as GRPO) exhibit severe instability issues during long training and lead to irreversible model collapse, hindering further performance improvements with increased comput

Read the original on qwenlm.github.io

AI News Hub links to primary sources. This page shows the publisher's own title and excerpt with a link to the full article. We point you at the news; we don't rewrite it.