AI News Hub
← Back to the feed

Alibaba Qwen

QwQ-32B: Embracing the Power of Reinforcement Learning

qwenlm.github.io Infra & hardwareResearch QwenDeepSeek 1 min read

QWEN CHAT Hugging Face ModelScope DEMO DISCORD Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. For instance, DeepSeek R1 has achieved state-of-the-art performance by integrating cold-start data and multi-stage training, enabling deep thinking and complex reasoning. Our research explores the scalabili

AI News Hub links to primary sources. This page shows the publisher's own title and excerpt with a link to the full article — we point you at the news; we don't rewrite it.