Alibaba Qwen
Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters
qwenlm.github.io
Introduction

Since the surge of interest sparked by Mixtral, research on mixture-of-experts (MoE) models has gained significant momentum. Both researchers and practitioners are keen to understand how to train such models effectively and to assess their efficiency and effectiveness. Today, we introduce Qwen1.5-MoE-A2.7B, a small MoE model with only 2.7 billion activated parameters yet matching the performance of state-of-the-art 7B models