
Alibaba Cloud on Thursday launched QwQ-32B, a compact reasoning model built on its latest large language model (LLM), Qwen2.5-32B, which it says delivers performance comparable to much larger cutting-edge models, including Chinese rival DeepSeek's R1 and OpenAI's o1, with only 32 billion parameters.
According to a release from Alibaba, “the performance of QwQ-32B highlights the power of reinforcement learning (RL), the core technique behind the model, when applied to a robust foundation model like Qwen2.5-32B, which is pre-trained on extensive world knowledge. By leveraging continuous RL scaling, QwQ-32B demonstrates significant improvements in mathematical reasoning and coding proficiency.”
AWS defines RL as “a machine learning technique that trains software to make decisions to achieve the most optimal results and mimics the trial-and-error learning process that humans use to achieve their goals. Software actions that work towards your goal are reinforced, while actions that detract from the goal are ignored.”
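For readers unfamiliar with the technique, the trial-and-error loop AWS describes can be illustrated with a toy example. The sketch below is a minimal multi-armed bandit in Python; the payoff probabilities and the epsilon-greedy strategy are invented for illustration and are not drawn from Alibaba's training setup.

```python
import random

# Toy example: a 3-armed bandit. Each action pays off with a different
# (hidden) probability. The agent learns purely by trial and error,
# reinforcing actions that produce reward -- the loop AWS describes.
TRUE_PAYOFFS = [0.2, 0.5, 0.8]   # hidden from the agent
estimates = [0.0, 0.0, 0.0]      # the agent's learned value of each action
counts = [0, 0, 0]
EPSILON = 0.1                    # fraction of steps spent exploring at random

for step in range(10_000):
    # Explore occasionally; otherwise exploit the best-known action.
    if random.random() < EPSILON:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: estimates[a])

    # The environment returns a reward; rewarded actions are "reinforced"
    # by nudging their value estimate upward (incremental mean update).
    reward = 1.0 if random.random() < TRUE_PAYOFFS[action] else 0.0
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print("learned action values:", [round(v, 2) for v in estimates])
```

In Alibaba's case, the "actions" are the model's generated reasoning steps and code, and the rewards come from verifiers and reward models rather than a fixed payoff table, but the underlying reinforce-what-works principle is the same.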
“Additionally,” the release stated, “the model was trained using rewards from a general reward model and rule-based verifiers, enhancing its general capabilities. These include better instruction-following, alignment with human preferences, and improved agent performance.”
QwQ-32B is available open-weight on Hugging Face and ModelScope under the Apache 2.0 license, according to an accompanying blog from Alibaba, which noted that QwQ-32B's 32 billion parameters achieve "performance comparable to DeepSeek-R1, which boasts 671 billion parameters (with 37 billion activated)."
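Because the weights are openly licensed, they can be pulled directly with standard tooling. The following is a minimal sketch using the Hugging Face transformers library; the repository id "Qwen/QwQ-32B" is assumed from the release naming, and running a 32-billion-parameter model locally still requires substantial GPU memory.

```python
# Minimal sketch of loading the open weights with Hugging Face transformers.
# The repo id "Qwen/QwQ-32B" is an assumption based on the model's name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"  # assumed Hugging Face repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Reasoning models tend to emit a long chain of thought before the final
# answer, so allow a generous token budget for generation.
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:],
                       skip_special_tokens=True))
```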
Its authors wrote, “this marks Qwen’s initial step in scaling RL to enhance reasoning capabilities. Through this journey, we have not only witnessed the immense potential of scaled RL but also recognized the untapped possibilities within pretrained language models.”
They went on to state, “as we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computational resources will propel us closer to achieving Artificial General Intelligence (AGI). Additionally, we are actively exploring the integration of agents with RL to enable long-horizon reasoning, aiming to unlock greater intelligence with inference time scaling.”
Asked for his reaction to the launch, Justin St-Maurice, technical counselor at Info-Tech Research Group, said, “comparing these models is like comparing the performance of different teams at NASCAR. Yes, they are fast, but in every lap someone else is winning … so does it matter? Generally, with the commoditization of LLMs, it’s going to be more important to align models with actual use cases, like picking between a motorcycle and a bus, based on needs.”
St-Maurice added, “OpenAI is rumored to want to charge a $20K/month price tag for a ‘PhD intelligence’ (whatever that means), because it’s expensive to run. The high-performing models out of China challenge the assumption that LLMs need to be operationally expensive. The race to profitability is through optimization, not brute-force algorithms and half-trillion-dollar data centers.”
DeepSeek, he added, “says that everyone else is overpriced and underperforming, and there is some truth to that when efficiency drives competitive advantage. But, whether Chinese AI is ‘safe for the rest of the world’ is a different conversation entirely, as it depends on enterprise risk appetite, regulatory concerns, and how these models align with data governance policies.”
According to St-Maurice, “all models challenge ethical boundaries in different ways. For example, framing another LLM like North America’s Grok as inherently more ethical than China’s DeepSeek is increasingly ambiguous and a matter of opinion; it depends on who’s setting the standard and what lens you’re viewing it through.”
The third big player in Chinese AI is Baidu, which launched its own model, Ernie, last year, although it has made little impact outside China, a situation that St-Maurice said is not surprising.
“The website is still giving out responses in Chinese, even though it claims to support English,” he said. “It’s safe to say that Alibaba and DeepSeek are more focused on the global stage, while Baidu seems more domestically anchored. Different priorities, different outcomes.”