Scalable and General Whole-Body Control for Cross-Humanoid Locomotion


Humanoid robots are among the most versatile forms for interacting with a world designed for humans. However, teaching these robots to move gracefully has traditionally been a fragmented process. Most state-of-the-art controllers are “specialists”—finely tuned to the specific limb lengths, joint limits, and weight distribution of a single robot model. When the hardware changes, the learning process often has to start from scratch.

Zero-shot generalization and real-world humanoid capabilities enabled by XHugWBC’s generalist policy. First row: robust zero-shot generalization across seven humanoids with diverse DoFs, dynamic characteristics, and morphological structures. Second row: flexible teleoperation using XHugWBC enables long-horizon whole-body loco-manipulation tasks.

Today, we are introducing XHugWBC, a novel cross-embodiment training framework that moves us away from robot-specific tuning and toward a “generalist” controller for humanoid locomotion. For the first time, we demonstrate that a single neural network policy can control a vast array of humanoid designs and even generalize to entirely new robots it has never seen before.

The Challenge of Cross-Embodiment

In computer vision or natural language processing, models benefit from “scaling laws”: more data across diverse domains leads to better generalization. In robotics, scaling is harder because physics is unforgiving. A control strategy that works for a lightweight, agile humanoid might cause a heavier, top-heavy robot to fall.

To bridge this gap, a controller must understand not just “how to walk,” but the fundamental relationship between morphology (structure) and dynamics (motion).
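
One way to make this coupling concrete is the linear inverted pendulum, a standard simplified walking model that is not taken from XHugWBC itself. The Python sketch below shows how the center-of-mass (CoM) height alone changes how fast a robot tips over and how far it must step to recover. The two heights and the walking speed are hypothetical values chosen for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def lipm_natural_frequency(com_height_m: float) -> float:
    """Natural frequency (rad/s) of a linear inverted pendulum with the given CoM height."""
    if com_height_m <= 0.0:
        raise ValueError("CoM height must be positive")
    return math.sqrt(G / com_height_m)


def capture_point_offset(com_velocity_mps: float, com_height_m: float) -> float:
    """Horizontal distance from the CoM to the point where a step brings the pendulum to rest."""
    return com_velocity_mps / lipm_natural_frequency(com_height_m)


# Hypothetical robots: same forward CoM velocity, different CoM heights.
for name, height in [("short robot", 0.5), ("tall robot", 1.0)]:
    omega = lipm_natural_frequency(height)
    offset = capture_point_offset(0.5, height)
    print(f"{name}: omega = {omega:.2f} rad/s, recovery step = {offset:.3f} m")
```

Under this model the taller robot tips more slowly but needs a longer recovery step at the same speed, so a foot placement tuned for one body is wrong for the other.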

Introducing XHugWBC

XHugWBC (Cross-Humanoid Whole-Body Control) achieves generalist capability through three core technical pillars:

  1. Physics-Consistent Morphological Randomization: Instead of training on a static robot model, we train our agents in a simulator where the robot’s very “DNA”—the lengths of its limbs, the mass of its torso, and the strength of its motors—constantly shifts. Crucially, these changes are physics-consistent, ensuring the agent learns a robust internal model of how different bodies react to gravity and momentum.
  2. Semantically Aligned Spaces: To enable one policy to talk to many robots, we developed a unified way for the model to “see” its state and “act” on its joints. By aligning the observation and action spaces across different robots, we provide a common language for locomotion.
  3. Morphology-Aware Policy Architecture: Our architecture doesn’t just process sensor data; it explicitly models the morphological and dynamical properties of the robot it is currently controlling. This allows the policy to adapt its strategy in real-time based on the specific body it inhabits.
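
As an illustration of the first pillar, the sketch below shows one simple way to keep randomized parameters physics-consistent: sample geometry and density, then derive mass and inertia from them instead of sampling each quantity independently. Modeling a link as a uniform solid cylinder, the `Link` class, the `randomize_link` helper, and the sampling ranges are all assumptions made for this example, not the procedure used in XHugWBC.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A limb segment idealized as a uniform solid cylinder (an assumption of this sketch)."""
    length: float   # m
    radius: float   # m
    density: float  # kg/m^3

    @property
    def mass(self) -> float:
        # Mass follows from geometry and density, so it can never disagree with them.
        return self.density * math.pi * self.radius ** 2 * self.length

    @property
    def transverse_inertia(self) -> float:
        """Inertia about an axis through the center, perpendicular to the cylinder axis."""
        return self.mass * (3.0 * self.radius ** 2 + self.length ** 2) / 12.0


def randomize_link(nominal: Link, rng: random.Random,
                   length_scale=(0.8, 1.2), density_scale=(0.7, 1.3)) -> Link:
    """Sample a new link; mass and inertia are recomputed from the sampled values."""
    return Link(
        length=nominal.length * rng.uniform(*length_scale),
        radius=nominal.radius,
        density=nominal.density * rng.uniform(*density_scale),
    )


rng = random.Random(0)  # fixed seed so the sketch is reproducible
nominal_thigh = Link(length=0.4, radius=0.05, density=1000.0)
sampled_thigh = randomize_link(nominal_thigh, rng)
print(sampled_thigh, sampled_thigh.mass, sampled_thigh.transverse_inertia)
```

Sampling mass and inertia independently could produce bodies that no real material distribution could realize. Deriving them from one underlying description avoids that by construction.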
Training framework of XHugWBC. (a) Data generation: physics-consistent morphological randomization produces diverse and physically meaningful embodiments. (b) Universal embodiment representation: robot-specific states are projected into a global joint space, upon which an embodiment graph is constructed. (c) Policy learning: the generalist policy uses a GCN- or Transformer-based encoder together with a state estimator. Deployment: the learned policy generalizes zero-shot to seven humanoid robots with different kinematic, dynamic, and morphological structures.
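
The projection and graph construction in panel (b) can be sketched roughly as follows. This is a minimal illustration, not the XHugWBC implementation: the joint names in `GLOBAL_JOINTS`, the presence mask, and the helpers `to_global`, `from_global`, and `embodiment_adjacency` are hypothetical, and the post does not specify how missing joints are handled.

```python
from typing import Dict, List, Tuple

# Hypothetical global joint space shared by all robots (names are illustrative).
GLOBAL_JOINTS = [
    "waist_yaw",
    "left_hip_pitch", "left_knee", "left_ankle_pitch",
    "right_hip_pitch", "right_knee", "right_ankle_pitch",
]
GLOBAL_INDEX = {name: i for i, name in enumerate(GLOBAL_JOINTS)}


def to_global(robot_state: Dict[str, float]) -> Tuple[List[float], List[int]]:
    """Scatter robot-specific joint values into fixed global slots; the mask marks real joints."""
    values = [0.0] * len(GLOBAL_JOINTS)
    mask = [0] * len(GLOBAL_JOINTS)
    for name, value in robot_state.items():
        slot = GLOBAL_INDEX[name]  # raises KeyError for joints outside the global space
        values[slot] = value
        mask[slot] = 1
    return values, mask


def from_global(global_action: List[float], mask: List[int]) -> Dict[str, float]:
    """Keep only the action entries for joints this robot actually has."""
    return {name: global_action[i] for i, name in enumerate(GLOBAL_JOINTS) if mask[i]}


def embodiment_adjacency(edges: List[Tuple[str, str]], mask: List[int]) -> List[List[int]]:
    """Symmetric adjacency matrix over global slots, with self-loops on present joints."""
    n = len(GLOBAL_JOINTS)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = mask[i]
    for first, second in edges:
        a, b = GLOBAL_INDEX[first], GLOBAL_INDEX[second]
        if mask[a] and mask[b]:
            adj[a][b] = adj[b][a] = 1
    return adj


# A hypothetical robot with no waist joint: slot 0 stays masked out.
state = {"left_hip_pitch": 0.1, "left_knee": -0.3, "left_ankle_pitch": 0.2,
         "right_hip_pitch": 0.1, "right_knee": -0.3, "right_ankle_pitch": 0.2}
# Both hips attach to the pelvis, so this sketch links them directly.
edges = [("left_hip_pitch", "right_hip_pitch"),
         ("left_hip_pitch", "left_knee"), ("left_knee", "left_ankle_pitch"),
         ("right_hip_pitch", "right_knee"), ("right_knee", "right_ankle_pitch")]
values, mask = to_global(state)
adjacency = embodiment_adjacency(edges, mask)
```

Because every robot fills the same fixed-size vector, one network can consume all of them, and an adjacency matrix of this kind is the sort of input a graph-based encoder such as a GCN would typically take.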

From Simulation to the Real World

The true test of a generalist model is its ability to handle “out-of-distribution” scenarios. In our experiments, XHugWBC was trained on a diverse set of randomized embodiments and then tested on twelve simulated humanoids and seven different real-world robots.

The results were striking. The universal controller achieved zero-shot transfer, meaning it could walk on hardware it had never encountered during training without any additional fine-tuning. Whether the robot was short, tall, heavy, or slim, XHugWBC maintained stable, fluid locomotion, even under external pushes and on uneven terrain.

Why This Matters

The development of XHugWBC marks a shift in how we think about robot intelligence. By decoupling the control software from the specific hardware, we can:

  • Accelerate Deployment: New humanoid prototypes can be up and running in minutes rather than months.
  • Increase Robustness: Models trained on a wide distribution of bodies are inherently more resilient to hardware wear and tear and to sensor noise.
  • Scale Robotics Data: We can now aggregate data from many different types of robots to train a single, more powerful “foundation model” for motion.

Looking Ahead

While XHugWBC focuses on locomotion, the principles of cross-embodiment learning open the door to a future where a single “brain” could control a variety of robotic forms—from bipeds to quadrupeds and beyond. We are continuing to explore how these generalist policies can be extended to complex manipulation tasks, bringing us one step closer to truly versatile, general-purpose autonomous assistants.

For more details, read the full paper on arXiv: https://arxiv.org/abs/2602.05791

Cite

@misc{xue2026scalablegeneralwholebodycontrol,
      title={Scalable and General Whole-Body Control for Cross-Humanoid Locomotion}, 
      author={Yufei Xue and YunFeng Lin and Wentao Dong and Yang Tang and Jingbo Wang and Jiangmiao Pang and Ming Zhou and Minghuan Liu and Weinan Zhang},
      year={2026},
      eprint={2602.05791},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2602.05791}, 
}