GPC: Large-Scale Generative Pretraining for Transferable Motor Control

Shi, Yi; Jiang, Yifeng; Tessler, Chen; Peng, Xue Bin

Yi Shi¹,², Yifeng Jiang¹, Chen Tessler¹, Xue Bin Peng¹,²

¹NVIDIA ²Simon Fraser University
SIGGRAPH 2026

Overview

GPC is a framework for training generative controllers on large-scale motion datasets. The framework consists of three stages. In the first stage, Skill Quantization, we construct a discrete latent representation of skills using Finite Scalar Quantization (FSQ). The latent codes are optimized directly with end-to-end RL, providing a robust representation for subsequent generative controller training and downstream task adaptation. In the second stage, Generative Controller Training, we train a transformer decoder that models the distribution of discrete skill tokens. In the third stage, Task Adaptation, we introduce CoLA, a parameter-efficient fine-tuning (PEFT) method that adapts the pretrained generative controller to a wide range of downstream control tasks using only a small number of additional parameters, while preserving the diversity and naturalness of the pretrained controller.

Skill Quantization

1 Scales up to a 600 h dataset with 99.98% success rate

2 Simple end-to-end training with PPO

Training GPC requires a discrete latent representation of motor skills. We obtain it by training an FSQ motion-tracking controller whose discrete codes are optimized via end-to-end RL to model different skills. We incorporate FSQ-based quantization into the policy through an encoder–decoder architecture. The encoder maps a sequence of target states to a continuous latent vector of dimension $d$. Each dimension of this latent is then independently quantized using FSQ into $L$ fixed scalar levels. This lookup-free quantization defines an implicit discrete codebook of size $L^d$, removing the need to learn an explicit codebook as in VQ-VAEs.
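The per-dimension quantization described above can be illustrated with a minimal sketch. It is a simplified, standard-library-only illustration and not the training code: the function names `fsq_quantize` and `code_index` are hypothetical, the `tanh` bounding is one common choice, and the straight-through gradient estimator that FSQ needs during training is omitted.

```python
import math


def fsq_quantize(z, num_levels):
    """Quantize each latent dimension independently to one of `num_levels` values.

    Assumes num_levels >= 2. Returns the integer level index per dimension and
    the corresponding quantized value in [-1, 1].
    """
    half = (num_levels - 1) / 2.0
    indices = []
    quantized = []
    for value in z:
        bounded = math.tanh(value)                  # squash to (-1, 1)
        index = int(round((bounded + 1.0) * half))  # integer in [0, num_levels - 1]
        indices.append(index)
        quantized.append(index / half - 1.0)        # map the level back to [-1, 1]
    return indices, quantized


def code_index(indices, num_levels):
    """Combine per-dimension level indices into one token id in [0, L^d)."""
    token = 0
    for index in reversed(indices):
        token = token * num_levels + index
    return token


if __name__ == "__main__":
    levels = 5                       # hypothetical L
    latent = [0.0, 10.0, -10.0]      # hypothetical d = 3 encoder output
    idx, zq = fsq_quantize(latent, levels)
    print(idx, zq, code_index(idx, levels), levels ** len(latent))
```

No codebook is stored anywhere in this sketch: the set of reachable codes is implied by the rounding grid, which is what gives the $L^d$ implicit codebook size.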

Results Reproduced by the FSQ Tracking Controller

Comparison with Other Methods

Generative Controller

1 Robust unconditional generation across diverse skills

2 Emergent human-like responses to external perturbations

To model the skill representation at each time step, the joint conditional distribution of the token sequence is captured autoregressively using a GPT-style transformer decoder with causal self-attention. This architecture ensures that each token is generated based on the character's state and all previously generated tokens. During inference, tokens are autoregressively sampled via nucleus (top-$p$) sampling applied to the softmax-normalized logits, which restricts choices to the most likely tokens, mitigating low-probability outliers while preserving diverse behavior generation. Crucially, the resulting policy is robust and acquires recovery skills without any explicit training procedure for them.
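The nucleus (top-$p$) sampling step mentioned above can be sketched as follows. This is a generic, standard-library-only illustration of the technique and not the controller's actual inference code; the function names and the example logits are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p.

    Returns a renormalized distribution with zeros outside that set.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits, top_p, rng):
    """Sample one token id from the top-p nucleus of softmax(logits)."""
    filtered = nucleus_filter(softmax(logits), top_p)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 1.0, 0.0, -5.0]   # hypothetical logits over 4 skill tokens
    print([sample_token(logits, 0.8, rng) for _ in range(10)])
```

With these example logits and `top_p = 0.8`, only the two most likely tokens remain in the nucleus, so the low-probability tail is never sampled while some randomness between the likely tokens is preserved.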

Unconditional Sampling

Robustness and Recovery Behavior Under Perturbation

Task Adaptation

1 Lightweight adaptation, adding <1% extra parameters

2 Leverages pretrained skills across versatile downstream tasks

3 Supports user-designated skills for task completion

To efficiently adapt a Generative Pretrained Controller (GPC) to new tasks, we introduce Conditional Low-rank Adaptation (CoLA), a Parameter-Efficient Fine-Tuning (PEFT) method inspired by DoRA that adds less than 1% extra parameters. This lightweight adaptation is driven by a two-stage pipeline:

RLFT: Task adaptation is performed via reinforcement learning fine-tuning (RLFT) by optimizing task-specific reward functions. This stage refines the model to reliably compose and execute skills for the downstream tasks. During rollout, exploration is regularized using nucleus sampling guided by the unconditional generative model.
SFT+RLFT: When example motions are available and the user wants to leverage those specific skills, supervised fine-tuning (SFT) can serve as an initial guide for stylization. This biases the model toward selecting task-appropriate, user-designated skills, so that it neither explores from scratch nor activates unwanted behaviors from its vast pretrained repertoire. The model can then be further refined through RLFT to reliably compose and execute these skills for task completion.
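To give a rough sense of how a low-rank adapter can stay under a 1% parameter budget, the sketch below counts the extra parameters of a generic DoRA-style update on a single weight matrix. It does not describe CoLA itself, whose conditional structure is not specified here; the layer size and rank are hypothetical, and the count assumes two low-rank factors plus one magnitude scalar per weight column.

```python
def low_rank_extra_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def dora_style_extra_params(d_out, d_in, rank):
    """Low-rank factors plus one learnable magnitude per column of the weight.

    Assumption: the magnitude vector has d_in entries (one per column of a
    d_out x d_in weight), following the DoRA formulation.
    """
    return low_rank_extra_params(d_out, d_in, rank) + d_in


def extra_fraction(d_out, d_in, rank):
    """Extra parameters as a fraction of the frozen d_out x d_in weight."""
    return dora_style_extra_params(d_out, d_in, rank) / (d_out * d_in)


if __name__ == "__main__":
    # Hypothetical layer size and rank, not values taken from the paper.
    d_out, d_in, rank = 1024, 1024, 4
    print(dora_style_extra_params(d_out, d_in, rank))
    print(f"{extra_fraction(d_out, d_in, rank):.4%}")
```

For this hypothetical 1024 x 1024 layer with rank 4, the adapter adds 9,216 parameters against 1,048,576 frozen ones, which is below 1%.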

RLFT-Only Adaptation

SFT + RLFT Adaptation (Stylization)

Preserved Robustness and Recovery Skills

HSI Task

We demonstrate GPC on a Human-Scene Interaction (HSI) task, where a character must navigate environments and interact naturally with terrain. GPC is able to compose learned motor skills to handle diverse parkour-style scenarios without skill-specific training from scratch.

🎮 We have a real-time interactive demo showcased at SIGGRAPH 2026. See you there!

Acknowledgments

We thank Ziyu Zhang, Yuxuan Mu, Dun Yang, Kaifeng Zhao, Sunmin Lee, Haotian Zhang, Gal Chechik, Davis Rempe and Sanja Fidler for their support and insightful discussions.

BibTeX

@inproceedings{shi2026gpc,
  title={GPC: Large-Scale Generative Pretraining for Transferable Motor Control},
  author={Shi, Yi and Jiang, Yifeng and Tessler, Chen and Peng, Xue Bin},
  booktitle={SIGGRAPH '26 Conference Proceedings},
  year={2026}
}