Imperative MPC: An End-to-End Self-Supervised Learning with Differentiable MPC for UAV Attitude Control

Published: by
Yuanzhu Zhan

Modeling and control of nonlinear dynamics are critical in robotics, especially in scenarios with unpredictable external influences and complex dynamics. Traditional cascaded modular control pipelines often yield suboptimal performance due to conservative assumptions and tedious parameter tuning. Pure data-driven approaches promise robust performance but suffer from low sample efficiency, sim-to-real gaps, and reliance on extensive datasets. Hybrid methods that combine learning-based and traditional model-based control in an end-to-end manner offer a promising alternative. This work presents a self-supervised learning framework that combines a learning-based inertial odometry (IO) module with differentiable model predictive control (d-MPC) for Unmanned Aerial Vehicle (UAV) attitude control. The IO module denoises raw IMU measurements and predicts UAV attitudes, which the MPC then uses to optimize control actions in a bi-level optimization (BLO) setup: the lower-level MPC optimizes control actions, and the upper level minimizes the discrepancy between real-world and predicted performance. The framework is thus end-to-end and can be trained in a self-supervised manner. This approach combines the strengths of learning-based perception with interpretable model-based control. Results show that the framework remains effective even under strong wind and that it can simultaneously improve both MPC parameter learning and IMU prediction performance.

Approach Overview

The proposed framework

The IMU model predicts the current state. The d-MPC solves for the optimal action under the lower-level cost L. This action drives the dynamics model to a predicted next state and actuates the real system to its next state, which is measured by the IMU. The upper level minimizes U, the discrepancy between the next state predicted by the MPC and the next state measured by the IMU.
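The two levels described above can be written as a bi-level program. The following is only a sketch in assumed notation: L and U are the costs named in the text, while θ (the learnable parameters of the IMU model and the MPC), μ (the state and control sequence over a horizon N), F (the dynamics model), and x̂ (the state predicted by the IMU model) are symbols introduced here for illustration.

```latex
\begin{aligned}
\min_{\theta}\quad & U\bigl(\theta,\ \mu^{*}(\theta)\bigr) \\
\text{s.t.}\quad & \mu^{*}(\theta) = \arg\min_{\mu}\ L(\theta, \mu), \qquad \mu = \{x_k, u_k\}_{k=0}^{N}, \\
& x_{k+1} = F(x_k, u_k;\ \theta), \quad k = 0, \dots, N-1, \\
& x_0 = \hat{x}(\theta).
\end{aligned}
```

Under this reading, the lower level is an ordinary MPC problem for fixed θ, and the upper level adjusts θ so that the next state predicted by the MPC agrees with the next state measured by the IMU, which is what allows training without labeled data.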

Experiment Results

With our iMPC as the controller, the UAV attitude quickly returns to a stable hover state from an initial condition of 20°.

Control performance of our method under different levels of wind disturbance
UAV pitch angle response to an impulse wind disturbance at 0.2 s for different wind speeds
UAV pitch angle response to a step wind disturbance starting at 0.2 s and lasting 0.3 s for different wind speeds

Learned Dynamics Parameters

We also evaluate the learning performance in the d-MPC. In particular, we treat the UAV mass and moment of inertia (MOI) as the learnable parameters.

Errors in the learned UAV MOI and mass using our method under different initial conditions, starting from an initial value with a 50% offset.

Initial Offset   Error    10° Error   15° Error   20° Error
50%              0.96%    2.67%       3.41%       2.22%
50%              1.69%    0.85%       1.43%       0.32%
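To illustrate how a dynamics parameter can be recovered from a 50% initial offset, here is a minimal single-axis sketch in Python. It is a toy under stated assumptions, not the authors' implementation: the one-step closed-form controller, the normalized units, and all names (`one_step_mpc`, `step`, `learn_inertia`) are hypothetical. For simplicity the gradient is taken with the applied torque held fixed, so it shows only the upper-level parameter update and does not differentiate through the MPC solve as the full method does.

```python
def one_step_mpc(omega, inertia_hat, dt, q=1.0, r=0.01):
    """Hypothetical one-step MPC: closed-form minimizer of
    q * omega_next**2 + r * torque**2 under the current model."""
    b = dt / inertia_hat
    return -q * b * omega / (q * b * b + r)


def step(omega, torque, inertia, dt):
    """One Euler step of single-axis rotational dynamics."""
    return omega + dt * torque / inertia


def learn_inertia(true_inertia=1.0, offset=0.5, dt=0.1, lr=1.0,
                  iters=100, omega0=1.0):
    """Gradient descent on the squared gap between the model-predicted
    and the 'measured' next angular rate (assumed toy setup)."""
    inertia_hat = true_inertia * (1.0 + offset)  # start with a 50% offset
    for _ in range(iters):
        # Lower level: pick the torque using the current inertia estimate.
        torque = one_step_mpc(omega0, inertia_hat, dt)
        # The "real" system stands in for the IMU measurement.
        omega_meas = step(omega0, torque, true_inertia, dt)
        # The model predicts the next state with the estimated inertia.
        omega_pred = step(omega0, torque, inertia_hat, dt)
        # Upper level: d/d(inertia_hat) of (omega_pred - omega_meas)**2,
        # with the torque treated as a constant.
        residual = omega_pred - omega_meas
        grad = 2.0 * residual * (-dt * torque / inertia_hat ** 2)
        inertia_hat -= lr * grad
    return inertia_hat


if __name__ == "__main__":
    estimate = learn_inertia()
    print(f"estimated inertia: {estimate:.4f} (true value 1.0)")
```

Each iteration restarts from the same initial rate `omega0` so that the torque stays nonzero; once the system is at rest there is no excitation and the gradient vanishes, which is one reason the choice of initial condition matters for parameter learning.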

BibTeX

@inproceedings{He2025iMPC,
       author    = {Haonan He and Yuheng Qiu and Junyi Geng},
       title     = {Imperative MPC: An End-to-End Self-Supervised Learning with Differentiable MPC for UAV Attitude Control},
       booktitle = {Proceedings of the 7th Annual Learning for Dynamics \& Control Conference},
       year      = {2025},
       pages     = {Accepted}
}