From Space to Time: Enabling Adaptive Safety with Learned Value Functions via Disturbance Recasting

Abstract

The widespread deployment of autonomous systems in safety-critical environments such as urban air mobility hinges on ensuring reliable, performant, and safe operation under varying environmental conditions. One approach, value function-based safety filters, minimally modifies a nominal controller to ensure safety. Recent advances leverage offline-learned value functions to scale these safety filters to high-dimensional systems. However, these methods assume detailed priors on all possible sources of model mismatch, in the form of disturbances in the environment, and this information is rarely available in real-world settings. Even in well-mapped environments like urban canyons or industrial sites, drones encounter complex, spatially-varying disturbances arising from payload-drone interaction, turbulent airflow, and other environmental factors. We introduce Space2Time, which enables safe and adaptive deployment of offline-learned safety filters under unknown, spatially-varying disturbances. The key idea is to reparameterize spatial variations in disturbance as temporal variations, enabling the use of precomputed value functions during online operation. We validate Space2Time on a quadcopter through extensive simulations and hardware experiments, demonstrating significant improvement over baselines.

Offline formulation

We solve a reach-avoid problem for a system with time-varying disturbances to learn a safety value function. The disturbance bound is modeled as time-varying (here linear in time for simplicity), reaching its maximum value $d_{\max}$ at the end of the horizon, i.e., at the terminal time $t = 0$.

$$\mathcal{D}_\text{tv}(\dot{d},t)=\big\{d\in \mathbb{R} \mid \lvert d \rvert \leq \max\{0, d_{\max} - \lvert t \rvert \dot{d}\}\big\}$$
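The set above can be evaluated directly. The following is a minimal sketch of the bound $\max\{0, d_{\max} - \lvert t \rvert \dot{d}\}$ and a membership check; the function and argument names (`disturbance_bound`, `d_rate`, and so on) are illustrative and do not come from the paper's code.

```python
def disturbance_bound(t: float, d_max: float, d_rate: float) -> float:
    """Radius of D_tv(d_rate, t): max{0, d_max - |t| * d_rate}."""
    return max(0.0, d_max - abs(t) * d_rate)


def in_disturbance_set(d: float, t: float, d_max: float, d_rate: float) -> bool:
    """True if the scalar disturbance d lies in D_tv(d_rate, t)."""
    return abs(d) <= disturbance_bound(t, d_max, d_rate)
```

With times $t \leq 0$, the bound equals $d_{\max}$ at $t = 0$ and shrinks linearly to zero as $\lvert t \rvert$ grows.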

$$\dot{z} = \hat{f}(z,u, \eta) = \begin{bmatrix} \dot{x} \\ \ddot{d} \end{bmatrix} = \begin{bmatrix} f(x)+g(x)u+ \eta\\ 0 \end{bmatrix}$$
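As a rough illustration of the augmented dynamics, the sketch below assumes the augmented state is $z = (x, \dot{d})$, so that the disturbance rate is carried along as a constant parameter. The callables `f` and `g`, and the use of plain Python lists, are assumptions made for this example only.

```python
from typing import Callable, List, Sequence


def augmented_dynamics(
    z: Sequence[float],
    u: Sequence[float],
    eta: Sequence[float],
    f: Callable[[List[float]], List[float]],
    g: Callable[[List[float]], List[List[float]]],
) -> List[float]:
    """Return z_dot = [f(x) + g(x) u + eta, 0] for z = (x, d_rate)."""
    *x, _d_rate = z  # the last entry is the (constant) disturbance rate
    fx, gx = f(x), g(x)
    x_dot = [
        fx[i] + sum(gx[i][j] * u[j] for j in range(len(u))) + eta[i]
        for i in range(len(x))
    ]
    return x_dot + [0.0]  # the disturbance rate has zero time derivative
```

For instance, a double integrator would use `f = lambda x: [x[1], 0.0]` and `g = lambda x: [[0.0], [1.0]]`.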

The target set $\mathcal{T}:=\{x\in \mathbb{R}^n \mid l(x) \geq 0\}$, defined by the target function $l$, is chosen to be control invariant under the worst-case disturbance. The failure set $\mathcal{F}:=\{x\in \mathbb{R}^n \mid g(x) \leq 0\}$ is defined by the constraint function $g$, which describes the obstacles (this $g$ is distinct from the control matrix $g(x)$ in the dynamics).

The associated reach-avoid cost function is then defined as:

$$r_{\text{RA}}(x,t,\mathbf{u}, \mathbf{d}) = \max_{\tau \in [t,0]}\min\Big\{l(\mathbf{x}_{x,t}^{\mathbf{u},\mathbf{d}}(\tau)), \min_{s \in [t,\tau]}g(\mathbf{x}_{x,t}^{\mathbf{u}, \mathbf{d}}(s))\Big\}$$

The value function is then:

$$V(x,t)=\min_{d \in \mathcal{D}}\max_{u\in\mathcal{U}}r_{\text{RA}}(x,t,\mathbf{u},\mathbf{d})$$
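For a single sampled trajectory, the reach-avoid cost can be approximated in one pass. The sketch below is a discrete-time approximation (an assumption, since the definition above is in continuous time): `l_vals` and `g_vals` are hypothetical lists holding $l$ and $g$ evaluated at successive trajectory samples from time $t$ to $0$.

```python
from typing import Sequence


def reach_avoid_cost(l_vals: Sequence[float], g_vals: Sequence[float]) -> float:
    """Discrete max over tau of min{ l(x(tau)), min over s <= tau of g(x(s)) }."""
    best = float("-inf")
    running_min_g = float("inf")
    for l_tau, g_tau in zip(l_vals, g_vals):
        # Worst constraint value seen so far along the trajectory.
        running_min_g = min(running_min_g, g_tau)
        # Best outcome over all candidate arrival times tau.
        best = max(best, min(l_tau, running_min_g))
    return best
```

Under the sign conventions above, a nonnegative cost means the trajectory reaches the target set at some sample with $g \geq 0$ at every sample up to that point.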

Online implementation

We design a safety filter for a system with spatially-varying disturbances, based on an ensemble of reach-avoid value functions learned offline for our system with time-varying disturbances.

We assume a setting where the disturbance and its directional derivative along the dynamics can be estimated online. To ensure safety, we store the past $H$ estimated values of the directional derivative and use their maximum as a conservative rate estimate. Together with the current estimated disturbance, this determines the time to return to the control invariant set, $t_\text{return}$.
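The buffering step can be sketched as follows. The formula for the time to return is not given in the text; the code assumes one plausible reading, obtained by inverting the linear bound of $\mathcal{D}_\text{tv}$ so that the bound at that time equals the current disturbance magnitude, i.e. $t = -(d_{\max} - \lvert \hat{d} \rvert) / \overline{D_{\tilde{f}}d}$. The class and argument names are hypothetical, and the rate estimates are assumed to be passed in as nonnegative magnitudes.

```python
import math
from collections import deque
from typing import Tuple


class ReturnTimeEstimator:
    """Keeps the last H rate estimates and derives a (negative) return time."""

    def __init__(self, horizon: int, d_max: float) -> None:
        self.rates = deque(maxlen=horizon)  # oldest entries drop out automatically
        self.d_max = d_max

    def update(self, d_hat: float, rate_hat: float) -> Tuple[float, float]:
        """Return (max rate over the buffer, assumed time to return)."""
        self.rates.append(rate_hat)
        rate_bar = max(self.rates)
        margin = max(0.0, self.d_max - abs(d_hat))
        if rate_bar <= 0.0:
            # Assumption: a non-growing disturbance imposes no time limit.
            return rate_bar, -math.inf
        return rate_bar, -margin / rate_bar
```

Taking the maximum over the buffer is conservative: a larger rate gives a shorter assumed time before the disturbance reaches $d_{\max}$.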

A time-varying control barrier function (CBF) is used within the safety filter to ensure safety, with the disturbance set $E = \mathcal{D}_{\text{tv}}(\overline{D_{\tilde{f}}d}, t_\text{return})$.
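The text does not spell out the filter's optimization problem. A common way to realize a filter that minimally modifies a nominal control, assumed here purely for illustration, is a quadratic program with one affine safety constraint $a^\top u + b \geq 0$. The coefficients `a` and `b` are hypothetical inputs that would be derived from the CBF condition under the disturbance set $E$. Ignoring input bounds, this program has a closed-form solution: project the nominal control onto the constraint half-space.

```python
from typing import List, Sequence


def filter_control(u_nom: Sequence[float], a: Sequence[float], b: float) -> List[float]:
    """Solve min ||u - u_nom||^2 subject to a . u + b >= 0 (no input bounds)."""
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) + b
    if slack >= 0.0:
        return list(u_nom)  # nominal control already satisfies the constraint
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("constraint is infeasible: a is zero and b is negative")
    scale = -slack / norm_sq
    # Smallest correction along a that makes the constraint hold with equality.
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]
```

A practical filter would also need actuator limits, which generally requires a numerical QP solver instead of this closed form.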

Hardware experiments

Baseline

Ours (Space2Time)

Citation

If you use our method or code in your research, please consider citing the paper as follows:

@inproceedings{TonkensShinde2025,
  author    = {Tonkens, S. and Shinde, N. U. and Begzadić, A. and Yip, M. C. and Cortés, J. and Herbert, S.},
  title     = {From Space to Time: Enabling Adaptive Safety with Learned Value Functions via Disturbance Recasting},
  booktitle = {Conference on Robot Learning},
  year      = {2025},
}