We solve a reach-avoid problem for a system with time-varying disturbances to learn a safety value function.
The disturbance bound is modeled as time-varying (here linear in time, for simplicity). With the horizon running over \(t \leq 0\), the bound reaches its maximum value \(d_{\max}\) at the end of the horizon, i.e. at the terminal time \(t = 0\):
$$\mathcal{D}_\text{tv}(\dot{d},t)=\big\{d\in \mathbb{R} \mid \lvert d \rvert \leq \max\{0, d_{\max} - \lvert t \rvert \dot{d}\}\big\}$$
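To make the set definition concrete, the following minimal sketch evaluates the admissible disturbance magnitude for a given rate and time. The function names and the choice to pass \(d_{\max}\) as an argument are illustrative assumptions, not part of the original formulation.

```python
def disturbance_bound(d_rate: float, t: float, d_max: float) -> float:
    """Radius of D_tv(d_rate, t): max(0, d_max - |t| * d_rate).

    Hypothetical helper; d_max is passed explicitly for illustration.
    """
    return max(0.0, d_max - abs(t) * d_rate)


def in_disturbance_set(d: float, d_rate: float, t: float, d_max: float) -> bool:
    """Check whether a scalar disturbance d lies in D_tv(d_rate, t)."""
    return abs(d) <= disturbance_bound(d_rate, t, d_max)
```

For example, with `d_max = 1.0` and `d_rate = 0.5`, the bound equals `d_max` at `t = 0` and shrinks linearly as `|t|` grows, until it is clipped at zero.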
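The disturbance rate \(\dot{d}\) is appended to the state as a constant parameter, \(z = (x, \dot{d})\), and the disturbance input \(\eta \in \mathcal{D}_\text{tv}(\dot{d},t)\) enters the dynamics additively:
$$\dot{z} = \hat{f}(z,u, \eta) =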
\begin{bmatrix}
\dot{x} \\ \ddot{d}
\end{bmatrix}
=
\begin{bmatrix}
f(x)+g(x)u+ \eta\\ 0
\end{bmatrix}$$
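The augmented dynamics can be sketched as follows, assuming for simplicity a scalar state \(x\) so that \(\eta\) adds directly to \(\dot{x}\). The callables `f` and `g` (the drift and input terms of the dynamics) and the forward-Euler step are illustrative assumptions.

```python
def augmented_dynamics(z, u, eta, f, g):
    """Right-hand side of z_dot = f_hat(z, u, eta) with z = (x, d_rate).

    Assumes scalar x; f and g are user-supplied callables (hypothetical).
    The disturbance rate is a constant parameter, so its derivative is 0.
    """
    x, _d_rate = z
    return (f(x) + g(x) * u + eta, 0.0)


def euler_step(z, u, eta, f, g, dt):
    """One forward-Euler integration step of the augmented system."""
    x_dot, d_rate_dot = augmented_dynamics(z, u, eta, f, g)
    return (z[0] + dt * x_dot, z[1] + dt * d_rate_dot)
```

Because the second component has zero derivative, the disturbance rate is carried along unchanged and only parameterizes the admissible set for \(\eta\).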
The target set \(\mathcal{T}:=\{x\in \mathbb{R}^n \mid l(x) \geq 0\}\), defined by the target function \(l\), is chosen to be control invariant under the worst-case disturbance. The failure set \(\mathcal{F}:=\{x\in \mathbb{R}^n \mid g(x) \leq 0\}\) is defined by the constraint function \(g\), which describes the obstacles (here \(g\) denotes the scalar constraint function, not the input term \(g(x)\) of the dynamics).
The associated reach-avoid cost function is then defined as \(r_{\text{RA}}(x,t,\mathbf{u}, \mathbf{d}) = \max_{\tau \in [t,0]}\min\{l(\mathbf{x}_{x,t}^{\mathbf{u},\mathbf{d}}(\tau)), \min_{s \in [t,\tau]}g(\mathbf{x}_{x,t}^{\mathbf{u}, \mathbf{d}}(s))\}\).
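On a time-sampled trajectory, the reach-avoid cost can be approximated with a single pass that tracks the running minimum of the constraint function. This is a discrete-time sketch only; the inputs `l_vals` and `g_vals` (values of \(l\) and the constraint function \(g\) at successive samples from \(t\) to \(0\)) are assumed to be precomputed.

```python
def reach_avoid_cost(l_vals, g_vals):
    """Discrete approximation of the reach-avoid cost.

    Computes max over k of min(l_vals[k], min(g_vals[0..k])), where index 0
    is the start time t and the last index is the terminal time 0.
    """
    best = float("-inf")
    running_min_g = float("inf")
    for l_k, g_k in zip(l_vals, g_vals):
        # Worst constraint value seen so far along the trajectory.
        running_min_g = min(running_min_g, g_k)
        # Best "reach while having stayed safe" value so far.
        best = max(best, min(l_k, running_min_g))
    return best
```

A positive result means the sampled trajectory enters the target at some sample while the constraint function has stayed positive up to that sample; a violation of the constraint before reaching the target caps the cost at the violating value.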
The value function is then \(V(x,t)=\min\limits_{d \in \mathcal{D}}\max\limits_{u\in\mathcal{U}}r_{\text{RA}}(x,t,\mathbf{u},\mathbf{d})\).
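The order of the two optimizations can be illustrated by brute force over small finite candidate sets. This is only a toy sketch under strong assumptions: the candidate sets and the `payoff` callable are hypothetical, and enumerating fixed signals does not capture the information pattern of the underlying differential game.

```python
def minmax_value(payoff, u_candidates, d_candidates):
    """Evaluate min over d of max over u of payoff(u, d) by enumeration.

    payoff is a user-supplied callable (hypothetical), e.g. one that rolls
    out a trajectory under (u, d) and returns its reach-avoid cost.
    """
    return min(
        max(payoff(u, d) for u in u_candidates)
        for d in d_candidates
    )
```

For instance, with `payoff = lambda u, d: u - d`, controls `[-1.0, 1.0]`, and disturbances `[-0.5, 0.5]`, the control picks `u = 1.0` for each disturbance and the disturbance then picks `d = 0.5`.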