"Forced-to-switch-initially" movies

Here we show how we pick desirable initial configurations that yield the largest difference between \( \alpha_* \) and \( \mu_* \).

The idea is simple: we have observed that \( \alpha_* \) does better when it can reduce the number of possible tack-switches. Therefore, we

  • Compute another value function (denoted \( \tilde{w} \)) that encodes the best probability of reaching the target within deadline \(s\) under the following new constraint:
    • Force an initial tack-switch at any (\(r,\theta,s\)) and then behave optimally
  • Compute the difference in switchgrid (denoted \( D_* \)) between \( \alpha_* \) and \( \mu_* \) at any \(s\). By "difference", we mean that either
    • the risk-aware (RA) policy prescribes "switching" while the risk-neutral (RN) policy doesn't, or
    • the RN policy prescribes "switching" while the RA policy doesn't.
  • Compute \( w - \tilde{w} \) on \( D_* \) for all \( 0 \le s \le \bar{s} \)
  • Find states where \( w - \tilde{w} \) is large on \( D_* \). (In most cases, we want states where the RN policy prescribes "switching" while the RA policy doesn't.)
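
The last three steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code used to produce the movies: the function name `rank_initial_states`, the argument names, and the array layout (one entry per grid point, e.g. indexed by \(r\), \(\theta\), \(s\)) are all hypothetical, and \( w \), \( \tilde{w} \) and the two switchgrids are assumed to have been computed already.

```python
import numpy as np

def rank_initial_states(w, w_tilde, switch_ra, switch_rn, top_k=5, rn_only=False):
    """Return grid indices in D_* sorted by decreasing w - w_tilde.

    Assumed layout (hypothetical): all four inputs are arrays of one shape.
    switch_ra / switch_rn are boolean switchgrids of the RA and RN policies.
    """
    switch_ra = np.asarray(switch_ra, dtype=bool)
    switch_rn = np.asarray(switch_rn, dtype=bool)
    if rn_only:
        # Keep only states where RN switches while RA does not.
        d_star = switch_rn & ~switch_ra
    else:
        # D_*: exactly one of the two policies prescribes a switch.
        d_star = np.logical_xor(switch_ra, switch_rn)
    # Gap between the two value functions; states outside D_* are
    # masked with -inf so that they can never be ranked.
    gap = np.where(d_star, np.asarray(w) - np.asarray(w_tilde), -np.inf)
    # Flat indices sorted by decreasing gap, truncated to top_k.
    order = np.argsort(gap, axis=None)[::-1][:top_k]
    # Drop masked entries that slipped in when D_* has fewer than top_k states.
    order = order[np.isfinite(gap.ravel()[order])]
    return [tuple(int(i) for i in np.unravel_index(k, gap.shape)) for k in order]

# Toy 2x2 grid with made-up numbers, purely for illustration.
w = np.array([[0.9, 0.8], [0.7, 0.6]])
w_tilde = np.array([[0.5, 0.7], [0.1, 0.6]])
switch_ra = np.array([[False, False], [True, False]])
switch_rn = np.array([[True, False], [False, False]])
print(rank_initial_states(w, w_tilde, switch_ra, switch_rn))                # [(1, 0), (0, 0)]
print(rank_initial_states(w, w_tilde, switch_ra, switch_rn, rn_only=True))  # [(0, 0)]
```

Setting `rn_only=True` corresponds to the preference stated in the last bullet, where only the states at which the RN policy switches and the RA policy doesn't are kept.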

Movies:

The following 4 movies all show contour plots (starboard tack (\( q = 1 \)) only) in the relative (\(r, \theta \))-space where within each frame:

    \( a = 0 \) and \( \sigma = 0.05 \) (\( 0 \le s \le 60 \)):
    \( a = 0.15 \) and \( \sigma = 0.05 \) (\( 0 \le s \le 50 \)):
    \( a = 0.05 \) and \( \sigma = 0.05 \) (\( 0 \le s \le 50 \)):
    \( a = 0.05 \) and \( \sigma = 0.1 \) (\( 0 \le s \le 50 \)):
