Decision Making and Control
Many outdoor environments are unstructured and spatiotemporally varying, which can induce uncertain robot behaviors. For example, the motion of aquatic vehicles in water or of aerial vehicles in air can be highly stochastic. Decision making (or planning under uncertainty) allows a robot to effectively reject the action stochasticity caused by external disturbances.
Variants of Model Predictive Path Integral Control:
To address this challenging control problem, we developed the Unscented Model Predictive Path Integral (U-MPPI) control strategy, a new methodology that can effectively manage system uncertainties while integrating a more efficient trajectory sampling strategy. The core concept is to leverage the Unscented Transform (UT) to propagate not only the mean but also the covariance of the system dynamics, going beyond the traditional MPPI method. As a result, it introduces a novel and more efficient trajectory sampling strategy, significantly enhancing state-space exploration and ultimately reducing the risk of becoming trapped in local minima. Furthermore, by leveraging the uncertainty information provided by the UT, we incorporate a risk-sensitive cost function that explicitly accounts for risk or uncertainty throughout the trajectory evaluation process, resulting in a more resilient control system capable of handling uncertain conditions.
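The covariance propagation at the heart of U-MPPI can be illustrated with a standard Unscented Transform. The sketch below is a minimal, generic UT step (not the paper's full U-MPPI pipeline): it pushes sigma points through an assumed toy unicycle-like dynamics function and recovers the propagated mean and covariance; the scaling parameters `alpha`, `beta`, `kappa` take conventional default values.

```python
import numpy as np

def unscented_transform(mean, cov, dynamics, alpha=0.1, beta=2.0, kappa=0.0):
    """Propagate a mean and covariance through nonlinear dynamics via sigma points."""
    n = mean.shape[0]
    lam = alpha**2 * (n + kappa) - n
    # Sigma points: the mean plus/minus scaled columns of the covariance square root
    L = np.linalg.cholesky((n + lam) * cov)
    sigma_pts = np.vstack([mean, mean + L.T, mean - L.T])  # shape (2n+1, n)
    # Standard UT weights for the mean and covariance estimates
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    # Push each sigma point through the dynamics, then re-estimate moments
    propagated = np.array([dynamics(x) for x in sigma_pts])
    new_mean = wm @ propagated
    diff = propagated - new_mean
    new_cov = (wc[:, None] * diff).T @ diff
    return new_mean, new_cov

# Toy nonlinear dynamics (hypothetical): one Euler step of a unicycle heading model
f = lambda x: np.array([x[0] + 0.1 * np.cos(x[2]), x[1] + 0.1 * np.sin(x[2]), x[2]])
m, P = unscented_transform(np.zeros(3), 0.01 * np.eye(3), f)
```

In U-MPPI this propagated covariance is what feeds the risk-sensitive cost, so trajectories passing through high-uncertainty regions can be penalized explicitly.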
YouTube link:
We also proposed a method called log-MPPI, equipped with a more effective trajectory sampling distribution policy that significantly improves trajectory feasibility in terms of satisfying system constraints. The key idea is to draw the trajectory samples from the normal log-normal (NLN) mixture distribution rather than from a Gaussian distribution. Furthermore, this work presents a method for collision-free navigation in unknown cluttered environments by incorporating a 2D occupancy grid map into the optimization problem of the sampling-based MPC algorithm.
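The NLN sampling idea can be sketched in a few lines: an NLN variate is the product of a zero-mean Gaussian and an independent log-normal variable. The snippet below is a minimal illustration with assumed parameter values (`sigma_n`, `mu_ln`, `sigma_ln` are placeholders, not the paper's tuned settings).

```python
import numpy as np

rng = np.random.default_rng(0)

def nln_noise(size, sigma_n=1.0, mu_ln=0.0, sigma_ln=0.5, rng=rng):
    """Draw control perturbations from a normal log-normal (NLN) mixture:
    the product of a zero-mean Gaussian and an independent log-normal variable.
    The heavier tails widen exploration relative to pure Gaussian sampling,
    while most of the mass stays near zero."""
    normal = rng.normal(0.0, sigma_n, size)
    lognormal = rng.lognormal(mu_ln, sigma_ln, size)
    return normal * lognormal

# 1000 two-dimensional control perturbations for injection into rollouts
samples = nln_noise((1000, 2))
```

Within the MPC loop, these perturbations would replace the Gaussian noise injected into the nominal control sequence before each batch of rollouts.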
YouTube link:
Variants of Markov Decision Processes:
We developed a framework called the time-varying Markov Decision Process (TVMDP). The new framework requires neither enlarging the state space nor discretizing time. Specifically, the TVMDP is built upon an upgraded transition model that varies both spatially and temporally, with an underlying computing mechanism that can be viewed as value iteration combining spatial "expansion" and temporal "evolution". Such a framework is able to integrate a future horizon of environmental dynamics and produce highly accurate action policies under spatiotemporal disturbances caused by, e.g., tidal currents and/or air turbulence.
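The spatial-expansion/temporal-evolution idea can be illustrated with a backward pass over a time-indexed transition model. The sketch below is a simplified finite-horizon stand-in, not the TVMDP algorithm itself: `T_of_t` is a hypothetical callable returning the transition tensor predicted for time step `t`, and the toy example mimics a disturbance (a "current") that strengthens over time.

```python
import numpy as np

def tv_value_iteration(R, T_of_t, gamma=0.95, horizon=20):
    """Finite-horizon value iteration with a time-varying transition model.
    R: per-state reward, shape (S,).  T_of_t(t) -> (A, S, S), rows sum to 1.
    Returns a (horizon+1, S) value table and per-step greedy policies."""
    S = R.shape[0]
    V = np.zeros((horizon + 1, S))
    policy = np.zeros((horizon, S), dtype=int)
    # Backward pass: the value at step t uses the model predicted for step t
    for t in range(horizon - 1, -1, -1):
        T = T_of_t(t)                           # (A, S, S)
        Q = R[None, :] + gamma * T @ V[t + 1]   # (A, S)
        policy[t] = Q.argmax(axis=0)
        V[t] = Q.max(axis=0)
    return V, policy

# Toy example: 3 states, 2 actions; the disturbance drift grows with time t
def T_of_t(t):
    drift = min(0.9, 0.1 + 0.04 * t)
    T = np.zeros((2, 3, 3))
    T[0] = [[1 - drift, drift, 0], [0, 1 - drift, drift], [0, 0, 1]]
    T[1] = [[1, 0, 0], [drift, 1 - drift, 0], [0, drift, 1 - drift]]
    return T

R = np.array([0.0, 0.0, 1.0])
V, pi = tv_value_iteration(R, T_of_t)
```

The contrast with a standard MDP is that `T_of_t` is re-evaluated at every step of the sweep, so the policy anticipates how the disturbance field will have changed by the time the robot arrives.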
Left: flow pattern near the San Francisco Bay (by F. Baart et al.). Middle: without considering the time-varying aspect of the disturbance, the robot's trajectory makes unnecessary detours. Right: trajectory produced by a decision-making strategy that integrates a prediction of ocean currents near southern California.
State-Continuity Approximation of Markov Decision Processes: we also developed a solution to the MDP-based decision-theoretic planning problem using a continuous approximation of the underlying discrete value function. This approach yields an accurate, continuous value function even with a small number of states from a very low-resolution state space. We achieved this by using a second-order Taylor expansion to approximate the value function, which casts it as a boundary-conditioned partial differential equation that can be naturally solved with a finite element method. Our extensive simulations and evaluations show that our solution provides continuous value functions, leading to better paths in terms of smoothness, travel distance, and time cost, even with a smaller state space.
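To give a flavor of the finite-element machinery, the sketch below solves a generic 1D boundary-conditioned PDE (-u'' = f on [0, 1] with Dirichlet boundaries) using linear elements and midpoint quadrature. This is a minimal textbook illustration of the solver class involved, not the paper's value-function PDE: it shows how a continuous field is recovered from only a handful of nodes.

```python
import numpy as np

def fem_1d_poisson(f, n_elems=8, u0=0.0, u1=0.0):
    """Linear finite-element solve of -u'' = f on [0, 1] with Dirichlet
    boundary values u(0)=u0 and u(1)=u1, using midpoint quadrature."""
    h = 1.0 / n_elems
    nodes = np.linspace(0.0, 1.0, n_elems + 1)
    K = np.zeros((n_elems + 1, n_elems + 1))   # stiffness matrix
    b = np.zeros(n_elems + 1)                  # load vector
    for e in range(n_elems):
        i, j = e, e + 1
        # Element stiffness for linear "hat" basis functions
        K[np.ix_([i, j], [i, j])] += (1.0 / h) * np.array([[1.0, -1.0],
                                                           [-1.0, 1.0]])
        mid = 0.5 * (nodes[i] + nodes[j])      # midpoint quadrature point
        b[[i, j]] += f(mid) * h / 2.0
    # Impose Dirichlet boundary conditions by row replacement
    for idx, val in ((0, u0), (n_elems, u1)):
        K[idx, :] = 0.0
        K[idx, idx] = 1.0
        b[idx] = val
    return nodes, np.linalg.solve(K, b)

# f = pi^2 sin(pi x) has the exact solution u = sin(pi x)
x, u = fem_1d_poisson(lambda s: np.pi**2 * np.sin(np.pi * s))
```

Even with only 8 elements the nodal solution tracks the exact sinusoid closely, which mirrors the paper's observation that a coarse state space can still yield an accurate continuous value function.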
Left: MDP policy iteration with continuous value approximation by finite element analysis. Middle: MDP policy iteration with exact discrete policy iteration on high-resolution states (traditional). Right: Goal-oriented planner without motion uncertainty (policy) optimization.





