Automated Spectral-Based Distribution for Ergodic Heterogeneous Multi-Agent Coverage and Search

This work uses deep reinforcement learning to extend ergodic search to heterogeneous multi-agent coverage and search. Building on recent results in ergodic coverage, we allow agents to learn where their strengths lie, as determined by their sensing and motion capabilities, and allocate search subtasks to agents accordingly. Specifically, we use deep reinforcement learning to automatically identify and leverage synergies between agents with different sensor and motion models, optimizing coverage of the search region.

Automating the distribution of agents to search subtasks both eliminates the overhead of handcrafting a mapping and allows ergodic search approaches to extend to more complex agents and larger teams with diverse compositions. This optimization is driven by the ergodic metric, which directs agents to spend time in each region of the search domain in proportion to the probability of locating targets there, while still exploring the entire domain, thereby balancing exploration and exploitation.
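For background, the ergodic metric has a standard spectral form (following Mathew and Mezic's spectral multiscale coverage formulation; we state it here as context rather than as this project's exact objective). Here f_k are Fourier basis functions on the search domain X in R^n, phi is the target-likelihood map, and x(t) is an agent trajectory over [0, T]:

```latex
% Standard spectral form of the ergodic metric (background sketch).
\[
  \mathcal{E}\bigl(x(\cdot)\bigr)
    = \sum_{k} \Lambda_k \,\bigl| c_k - \xi_k \bigr|^2,
  \qquad
  \Lambda_k = \bigl(1 + \lVert k \rVert^2\bigr)^{-\tfrac{n+1}{2}},
\]
\[
  c_k = \frac{1}{T} \int_0^T f_k\bigl(x(t)\bigr)\, dt,
  \qquad
  \xi_k = \int_{\mathcal{X}} \phi(x)\, f_k(x)\, dx .
\]
% The metric is small when the trajectory's time-averaged statistics (c_k)
% match the spectral coefficients of the target distribution (xi_k), with
% coarse (low-wavenumber) modes weighted more heavily via Lambda_k.
```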

This work investigates three approaches to autonomously creating search subtasks and distributing agents across them. The first method maps agents to pre-defined search subtasks, i.e., spectral bands of the information map. In the second and third methods, search subtasks are formulated and allocated to agents by learning weight distributions over the spectral coefficients of the information map, directly and through parameterized curves, respectively.
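To make the three formulations concrete, the sketch below is our illustration, not the project's implementation: the band cutoffs, the random softmax standing in for a policy network's output, and the exponential curve family are all assumptions for exposition. Each method ultimately yields a per-agent weight vector over the spectral coefficients, which scales each mode's contribution to that agent's ergodic objective.

```python
# Illustrative sketch of the three ways an agent's search subtask can be
# expressed as a weight vector over the spectral coefficients of the
# information map. Names such as `spectral_band_mask` and the exponential
# curve family are hypothetical, chosen only for illustration.
import numpy as np

K = 10                                    # number of Fourier modes per axis
ks = np.stack(np.meshgrid(np.arange(K), np.arange(K)), -1).reshape(-1, 2)
k_norm = np.linalg.norm(ks, axis=1)       # wavenumber magnitude per mode

# Method 1: pre-defined spectral bands -> binary mask per agent.
# Low bands capture coarse structure (suited to long-range, low-fidelity
# sensors); high bands capture fine detail (suited to short-range sensors).
def spectral_band_mask(lo, hi):
    return ((k_norm >= lo) & (k_norm < hi)).astype(float)

w_long_range = spectral_band_mask(0.0, 4.0)
w_short_range = spectral_band_mask(4.0, np.inf)

# Method 2: learned weights over all coefficients. A random softmax stands
# in here for the output of a learned policy network.
logits = np.random.randn(len(ks))
w_learned = np.exp(logits) / np.exp(logits).sum()

# Method 3: a parameterized curve over coefficients; the learner only
# outputs the curve's parameters (one illustrative family: exponential decay).
alpha = 0.5                               # learned scalar parameter
w_curve = np.exp(-alpha * k_norm)

def weighted_ergodicity(w, c_k, xi_k):
    """Weighted ergodic metric an agent would minimize for its subtask."""
    n = 2                                 # search domain dimension
    Lambda = (1.0 + k_norm**2) ** (-(n + 1) / 2)
    return np.sum(w * Lambda * np.abs(c_k - xi_k) ** 2)
```

In this view, Method 1 fixes the weights a priori, while Methods 2 and 3 let the learner trade off spectral modes per agent, with Method 3 shrinking the learned parameter count to the handful of curve parameters.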

Our numerical results show that distributing coverage responsibilities to agents according to their dynamics and sensing capabilities leads to improvements of 40%, 51%, and 46% in the standard coverage metric (ergodicity), and of 15%, 22%, and 20% in time to find all targets, for the three methods respectively.

Research Team: Ananya Rao

Pipeline for autonomous assignment of agents in a multi-agent search scenario involving two types of agents: differential-drive agents with short-range, high-fidelity sensors, and omnidirectional agents with long-range, low-fidelity sensors. The underlying distribution shows the likelihood of finding targets throughout the domain.

Example trajectories for two-agent teams without distribution (middle) and with learned distributions (right), and the weights over spectral coefficients (left), for Method 1 (top), Method 2 (middle), and Method 3 (bottom).

Search performance comparison between the different agent distribution methods (pre-defined spectral bands, Method 1; learning weights over all coefficients, Method 2; learning parameterized curves over coefficients, Method 3) and the baseline (No Distribution), in terms of coverage performance (ergodic metric, lower is better) and time to find all targets (lower is better).