
Enrico Marchesini
Assistant Professor
Safety is a critical concern in multi-agent reinforcement learning (MARL), yet typical safety-aware methods constrain agent behaviors, limiting exploration, which is essential for discovering effective cooperation. Existing approaches mainly enforce individual constraints, overlooking potential benefits of joint (team) constraints. We analyze team constraints theoretically and practically, introducing entropic exploration for constrained MARL (E2C). E2C maximizes observation entropy to encourage exploration while ensuring safety at the individual and team levels. Experiments across diverse domains demonstrate that E2C matches or outperforms common baselines in task performance while reducing unsafe behaviors by up to 50%.
