Inverse design of photonic integrated circuits (PICs) has traditionally relied on gradient-based optimization. However, this approach is prone to getting stuck in local minima, which results in suboptimal design functionality. As interest in PICs grows due to their potential for addressing modern hardware demands through optical computing, more adaptive optimization algorithms are needed. We present a reinforcement learning (RL) environment as well as multi-agent RL (MARL) algorithms for the design of PICs. By discretizing the design space into a grid, we formulate the design task as an optimization problem with thousands of binary variables. We consider multiple two- and three-dimensional design tasks that represent PIC components for an optical computing system. By decomposing the design space into thousands of individual agents, our algorithms are able to optimize designs with only a few thousand environment samples. They outperform previous state-of-the-art gradient-based optimization in both two- and three-dimensional design tasks. Our work may also serve as a benchmark for further exploration of sample-efficient RL for inverse design in photonics.
In topology optimization, the goal is to find the best material distribution for a given task. We discretize the design space and control each part with a separate agent, each of which faces a binary choice: place material or air in its chunk of the design space. This setting is similar to a hierarchical bandit in which N agents each choose among k actions. The major challenge, however, is sample efficiency: evaluating the reward of a joint action from all agents requires a full electromagnetic simulation, which can take anywhere from several seconds to several minutes. Therefore, multiple thousands of agents need to learn to cooperate using only a few samples.
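To make this formulation concrete, the following is a minimal sketch of one agent-per-pixel sampling step. The grid size, the fdtd_simulate stub, and all names are illustrative assumptions rather than the authors' implementation; the one expensive simulation call is the single point where all agents' binary choices are evaluated jointly.

```python
# Minimal sketch of the per-cell bandit formulation (illustrative, not the
# authors' implementation). Grid size, stubs, and names are assumptions.
import numpy as np

GRID = (40, 40)                      # hypothetical 2D design grid
N_AGENTS = GRID[0] * GRID[1]         # one agent per grid cell

def fdtd_simulate(design: np.ndarray) -> float:
    """Placeholder for a full electromagnetic simulation.
    In practice this single call costs seconds to minutes and returns
    a scalar figure of merit shared by all agents as their reward."""
    raise NotImplementedError

def sample_joint_action(logits: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Each agent independently samples its binary action
    (1 = material, 0 = air) from a Bernoulli policy."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return (rng.random(logits.shape) < probs).astype(np.int8)

rng = np.random.default_rng(0)
logits = np.zeros(N_AGENTS)          # uninformed initial policies
actions = sample_joint_action(logits, rng)
design = actions.reshape(GRID)       # joint action defines the structure
# reward = fdtd_simulate(design)     # one expensive evaluation per joint action
```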
We test the feasibility of MARL on several topology optimization tasks for optical computing. Our bandit variants of PPO (BPPO) and of an actor-critic approach (BAC) consistently outperform all baselines by a large margin. Most importantly, gradient-based optimization (Grad), the standard procedure in topology optimization, is prone to getting stuck in local minima. Gradient-based optimization also struggles with three-dimensional topology optimization, because non-differentiable design constraints impede the optimization process. In contrast, MARL can incorporate these design constraints directly into the optimization process.
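As an illustration of what a bandit-style clipped policy-gradient update could look like for thousands of independent Bernoulli agents sharing one scalar reward, consider the sketch below. The shared-advantage baseline, clip range, and learning rate are assumptions for exposition; the paper's BPPO may differ in these details.

```python
# Hedged sketch of a bandit-style PPO update for independent Bernoulli
# agents sharing one scalar reward. Baseline, clip range, and learning
# rate are illustrative assumptions, not the paper's exact BPPO.
import numpy as np

def bernoulli_log_prob(logits: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Log-probability of binary actions under sigmoid(logits), written
    stably: log sigmoid(x) = -logaddexp(0, -x)."""
    return -np.logaddexp(0.0, np.where(actions == 1, -logits, logits))

def ppo_bandit_update(logits, actions, old_log_p, advantage,
                      clip=0.2, lr=0.1):
    """One clipped policy-gradient ascent step per agent on a shared
    scalar advantage (e.g. reward minus a running-mean baseline)."""
    log_p = bernoulli_log_prob(logits, actions)
    ratio = np.exp(log_p - old_log_p)
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip)
    # d log p / d logit = action - sigmoid(logit)
    grad_log_p = actions - 1.0 / (1.0 + np.exp(-logits))
    # The clipped term is constant w.r.t. the logits, so its gradient is
    # zero; the surrogate's gradient flows only where the unclipped term
    # attains the minimum of min(ratio * A, clipped * A).
    use_unclipped = (ratio * advantage) <= (clipped * advantage)
    grad = np.where(use_unclipped, ratio, 0.0) * advantage * grad_log_p
    return logits + lr * grad
```

A full optimization loop would alternate sampling a joint design (as in the sketch above), running the electromagnetic simulation, and applying this update, possibly reusing the sampled actions for several epochs as in standard PPO.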
@article{mahlau2025multi,
  title   = {Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits},
  author  = {Mahlau, Yannik and Schier, Maximilian and Reinders, Christoph and Schubert, Frederik and B{\"{u}}gling, Marco and Rosenhahn, Bodo},
  journal = {Reinforcement Learning Journal},
  year    = {2025}
}