Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits

Institute of Information Processing, Leibniz University Hannover
Reinforcement Learning Conference 2025

We develop a framework for using Multi-Agent Reinforcement Learning (MARL) in a bandit setting for the inverse design of photonic integrated circuit (PIC) components, achieving improved performance and robustness compared to traditional gradient-based methods. Above, a design task for a linear operation on a photonic integrated circuit is displayed. In the left image, 65% of the incoming light emitted by a source (yellow) in the left waveguide (blue) should be routed to the top right waveguide (blue), while 35% should go to the bottom right waveguide (blue). Transmission is measured as the ratio between the power at an output detector (green) and at the input detector (pink). The design task is a binary optimization problem: choosing silicon or air at every voxel. The middle image shows an electromagnetic simulation of the design. During optimization (right), gradient descent gets stuck in a local minimum, while our BPPO and BAC algorithms show better exploration behavior.
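As a rough illustration of this objective (the exact figure of merit used in the paper may differ), the reward for the 65/35 splitter task can be expressed in terms of the detector readings:

```python
def transmission(output_power: float, input_power: float) -> float:
    # Transmission: ratio of the power at an output detector (green)
    # to the power at the input detector (pink).
    return output_power / input_power

def splitter_score(t_top: float, t_bottom: float) -> float:
    # Hypothetical figure of merit for the 65%/35% splitting task:
    # negative absolute deviation from the target ratios.
    return -(abs(t_top - 0.65) + abs(t_bottom - 0.35))
```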

Abstract

Inverse design of photonic integrated circuits (PICs) has traditionally relied on gradient-based optimization. However, this approach is prone to getting trapped in local minima, which results in suboptimal design functionality. As interest in PICs grows due to their potential for addressing modern hardware demands through optical computing, more adaptive optimization algorithms are needed. We present a reinforcement learning (RL) environment as well as multi-agent RL algorithms for the design of PICs. By discretizing the design space into a grid, we formulate the design task as an optimization problem with thousands of binary variables. We consider multiple two- and three-dimensional design tasks that represent PIC components for an optical computing system. By decomposing the design space into thousands of individual agents, our algorithms are able to optimize designs with only a few thousand environment samples. They outperform previous state-of-the-art gradient-based optimization in both two- and three-dimensional design tasks. Our work may also serve as a benchmark for further exploration of sample-efficient RL for inverse design in photonics.

Topology Optimization with Reinforcement Learning

In topology optimization, the goal is to find the best material distribution for a given task. We discretize the design space into a grid and control each cell with a separate agent, each of which faces a binary choice: place material or air in its chunk of the design space. This setting is similar to a hierarchical bandit in which N agents each choose among k actions. The major challenge, however, is sample efficiency. Evaluating the reward of a joint action from all agents requires a full electromagnetic simulation, which can take anywhere from several seconds to several minutes. Therefore, thousands of agents must learn to cooperate using only a few samples.
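A minimal sketch of this bandit formulation is shown below. The grid shape and the simulator stand-in are illustrative assumptions, not the paper's implementation; in the real task, the reward evaluation is a full electromagnetic simulation:

```python
import numpy as np

def run_simulation(design: np.ndarray) -> float:
    """Placeholder for a full electromagnetic simulation (e.g. FDTD).

    In the real task this evaluates the design's transmission and is
    the expensive step (seconds to minutes per call). Here it returns
    a dummy score so the sketch runs end to end.
    """
    return float(design.mean())  # dummy figure of merit

class BinaryDesignBandit:
    """One-step (bandit) environment with one binary agent per voxel."""

    def __init__(self, shape=(40, 40)):
        self.shape = shape
        self.num_agents = int(np.prod(shape))  # thousands of agents

    def step(self, joint_action: np.ndarray) -> float:
        # joint_action holds one binary choice per agent:
        # 1 = silicon, 0 = air at that voxel.
        assert joint_action.shape == (self.num_agents,)
        design = joint_action.reshape(self.shape)
        # Every reward evaluation costs one full simulation, which is
        # why sample efficiency dominates this problem.
        return run_simulation(design)

env = BinaryDesignBandit()
reward = env.step(np.random.randint(0, 2, env.num_agents))
```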

Results

We test the feasibility of MARL on several topology optimization tasks for optical computing. Our bandit variants of PPO (BPPO) and of an actor-critic approach (BAC) consistently outperform all baselines by a large margin. Most importantly, gradient-based optimization (Grad), the standard procedure in topology optimization, is very prone to getting stuck in local minima. Additionally, gradient-based optimization struggles with three-dimensional topology optimization because non-differentiable design constraints impede the optimization process. In contrast, MARL can easily incorporate these design constraints directly in the optimization process, for example as a projection applied to each sampled design (see the sketch below).
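As a minimal sketch of this idea (the specific constraint and its enforcement here are illustrative assumptions, not the paper's actual method), a minimum-feature-size constraint could be enforced by morphologically projecting each sampled design before simulation. Since the agents only ever observe the reward of the projected design, the projection itself never needs to be differentiable:

```python
import numpy as np
from scipy import ndimage

def project_min_feature_size(design: np.ndarray, size: int = 3) -> np.ndarray:
    """Hypothetical non-differentiable constraint projection.

    Morphological opening removes material features smaller than the
    structuring element; closing removes equally small air holes.
    """
    structure = np.ones((size, size), dtype=bool)
    projected = ndimage.binary_opening(design.astype(bool), structure)
    projected = ndimage.binary_closing(projected, structure)
    return projected.astype(design.dtype)

# Usage inside the bandit loop: project, then simulate the projected design.
# design = project_min_feature_size(joint_action.reshape(shape))
# reward = run_simulation(design)
```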

BibTeX

@article{mahlau2025multi,
    title={Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits},
    author={Mahlau, Yannik and Schier, Maximilian and Reinders, Christoph and Schubert, Frederik and B{\"{u}}gling, Marco and Rosenhahn, Bodo},
    journal={Reinforcement Learning Journal},
    year={2025}
}