The research, titled "All-optical graph representation learning using integrated diffractive photonic computing units", was published in "Science Advances" on June 15, 2022.


Editor | Radish Skin

Photonic neural networks perform brain-inspired calculations using photons instead of electrons to achieve significantly improved computing performance. However, existing architectures can only handle regularly structured data and cannot be generalized to graph-structured data beyond the Euclidean domain.

Researchers at Tsinghua University proposed the diffractive graph neural network (DGNN), an all-optical graph representation learning architecture based on diffractive photonic computing units (DPUs) and on-chip optics, to address this limitation. Specifically, graph node attributes are encoded into strip optical waveguides, transformed by DPUs, and aggregated by optical couplers to extract their feature representations.

DGNN captures complex dependencies between node neighborhoods as optical messages pass through the graph structure at the speed of light. The researchers demonstrated DGNN on node- and graph-level classification tasks over benchmark datasets and achieved superior performance. The team's work opens a new direction for designing specialized integrated photonic circuits for efficient processing of large-scale graph-structured data.



Deep learning technology has made tremendous progress in a wide range of artificial intelligence (AI) applications, including computer vision, speech recognition, natural language processing, autonomous vehicles, the biomedical sciences, and more. Its core is the use of multi-layer neural networks to learn hierarchical and complex abstractions from big data, enabled by the continuous development of integrated electronic computing platforms such as central processing units (CPUs), graphics processing units (GPUs), tensor processing units (TPUs), and field-programmable gate arrays (FPGAs).

However, electronic computing performance is approaching its physical limits and struggles to keep pace with the growing demands of artificial intelligence. This is a common dilemma across the wide range of applications that require large-scale deep neural models.

In recent years, research on photonic computing has attracted increasing attention. Photons are used as the computing medium, exploiting their high parallelism, low power consumption, and light-speed signal processing to build photonic neural networks. Many photonic neural network architectures have been proposed to facilitate complex neurally inspired computations, such as diffractive neural networks, optical interference neural networks, photonic spiking neural networks, and photonic in-memory computing.

Existing architectures are most successful at processing regularly structured data, such as vectors or grid-like images. However, many scientific fields analyze data beyond this basic Euclidean realm. As a typical example, graph-structured data, which encode rich relationships (i.e., edges) between entities (i.e., nodes) in complex systems, are ubiquitous in the real world, ranging from chemical molecules to brain networks.

To process graph-structured data, graph neural networks (GNNs) have evolved into a broad class of methods capable of integrating local node features and graph topology for representation learning.

Among these models, message-passing-based GNNs offer the key advantages of flexibility and efficiency: they generate neural messages at graph nodes and pass them along edges to neighboring nodes for feature updates. They have been successfully applied to many graph-based tasks, including molecular property prediction, drug discovery, skeleton-based human action recognition, and spatiotemporal forecasting. However, how to effectively exploit photonic computing to benefit graph-based deep learning remains largely unexplored.
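In generic form (a standard abstraction of message passing, not a formula quoted from the paper), one round of message passing updates each node feature as h_v' = UPDATE(h_v, AGG_{u∈N(v)} MSG(h_u, h_v)), where N(v) denotes the neighbors of node v, MSG(·) generates a message from each neighbor, and AGG(·) combines the incoming messages into the new node representation.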

Here, the researchers propose the diffractive GNN (DGNN), a novel photonic GNN architecture that performs optical message passing on graph-structured data. DGNN is built on an integrated diffractive photonic computing unit (DPU) for generating optical node features. Each DPU comprises successive diffractive layers implemented with metalines that transform node attributes into optical neural messages, with strip optical waveguides deployed to encode the input node attributes and output the transformed results. Optical neural messages sent from node neighborhoods are aggregated using optical couplers.

In the DGNN architecture, DPUs can be cascaded horizontally to expand the receptive field, capturing complex dependencies from node neighborhoods of arbitrary size. Additionally, DPUs can be stacked vertically to extract higher-dimensional optical node features and improve learning capacity, inspired by the multi-head strategy used in many modern deep learning models such as Transformers and graph attention networks.

Building on this scalable optical message-passing scheme, the researchers first demonstrated a semi-supervised node classification task, in which the optical node features extracted by DGNN are fed into an optical or electronic output classifier to determine the node class (a sketch of the pipeline follows below). The results show that the optical DGNN achieves competitive or even superior classification performance compared with electronic GNNs on synthetic graph models and three real-world graph benchmark datasets.
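To make the data flow concrete, here is a minimal NumPy sketch of the scheme described above, with the optics abstracted away: MSG(·) becomes a fixed linear transform standing in for the trained diffractive layers, AGG(·) becomes an unweighted average standing in for the 50:50 Y-couplers, and a linear readout stands in for the output classifier. The array names and sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def dgnn_node_features(X, neighborhoods, msg_weights):
    """X: (N, n) node attributes; neighborhoods: list of top-k index lists
    (which may include the node itself); msg_weights: P matrices of shape (n, m)."""
    heads = []
    for W in msg_weights:                        # P parallel heads (vertical stacking)
        msgs = X @ W                             # MSG(.): n -> m optical neural messages
        agg = np.stack([msgs[idx].mean(axis=0)   # AGG(.): 50:50 couplers ~ averaging
                        for idx in neighborhoods])
        heads.append(agg)
    return np.concatenate(heads, axis=1)         # (N, m*P) optical node features

def classify(features, W_cls):
    """Linear readout standing in for the optical or electronic output classifier."""
    return (features @ W_cls).argmax(axis=1)

# Toy usage with random data (5 nodes, n = 4 attributes, P = 3 heads, m = 2, C = 2 classes)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
neighborhoods = [[0, 1], [0, 1, 2], [2, 3], [3, 4], [4, 0]]
weights = [rng.normal(size=(4, 2)) for _ in range(3)]
labels = classify(dgnn_node_features(X, neighborhoods, weights), rng.normal(size=(6, 2)))
```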

DGNN also supports graph-level classification, where an additional DPU aggregates all the optical node features into a graph-level representation for classification. Results on skeleton-based human action recognition demonstrate the effectiveness of this architecture for graph classification tasks.


Illustration: The architecture of optical DGNN. (Source: paper)

Scarce training labels

The researchers analyzed the effectiveness of DGNN when training labels are limited, a common situation in semi-supervised learning. Under the same architectural settings, they compared DGNN against electronic baselines with different numbers of training labels: 1, 5, 10, 15, 20, and 25 labels per class.

Test accuracy was plotted as a bar chart with error bars by evaluating each training-label size 10 times (a sketch of this protocol is given below). Binarizing the diffractive modulation layer helps overcome local-minima problems during network training and improves classification accuracy. The DGNN architecture outperforms all baselines in every label-scarcity setting, especially at the smallest training set sizes such as only one label per class, indicating stronger generalization relative to the electronic baselines.
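A rough sketch of this evaluation loop, assuming a generic train-and-test routine (the paper's exact sampling and training details may differ):

```python
import numpy as np

def evaluate_label_scarcity(train_and_test, labels_per_class=(1, 5, 10, 15, 20, 25),
                            repeats=10, seed=0):
    """train_and_test(budget, rng) is a placeholder callable that samples `budget`
    training labels per class, trains the model, and returns test accuracy."""
    rng = np.random.default_rng(seed)
    results = {}
    for budget in labels_per_class:
        accs = [train_and_test(budget, rng) for _ in range(repeats)]
        results[budget] = (float(np.mean(accs)), float(np.std(accs)))  # bar height, error bar
    return results
```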


Illustration: Classification on Amazon Photo with scarce training labels. (Source: Paper)

DPU with tapered output waveguide

The tapered waveguide is used to couple the output light field over a larger area to the output port of the integrated DPU. Its improved coupling efficiency allows the photodetector to receive more optical power, improving the signal-to-noise ratio (SNR) during photoelectric conversion. A higher SNR provides a higher-quality input signal to the classifier and ensures stable classification.

The researchers quantitatively evaluated the output energy distribution and model performance of the tapered and single-mode output waveguides. FDTD simulations were used to evaluate the power distribution of the optical features on the test nodes of synthetic SBM graphs with the trained DGNN-E model. The starting core width of the tapered output waveguide was optimized and set to 2 μm, instead of the 500 nm used in the single-mode output waveguide. For each test graph node, the power transmission rate of the DPU is obtained by computing the ratio of the output power at the two ports to the input light source power, yielding a frequency histogram of transmission rates over all graph nodes. The average power transmission rate of the DPU with the tapered output waveguide is 2.01%, roughly 5.6 times that of the single-mode output waveguide (0.36%).

Using the estimated power transmission rate of the DPU, the researchers evaluated the photocurrent SNR of the on-chip photodetector at different input light source powers, using the formula detailed in Materials and Methods. They further evaluated the test accuracy of the DGNN-E model as a function of SNR, with the top-k neighborhood set to 16 nodes, by adding photodetector noise to the node features and retraining the electronic classifier (Figure S6D). Increasing the input light source power and the power transmission rate of the DPU improves the photocurrent SNR and yields more stable model performance on the synthetic SBM graphs.


Illustration: Semi-supervised node classification on three benchmark graph databases. (Source: paper)

In this work, the PPRGo model with single-round message passing is employed to directly capture high-order neighborhood information. The energy efficiency of DGNN is calculated based on an input light source power of 10 mW. With the tapered and single-mode output waveguides, the photocurrent SNR of DGNN reaches 34.6 dB and 20.2 dB, respectively, with corresponding model test accuracies of 94.4% and 92.3%.

DPU calculation accuracy

The number of quantization bits determines the DPU calculation accuracy and can be inferred from the photocurrent SNR. In digital signal processing, quantization errors are introduced during the quantization process of the analog-to-digital converter. Assuming the signal has a uniform distribution covering all quantization levels, the signal-to-quantization-noise ratio can be expressed as SQNR = 20·log10(2^Q) ≈ 6.02·Q dB, where Q is the number of quantization bits. Therefore, the photocurrent SNR of 34.6 dB obtained with the tapered output waveguide at 10 mW input source power corresponds to approximately 6 quantization bits.
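As a quick check of the stated figure, the SQNR relation can be inverted to estimate the effective bit depth from a measured SNR (a small illustrative script, not code from the paper):

```python
import math

def quantization_bits_from_snr(snr_db):
    """Invert SQNR = 20*log10(2**Q) ~ 6.02*Q dB to estimate the effective bit depth."""
    return snr_db / (20 * math.log10(2))

print(round(quantization_bits_from_snr(34.6), 2))  # ~5.75 -> roughly 6 bits (tapered waveguide)
print(round(quantization_bits_from_snr(20.2), 2))  # ~3.36 (single-mode waveguide)
```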

Computational density and energy efficiency

It is worth noting that once the DGNN architecture is optimized and physically fabricated, the on-chip optics for computing node and graph representations, as well as the optical output classifier, are passive during inference. Inference for this graph-based AI task proceeds at the speed of light, is limited only by the input data modulation and output detection rates, and consumes very little energy compared with electronic GNNs.

Specifically, assume that DGNN uses MSG(·) to transform the n-dimensional attributes of each node into m-dimensional optical neural messages, uses AGG(·) to aggregate the optical features of k nodes, and stacks P heads for a C-class classification task. The MSG(·) module of each head then contains an n×m weight matrix per node, the AGG(·) module of each head sums the m-dimensional vectors of k nodes, and the classifier contains an mP×C weight matrix.

Therefore, each inference cycle of DGNN involves (2nmk + mk)P operations (OP) for feature extraction and 2mPC operations for classification, i.e., a total of (2nk + k + 2C)mP operations.


Illustration: DGNN graph classification on action recognition task. (Source: paper)

Taking into account 30 GHz data modulation and photodetection rates based on existing silicon photonics foundries, the computational speed of DGNN is (6nk + 3k + 6C)mP × 10^10 OP/s. Assuming a typical light source power of 10 mW, the energy efficiency of DGNN is (6nk + 3k + 6C)mP × 10^12 OP/J.

For the node classification setting with n = 20, m = 2, k = 8, P = 4, and C = 8, the computational speed is 82.6 TOP/s (tera-operations per second) and the energy efficiency is 8.26 POP/s (peta-operations per second) per watt. For the DPU module with a computational area of 61.5 μm × 45 μm shown in the figure below, which executes the MSG(·) function with a 3×2 weight matrix, the computational density is 130 TOP/s per square millimeter.
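These figures can be reproduced from the operation counts given above; the short script below plugs in the stated parameters (30 GHz modulation rate, 10 mW source power, 61.5 μm × 45 μm DPU area) and is an illustrative check rather than the authors' own calculation:

```python
def dgnn_ops_per_inference(n, m, k, P, C):
    # (2nmk + mk)P operations for feature extraction + 2mPC for classification
    return (2 * n * m * k + m * k) * P + 2 * m * P * C

n, m, k, P, C = 20, 2, 8, 4, 8
rate_hz = 30e9                      # data modulation / photodetection rate
source_w = 10e-3                    # input light source power

ops = dgnn_ops_per_inference(n, m, k, P, C)   # 2752 OP per inference cycle
speed = ops * rate_hz                         # ~8.26e13 OP/s  -> 82.6 TOP/s
efficiency = speed / source_w                 # ~8.26e15 OP/J  -> 8.26 POP/s per watt

# DPU executing a 3x2 MSG(.) weight matrix over a 61.5 um x 45 um area:
dpu_ops = 2 * 3 * 2                           # multiply-accumulate operations per cycle
area_mm2 = 61.5e-3 * 45e-3                    # computational area in mm^2
density = dpu_ops * rate_hz / area_mm2        # ~1.3e14 OP/s/mm^2 -> 130 TOP/s per mm^2

print(f"{speed:.3g} OP/s, {efficiency:.3g} OP/J, {density:.3g} OP/s/mm^2")
```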

Assuming each MZI occupies 100 μm × 100 μm, implementing the same 3×2 weight matrix with on-chip MZI photonic devices would require a computational area of 300 μm × 200 μm, approximately 21.7 times larger.

Note that the state-of-the-art Tesla V100 GPU has an energy efficiency of 100 GOP/s (giga-operations per second) per watt and a computational density of 37 GOP/s per square millimeter. The DGNN architecture thus achieves more than four orders of magnitude improvement in energy efficiency and more than three orders of magnitude improvement in computational density.


Illustration: Semi-supervised node classification on synthetic graphs. (Source: paper)

Scalability of the architecture

The proposed DGNN architecture performs AGG(·) only once to directly incorporate high-order node features, avoiding the exponential neighborhood expansion problem when extracting remote neighborhood information and facilitating scalability to larger graphs. In principle, the number of heads can be scaled to any size, and basic DPU modules (such as those in Figure 1C) can be stacked horizontally and interconnected with Y-couplers and strip waveguides to aggregate optical neural messages from neighborhoods of arbitrary size.

Additionally, the architecture has the flexibility to scale to multiple rounds of optical message passing by further stacking DPU modules. DPU modules can be scaled up by increasing the number of metaline layers and meta-atoms per layer, and the number of DPU inputs and outputs can be scaled up with additional light modulators and waveguide crossings.


Illustration: Zooming in on the neural message dimensions of a DGNN node. (Source: paper)

The operating wavelength of the architecture can be extended from a single wavelength to multiple wavelengths to further improve computing throughput. The accumulation of systematic errors in the system can be mitigated by retraining the output classifier.


Illustration: Training DGNN-E using binary modulation. (Source: paper)

In addition, in-situ training methods can also address system errors and improve training efficiency by developing on-chip DPU modules with programmable modulation coefficients (e.g., using one-dimensional indium tin oxide for modulation).

Limitations and future work

In this study, optical feature aggregation in DGNN is implemented with 2×1 optical Y-couplers with a 50:50 combining ratio, which do not support assigning different weights to different adjacent nodes, i.e., a weighted sum (see the snippet below). Although average feature aggregation achieves remarkable performance on node- and graph-level classification tasks, message passing with weighted sums could further improve model capacity and could be implemented with on-chip amplitude modulators such as phase-change materials.
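For illustration, the difference between the two aggregation schemes amounts to the following (a toy NumPy example with hypothetical message values and weights):

```python
import numpy as np

msgs = np.array([[0.2, 0.8],
                 [0.6, 0.4],
                 [0.9, 0.1]])          # optical messages from three neighbors (toy values)

average_agg = msgs.mean(axis=0)        # what cascaded 50:50 Y-couplers compute
w = np.array([0.5, 0.3, 0.2])          # hypothetical per-neighbor weights (would need
weighted_agg = w @ msgs                # on-chip amplitude modulators to realize)
```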

Another limitation is that the proposed DGNN architecture uses a linear model for optical message passing. Although existing work has demonstrated the possibility of implementing optical nonlinear activation functions, nonlinear operations are not important in GNNs, as studied in previous work. This is demonstrated by the superior model performance achieved by DGNN on real-world benchmark datasets.

For example, DGNN achieves nearly state-of-the-art performance on Amazon Photo with larger training label sizes and significantly outperforms electronic GNNs under the scarce-label setting. The inclusion of nonlinear activation functions in DGNN is therefore left to future work as a potential way to further enhance the model's learning capability.

Taken together, the researchers hope that this work will inspire future development of advanced optical deep learning architectures with integrated photonic circuits beyond the Euclidean domain for efficient learning of graph representations.

Paper link: https://www.science.org/doi/10.1126/sciadv.abn7630
