The research, titled "All-optical graph representation learning using integrated diffractive photonic computing units", was published in "Science Advances" on June 15, 2022.


Editor | Radish Skin

Photonic neural networks perform brain-inspired calculations using photons instead of electrons to achieve significantly improved computing performance. However, existing architectures can only handle regularly structured data and cannot be generalized to graph-structured data beyond the Euclidean domain.

Researchers at Tsinghua University proposed the diffractive graph neural network (DGNN), an all-optical graph representation learning architecture based on diffractive photonic computing units (DPUs) and on-chip optics, to address this limitation. Specifically, graph node attributes are encoded into strip optical waveguides, transformed by DPUs, and aggregated by optical couplers to extract their feature representations.

DGNN captures complex dependencies between node neighborhoods as optical messages pass through the graph structure at the speed of light. The researchers demonstrated DGNN on node- and graph-level classification tasks over benchmark datasets and achieved superior performance. The team's work opens a new direction for designing specialized integrated photonic circuits for efficient processing of large-scale graph-structured data.



Deep learning technology has made tremendous progress in a wide range of artificial intelligence (AI) applications, including computer vision, speech recognition, natural language processing, autonomous vehicles, the biomedical sciences, and more. Its core is the use of multi-layer neural networks to learn hierarchical and complex abstractions from big data, enabled by the continuous development of integrated electronic computing platforms such as central processing units (CPUs), graphics processing units (GPUs), tensor processing units (TPUs), and field-programmable gate arrays (FPGAs).

However, electronic computing performance is approaching its physical limits and struggles to keep pace with the growing demands of artificial intelligence. This is a common dilemma across the wide range of applications that require large-scale deep neural models.

In recent years, research on photonic computing has attracted increasing attention. Photons are used as the computing medium, exploiting their high parallelism, low power consumption, and light-speed signal processing to build photonic neural networks. Many photonic neural network architectures have been proposed to facilitate complex neurally inspired computations, such as diffractive neural networks, optical interference neural networks, photonic spiking neural networks, and photonic in-memory computing.

Existing architectures are most successful at processing regularly structured data, such as vectors or grid-like images. However, many scientific fields analyze data beyond this basic Euclidean realm. As a typical example, graph-structured data, which encode rich relationships (i.e., edges) between entities (i.e., nodes) in complex systems, are ubiquitous in the real world, ranging from chemical molecules to brain networks.

To process graph-structured data, graph neural networks (GNNs) have evolved into a broad class of methods capable of integrating local node features and graph topology for representation learning.

Among these models, message-passing-based GNNs offer the key advantages of flexibility and efficiency: they generate neural messages at graph nodes and pass them along edges to neighboring nodes for feature updates. They have been successfully applied to many graph-based tasks, including molecular property prediction, drug discovery, skeleton-based human action recognition, and spatiotemporal forecasting. However, how to effectively exploit photonic computing to benefit graph-based deep learning remains largely unexplored.
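In generic form (a standard abstraction of message passing, not a formula quoted from the paper), one round of message passing updates each node feature as h_v' = UPDATE(h_v, AGG_{u∈N(v)} MSG(h_u, h_v)), where N(v) denotes the neighbors of node v, MSG(·) generates a message from each neighbor, and AGG(·) combines the incoming messages into the new node representation.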

Here, the researchers propose the diffractive GNN (DGNN), a novel photonic GNN architecture that performs optical message passing on graph-structured data. DGNN is built on an integrated diffractive photonic computing unit (DPU) for generating optical node features. Each DPU comprises successive diffractive layers implemented with metalines that transform node attributes into optical neural messages, with strip optical waveguides deployed to encode the input node attributes and output the transformed results. Optical neural messages sent from node neighborhoods are aggregated using optical couplers.

In the DGNN architecture, DPUs can be cascaded horizontally to expand the receptive field, capturing complex dependencies from node neighborhoods of arbitrary size. Additionally, DPUs can be stacked vertically to extract higher-dimensional optical node features and improve learning capacity, inspired by the multi-head strategy used in many modern deep learning models such as Transformers and graph attention networks.

Building on this scalable optical message-passing scheme, the researchers first demonstrated a semi-supervised node classification task, in which the optical node features extracted by DGNN are fed into an optical or electronic output classifier to determine the node class (a sketch of the pipeline follows below). The results show that the optical DGNN achieves competitive or even superior classification performance compared with electronic GNNs on synthetic graph models and three real-world graph benchmark datasets.
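To make the data flow concrete, here is a minimal NumPy sketch of the scheme described above, with the optics abstracted away: MSG(·) becomes a fixed linear transform standing in for the trained diffractive layers, AGG(·) becomes an unweighted average standing in for the 50:50 Y-couplers, and a linear readout stands in for the output classifier. The array names and sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def dgnn_node_features(X, neighborhoods, msg_weights):
    """X: (N, n) node attributes; neighborhoods: list of top-k index lists
    (which may include the node itself); msg_weights: P matrices of shape (n, m)."""
    heads = []
    for W in msg_weights:                        # P parallel heads (vertical stacking)
        msgs = X @ W                             # MSG(.): n -> m optical neural messages
        agg = np.stack([msgs[idx].mean(axis=0)   # AGG(.): 50:50 couplers ~ averaging
                        for idx in neighborhoods])
        heads.append(agg)
    return np.concatenate(heads, axis=1)         # (N, m*P) optical node features

def classify(features, W_cls):
    """Linear readout standing in for the optical or electronic output classifier."""
    return (features @ W_cls).argmax(axis=1)

# Toy usage with random data (5 nodes, n = 4 attributes, P = 3 heads, m = 2, C = 2 classes)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
neighborhoods = [[0, 1], [0, 1, 2], [2, 3], [3, 4], [4, 0]]
weights = [rng.normal(size=(4, 2)) for _ in range(3)]
labels = classify(dgnn_node_features(X, neighborhoods, weights), rng.normal(size=(6, 2)))
```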

DGNN also supports graph-level classification, where an additional DPU aggregates all the optical node features into a graph-level representation for classification. Results on skeleton-based human action recognition demonstrate the effectiveness of this architecture for graph classification tasks.


Illustration: The architecture of optical DGNN. (Source: paper)

Scarce training labels

The researchers analyzed the effectiveness of DGNN when training labels are limited, a common situation in semi-supervised learning. Under the same architectural settings, they compared DGNN against electronic baselines with different numbers of training labels: 1, 5, 10, 15, 20, and 25 labels per class.

Test accuracy was plotted as a bar chart with error bars by evaluating each training-label size 10 times (a sketch of this protocol is given below). Binarizing the diffractive modulation layer helps overcome local-minima problems during network training and improves classification accuracy. The DGNN architecture outperforms all baselines in every label-scarcity setting, especially at the smallest training set sizes such as only one label per class, indicating stronger generalization relative to the electronic baselines.
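A rough sketch of this evaluation loop, assuming a generic train-and-test routine (the paper's exact sampling and training details may differ):

```python
import numpy as np

def evaluate_label_scarcity(train_and_test, labels_per_class=(1, 5, 10, 15, 20, 25),
                            repeats=10, seed=0):
    """train_and_test(budget, rng) is a placeholder callable that samples `budget`
    training labels per class, trains the model, and returns test accuracy."""
    rng = np.random.default_rng(seed)
    results = {}
    for budget in labels_per_class:
        accs = [train_and_test(budget, rng) for _ in range(repeats)]
        results[budget] = (float(np.mean(accs)), float(np.std(accs)))  # bar height, error bar
    return results
```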


Illustration: Classification on Amazon Photo with scarce training labels. (Source: Paper)

DPU with tapered output waveguide

The tapered waveguide is used to couple the output light field over a larger area to the output port of the integrated DPU. Its improved coupling efficiency allows the photodetector to receive more optical power, improving the signal-to-noise ratio (SNR) during photoelectric conversion. A higher SNR provides a higher-quality input signal to the classifier and ensures stable classification.

The researchers quantitatively evaluated the output energy distribution and model performance of the tapered and single-mode output waveguides. FDTD simulations were used to evaluate the power distribution of the optical features on the test nodes of synthetic SBM graphs with the trained DGNN-E model. The starting core width of the tapered output waveguide was optimized and set to 2 μm, instead of the 500 nm used in the single-mode output waveguide. For each test graph node, the power transmission rate of the DPU is obtained by computing the ratio of the output power at the two ports to the input light source power, yielding a frequency histogram of transmission rates over all graph nodes. The average power transmission rate of the DPU with the tapered output waveguide is 2.01%, roughly 5.6 times that of the single-mode output waveguide (0.36%).

Using the estimated power transmission rate of the DPU, the researchers evaluated the photocurrent SNR of the on-chip photodetector at different input light source powers, using the formula detailed in Materials and Methods. They further evaluated the test accuracy of the DGNN-E model as a function of SNR, with the top-k neighborhood set to 16 nodes, by adding photodetector noise to the node features and retraining the electronic classifier (Figure S6D). Increasing the input light source power and the power transmission rate of the DPU improves the photocurrent SNR and yields more stable model performance on the synthetic SBM graphs.


Illustration: Semi-supervised node classification on three benchmark graph databases. (Source: paper)

In this work, the PPRGo model with single-round message passing is employed to directly capture high-order neighborhood information. The energy efficiency of DGNN is calculated based on an input light source power of 10 mW. With the tapered and single-mode output waveguides, the photocurrent SNR of DGNN reaches 34.6 dB and 20.2 dB, respectively, with corresponding model test accuracies of 94.4% and 92.3%.

DPU calculation accuracy

The number of quantization bits determines the DPU calculation accuracy and can be inferred from the photocurrent SNR. In digital signal processing, quantization errors are introduced during the quantization process of the analog-to-digital converter. Assuming the signal has a uniform distribution covering all quantization levels, the signal-to-quantization-noise ratio can be expressed as SQNR = 20·log10(2^Q) ≈ 6.02·Q dB, where Q is the number of quantization bits. Therefore, the photocurrent SNR of 34.6 dB obtained with the tapered output waveguide at 10 mW input source power corresponds to approximately 6 quantization bits.
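As a quick check of the stated figure, the SQNR relation can be inverted to estimate the effective bit depth from a measured SNR (a small illustrative script, not code from the paper):

```python
import math

def quantization_bits_from_snr(snr_db):
    """Invert SQNR = 20*log10(2**Q) ~ 6.02*Q dB to estimate the effective bit depth."""
    return snr_db / (20 * math.log10(2))

print(round(quantization_bits_from_snr(34.6), 2))  # ~5.75 -> roughly 6 bits (tapered waveguide)
print(round(quantization_bits_from_snr(20.2), 2))  # ~3.36 (single-mode waveguide)
```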

Computational density and energy efficiency

It is worth noting that once the DGNN architecture is optimized and physically fabricated, the on-chip optics for computing node and graph representations, as well as the optical output classifier, are passive during inference. Inference for this graph-based AI task proceeds at the speed of light, is limited only by the input data modulation and output detection rates, and consumes very little energy compared with electronic GNNs.

Specifically, assume that DGNN uses MSG(·) to transform the n-dimensional attributes of each node into m-dimensional optical neural messages, uses AGG(·) to aggregate the optical features of k nodes, and stacks P heads for a C-class classification task. The MSG(·) module of each head then contains an n×m weight matrix per node, the AGG(·) module of each head sums the m-dimensional vectors of k nodes, and the classifier contains an mP×C weight matrix.

Therefore, each inference cycle of DGNN involves (2nmk + mk)P operations (OP) for feature extraction and 2mPC operations for classification, i.e., a total of (2nk + k + 2C)mP operations.


Illustration: DGNN graph classification on action recognition task. (Source: paper)

Taking into account 30 GHz data modulation and photodetection rates based on existing silicon photonics foundries, the computational speed of DGNN is (6nk + 3k + 6C)mP × 10^10 OP/s. Assuming a typical light source power of 10 mW, the energy efficiency of DGNN is (6nk + 3k + 6C)mP × 10^12 OP/J.

For the node classification setting with n = 20, m = 2, k = 8, P = 4, and C = 8, the computational speed is 82.6 TOP/s (tera-operations per second) and the energy efficiency is 8.26 POP/s (peta-operations per second) per watt. For the DPU module with a computational area of 61.5 μm × 45 μm shown in the figure below, which executes the MSG(·) function with a 3×2 weight matrix, the computational density is 130 TOP/s per square millimeter.
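These figures can be reproduced from the operation counts given above; the short script below plugs in the stated parameters (30 GHz modulation rate, 10 mW source power, 61.5 μm × 45 μm DPU area) and is an illustrative check rather than the authors' own calculation:

```python
def dgnn_ops_per_inference(n, m, k, P, C):
    # (2nmk + mk)P operations for feature extraction + 2mPC for classification
    return (2 * n * m * k + m * k) * P + 2 * m * P * C

n, m, k, P, C = 20, 2, 8, 4, 8
rate_hz = 30e9                      # data modulation / photodetection rate
source_w = 10e-3                    # input light source power

ops = dgnn_ops_per_inference(n, m, k, P, C)   # 2752 OP per inference cycle
speed = ops * rate_hz                         # ~8.26e13 OP/s  -> 82.6 TOP/s
efficiency = speed / source_w                 # ~8.26e15 OP/J  -> 8.26 POP/s per watt

# DPU executing a 3x2 MSG(.) weight matrix over a 61.5 um x 45 um area:
dpu_ops = 2 * 3 * 2                           # multiply-accumulate operations per cycle
area_mm2 = 61.5e-3 * 45e-3                    # computational area in mm^2
density = dpu_ops * rate_hz / area_mm2        # ~1.3e14 OP/s/mm^2 -> 130 TOP/s per mm^2

print(f"{speed:.3g} OP/s, {efficiency:.3g} OP/J, {density:.3g} OP/s/mm^2")
```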

Assuming each MZI occupies 100 μm × 100 μm, implementing the same 3×2 weight matrix with on-chip MZI photonic devices would require a computational area of 300 μm × 200 μm, approximately 21.7 times larger.

Note that the state-of-the-art Tesla V100 GPU has an energy efficiency of 100 GOP/s (giga-operations per second) per watt and a computational density of 37 GOP/s per square millimeter. The DGNN architecture thus achieves more than four orders of magnitude improvement in energy efficiency and more than three orders of magnitude improvement in computational density.


Illustration: Semi-supervised node classification on synthetic graphs. (Source: paper)

Scalability of the architecture

The proposed DGNN architecture performs AGG(·) only once to directly incorporate high-order node features, avoiding the exponential neighborhood expansion problem when extracting remote neighborhood information and facilitating scalability to larger graphs. In principle, the number of heads can be scaled to any size, and basic DPU modules (such as those in Figure 1C) can be stacked horizontally and interconnected with Y-couplers and strip waveguides to aggregate optical neural messages from neighborhoods of arbitrary size.

Additionally, the architecture has the flexibility to scale to multiple rounds of optical message passing by further stacking DPU modules. DPU modules can be scaled up by increasing the number of metaline layers and meta-atoms per layer, and the number of DPU inputs and outputs can be scaled up with additional light modulators and waveguide crossings.


Illustration: Zooming in on the neural message dimensions of a DGNN node. (Source: paper)

The operating wavelength of the architecture can be extended from a single wavelength to multiple wavelengths to further improve computing throughput. The accumulation of systematic errors in the system can be mitigated by retraining the output classifier.


Illustration: Training DGNN-E using binary modulation. (Source: paper)

In addition, in-situ training methods can also address system errors and improve training efficiency by developing on-chip DPU modules with programmable modulation coefficients (e.g., using one-dimensional indium tin oxide for modulation).

Limitations and future work

In this study, optical feature aggregation in DGNN is implemented with 2×1 optical Y-couplers with a 50:50 combining ratio, which do not support assigning different weights to different adjacent nodes, i.e., a weighted sum (see the snippet below). Although average feature aggregation achieves remarkable performance on node- and graph-level classification tasks, message passing with weighted sums could further improve model capacity and could be implemented with on-chip amplitude modulators such as phase-change materials.
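For illustration, the difference between the two aggregation schemes amounts to the following (a toy NumPy example with hypothetical message values and weights):

```python
import numpy as np

msgs = np.array([[0.2, 0.8],
                 [0.6, 0.4],
                 [0.9, 0.1]])          # optical messages from three neighbors (toy values)

average_agg = msgs.mean(axis=0)        # what cascaded 50:50 Y-couplers compute
w = np.array([0.5, 0.3, 0.2])          # hypothetical per-neighbor weights (would need
weighted_agg = w @ msgs                # on-chip amplitude modulators to realize)
```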

Another limitation is that the proposed DGNN architecture uses a linear model for optical message passing. Although existing work has demonstrated the possibility of implementing optical nonlinear activation functions, nonlinear operations are not important in GNNs, as studied in previous work. This is demonstrated by the superior model performance achieved by DGNN on real-world benchmark datasets.

For example, DGNN achieves nearly state-of-the-art performance on Amazon Photo with larger training label sizes and significantly outperforms electronic GNNs under the scarce-label setting. The inclusion of nonlinear activation functions in DGNN is therefore left to future work as a potential way to further enhance the model's learning capability.

Taken together, the researchers hope that this work will inspire future development of advanced optical deep learning architectures with integrated photonic circuits beyond the Euclidean domain for efficient learning of graph representations.

Paper link: https://www.science.org/doi/10.1126/sciadv.abn7630
