ML | GNN Graph Neural Network Model
Overview of Graph Neural Networks (GNNs)
Graph Neural Networks (GNNs) are a specialized type of neural network designed to process and analyze graph-structured data. Unlike traditional neural networks that operate on structured data (like images or text), GNNs can directly handle the complex relationships represented in graphs, making them particularly useful for tasks where data is interconnected.
What is a Graph?
A graph is a mathematical structure consisting of nodes (or vertices) and edges (connections between nodes). Formally, a graph $ G $ can be defined as $ G = (V, E) $, where $ V $ is the set of nodes and $ E $ is the set of edges. Graphs can represent various types of data, such as social networks, molecular structures, and transportation systems.
How GNNs Work
GNNs utilize a process called message passing, where information is exchanged between neighboring nodes. Each node updates its state by aggregating information from its neighbors, allowing the network to learn representations that capture the structure and features of the graph. The final output for each node is known as its embedding, which encapsulates the node's information in relation to its neighbors. (Message passing layers are permutation-equivariant layers mapping a graph into an updated representation of the same graph.)
Types of GNNs
There are several variations of GNNs, each suited for different applications:
- Graph Convolutional Networks (GCNs): These networks extend convolutional neural networks (CNNs) to graph data by learning features from neighboring nodes through convolution-like operations.
- Recurrent Graph Neural Networks (RGNNs): RGNNs are designed for handling temporal or sequential data on graphs, leveraging recurrent mechanisms to model dependencies over time.
- Gated Graph Neural Networks (GGNNs): These networks introduce gating mechanisms to better manage long-range dependencies within graph structures.
- Graph Autoencoders: Used primarily for unsupervised learning tasks such as link prediction, these models encode graph data into lower-dimensional representations and then attempt to reconstruct the original graph.
Applications of GNNs
GNNs are versatile and can be applied in various domains:
- Node Classification: Predicting labels for nodes in a graph, often used in social networks or citation networks.
- Link Prediction: Estimating the likelihood of connections between nodes, useful in recommendation systems.
- Graph Classification: Classifying entire graphs based on their structure and node features, applicable in molecular chemistry and bioinformatics.

Conclusion
Graph Neural Networks represent a significant advancement in machine learning by enabling the analysis of complex relational data. Their ability to learn from the structure and features of graphs opens up new possibilities across numerous fields, from social sciences to biology and beyond. As research continues to evolve, GNNs are likely to become increasingly integral to understanding and leveraging interconnected data.
A Gentle Introduction to Graph Neural Networks (Recommend)
A GNN is an optimizable transformation on all attributes of the graph (nodes, edges, global-context) that preserves graph symmetries (permutation invariances).
We’re going to build GNNs using the “message passing neural network” framework proposed by Gilmer et al. using the Graph Nets architecture schematics introduced by Battaglia et al.
GNNs adopt a “graph-in, graph-out” architecture meaning that these model types accept a graph as input, with information loaded into its nodes, edges and global-context, and progressively transform these embeddings, without changing the connectivity of the input graph.
- What types of problems have graph structured data?
- We have described some examples of graphs in the wild, but what tasks do we want to perform on this data? There are three general types of prediction tasks on graphs: graph-level, node-level, and edge-level.
- In a
graph-level task, we predict a single property for a whole graph. - For a
node-level task, we predict some property for each node in a graph. - For an
edge-level task, we want to predict the property or presence of edges in a graph.
- In a
- For the three levels of prediction problems described above (graph-level, node-level, and edge-level), we will show that all of the following problems can be solved with a single model class, the GNN. But first, let’s take a tour through the three classes of graph prediction problems in more detail, and provide concrete examples of each.
- We have described some examples of graphs in the wild, but what tasks do we want to perform on this data? There are three general types of prediction tasks on graphs: graph-level, node-level, and edge-level.

However, it is not always so simple. For instance, you might have information in the graph stored in edges, but no information in nodes, but still need to make predictions on nodes. We need a way to collect information from edges and give them to nodes for prediction. We can do this by pooling. Pooling proceeds in two steps:
- For each item to be pooled, gather each of their embeddings and concatenate them into a matrix.
- The gathered embeddings are then aggregated, usually via a sum operation.
We represent the pooling operation by the letter \(\rho\), and denote that we are gathering information from edges to nodes as \(p_{E_n \to V_{n}}\).


Other types of graphs (multigraphs, hypergraphs, hypernodes, hierarchical graphs)
We can also consider nested graphs, where for example a node represents a graph, also called a hypernode graph. Nested graphs are useful for representing hierarchical information. For example, we can consider a network of molecules, where a node represents a molecule and an edge is shared between two molecules if we have a way (reaction) of transforming one to the other (1), (2). In this case, we can learn on a nested graph by having a GNN that learns representations at the molecule level and another at the reaction network level, and alternate between them during training.

GNN introduction
- A graph G can be defined as G = (V, E), where V is the set of nodes, and E are the edges between them.
- If there are directional dependencies between nodes then edges are directed. If not, edges are undirected.
- A graph is often represented by A, an adjacency matrix.
- If a graph has n nodes, A has a dimension of (n × n).
- Sometimes the nodes have a set of features (for example, a user profile). If the node has f numbers of features, then the node feature matrix X has a dimension of (n × f).

- In graph theory, we implement the concept of Node Embedding. It means mapping nodes to a d- dimensional embedding space (low dimensional space rather than the actual dimension of the graph), so that similar nodes in the graph are embedded close to each other. Our goal is to map nodes so that similarity in the embedding space approximates similarity in the network.
- Let’s define \(u\) and \(v\) as two nodes in a graph.
- \(x_u\) and \(x_v\) are two feature vectors.
- Now we’ll define the encoder function Enc(u) and Enc(v), which convert the feature vectors to \(z_u\) and \(z_v\).
- Note: the similarity function could be Euclidean distance.
- So the challenge now is how to come up with the encoder function?
- The encoder function should be able to perform :
- Locality (local network neighborhoods)
- Aggregate information
- Stacking multiple layers (computation)
- The encoder function should be able to perform :





where \(h_v^k\) is Hidden state of node \(v\) on step \(k\).
Now, to train the model we need to define a loss function on the embeddings.
- We can feed the embeddings into any loss function and run stochastic gradient descent to train the weight parameters.
- Training can be unsupervised or supervised:
- Unsupervised training:
- Use only the graph structure: similar nodes have similar embeddings. Unsupervised loss function can be a loss based on node proximity in the graph, or random walks. (Node clustering is a typical unsupervised learning task.)
- Supervised training:
- Train model for a supervised task like node classification, normal or anomalous node.
- Unsupervised training:
- Training can be unsupervised or supervised:
和周圍位置的資訊一起收集,進到運算,然後傳到下一層,重複這個步驟…那不就是卷積神經網路(CNN)嗎?沒錯,在CNN裡,相鄰位置集結資訊的是圖片的畫素,而在這裡,GNN的訊息傳遞收集的是相鄰位置的端點/連結。差別在於:CNN有固定的規格,而訊息傳遞端看關聯性決定,比較有彈性。
我們再以端點的訊息傳遞為例。第一層的GNN,收集的是某個端點周圍,有直接關連性的端點的資訊。再堆疊一層之後,雖然重複這個步驟,但別忘了,它周圍那些端點,在上一層也跟他做過一樣的動作 — 蒐集周圍直接關聯性的資訊。所以在第二層,這個端點可以藉由只集結周圍有直接關聯性的端點的資訊,就收集到間接關聯性端點的資訊了。如此層層疊加,疊到一定層數的時候,一個端點可能可以囊括整個圖像一大部分,甚至是整體端點的資訊,只是所含資訊量的比重,隨端點關聯性遠近有所不同而已。
Node embeddings
Graph Representation Learning


why embedding?
- Task: Map nodes into an embedding space
- Similarity of embeddings between nodes indicates their similarity in the network. For example:
- Both nodes are close to each other (connected by an edge)
- Encode network information
- Potentially used for many downstream predictions
- Encoder + Decoder Framework
- We will now learn node similarity definition that uses random walks, and how to optimize embeddings for such a similarity measure.
- This is unsupervised/self-supervised way of learning node embeddings.
- We are not utilizing node labels
- We are not utilizing node features
- The goal is to directly estimate a set of coordinates(i.e., the embedding) of a node so that some aspect of the network structure (captured by DEC) is preserved.
- These embeddings are task independent
- They are not trained for a specific task but can be used for any task.
- This is unsupervised/self-supervised way of learning node embeddings.

A = Z^T Z ?


singular value decomposition (SVD)
In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix into a rotation, followed by a rescaling followed by another rotation. It generalizes the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any \(m\times n\) matrix. It is related to the polar decomposition.

Eigendecomposition of a matrix
like eigen-vector/eigen-values, take first d-dim values?

Cholesky decomposition (What we need!!)
The Cholesky decomposition of a Hermitian positive-definite matrix A,
\[ A = L L^* \]

But,
There are various methods for calculating the Cholesky decomposition. The computational complexity of commonly used algorithms is \(O(n^3)\) in general.
Graph is permutation invariant




Graph neural networks consist of multiple permutation equivariant / invariant functions
- This explains why the naïve MLP approach fails for graphs
Next: Design graph neural networks that are permutation invariant / equivariant by passing and aggregating information from neighbors!
Graph Convolutional Networks & massage passing


- The rows of input node features and output embeddings are aligned
和周圍位置的資訊一起收集,進到運算,然後傳到下一層,重複這個步驟…那不就是卷積神經網路(CNN)嗎?沒錯,在CNN裡,相鄰位置集結資訊的是圖片的畫素,而在這裡,GNN的訊息傳遞收集的是相鄰位置的端點/連結。差別在於:CNN有固定的規格,而訊息傳遞端看關聯性決定,比較有彈性。
我們再以端點的訊息傳遞為例。第一層的GNN,收集的是某個端點周圍,有直接關連性的端點的資訊。再堆疊一層之後,雖然重複這個步驟,但別忘了,它周圍那些端點,在上一層也跟他做過一樣的動作 — 蒐集周圍直接關聯性的資訊。所以在第二層,這個端點可以藉由只集結周圍有直接關聯性的端點的資訊,就收集到間接關聯性端點的資訊了。如此層層疊加,疊到一定層數的時候,一個端點可能可以囊括整個圖像一大部分,甚至是整體端點的資訊,只是所含資訊量的比重,隨端點關聯性遠近有所不同而已。
Training model: loss function?
- How do we train the GCN to generate embeddings? Need to define a loss function on the embeddings.







Capability



Toy model: Interactive Visualiztion of GNN
GNN 101: Visual Learning of Graph Neural Networks in Your Web Browser


GNN vs CNN

- CNN can be seen as a special GNN with fixed neighbor size and ordering:
- The size of the filter is pre-defined for a CNN.
- The advantage of GNN is it processes arbitrary graphs with different degrees for each node.
- CNN is not permutation invariant/equivariant.
- Switching the order of pixels leads to different outputs.
和周圍位置的資訊一起收集,進到運算,然後傳到下一層,重複這個步驟…那不就是卷積神經網路(CNN)嗎?沒錯,在CNN裡,相鄰位置集結資訊的是圖片的畫素,而在這裡,GNN的訊息傳遞收集的是相鄰位置的端點/連結。差別在於:CNN有固定的規格,而訊息傳遞端看關聯性決定,比較有彈性。
我們再以端點的訊息傳遞為例。第一層的GNN,收集的是某個端點周圍,有直接關連性的端點的資訊。再堆疊一層之後,雖然重複這個步驟,但別忘了,它周圍那些端點,在上一層也跟他做過一樣的動作 — 蒐集周圍直接關聯性的資訊。所以在第二層,這個端點可以藉由只集結周圍有直接關聯性的端點的資訊,就收集到間接關聯性端點的資訊了。如此層層疊加,疊到一定層數的時候,一個端點可能可以囊括整個圖像一大部分,甚至是整體端點的資訊,只是所含資訊量的比重,隨端點關聯性遠近有所不同而已。
- GNN is a general architecture
- CNN can be viewed as a special GNN
GNN vs Transformer



- A nice blog plot for this: https://towardsdatascience.com/transformers-are-graph-neural-networks-bca9f75412aa
GNN Framework
GNN layer





Different GNN layer





Some Practices on GNN layer





Class of GNN

Laplacian matrix
In the mathematical field of graph theory, the Laplacian matrix, also called the graph Laplacian, admittance matrix, Kirchhoff matrix or discrete Laplacian, is a matrix representation of a graph. Named after Pierre-Simon Laplace, the graph Laplacian matrix can be viewed as a matrix form of the negative discrete Laplace operator on a graph approximating the negative continuous Laplacian obtained by the finite difference method.

Laplacian matrix normalization

Inductive learning, Transductive learning
Inductive Inference 我們利用training data這五個點,得到一個function,再利用這個function去predict test data的label。
Transductive Inference 在使用transductive inference的時候,除了training data以外,我們還可以得知test data的一些資訊。如在上圖中,我們可以得知test data的cluster情況,而藉此我們更可以正確判斷node是屬於哪一個category。
Inductive Inference雖然不能利用到test data的資訊,但如果我們有夠多的training data,而且test data 和training data 是來自於同一個distribution,則Inductive Inference還是會有好的效果。而Inductive Inference最大的優點在於,當有新的test data時,可以套用從training data學到的同一個function來predict。相反的,Transductive Inference可以利用test data的資訊,這對於手邊沒有很多training data的情況很有幫助,但缺點就是當有新的test data進來時,我們需要重新訓練function,因為新的data會提供新的訊息。
Further
Computational graphs
Expressivity Power of GNN
A classical expressivity test is Graph isomorphism
Graph Transformers
Heterogeneous Graphs
Relational GCN
Knowledge Graphs
GNNs for recommender systems
Regional data-driven weather modeling with a global stretched-grid

References
- A Gentle Introduction to Graph Neural Networks (Recommend)
- CS224W: Machine Learning with Graphs (Recommend)
- 拉普拉斯矩阵, 谱分解, 图上的傅里叶变换 (Recommend)
- 图神经网络从入门到入门
- 时空图神经网络综述
- Graphs Networks 分類
- A Comprehensive Introduction to Graph Neural Networks (GNNs)
- Introduction to Graph Neural Network with Pytorch
- Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2008). The graph neural network model. IEEE transactions on neural networks, 20(1), 61-80.
- Xu, K., Hu, W., Leskovec, J., & Jegelka, S. (2018). How powerful are graph neural networks?. arXiv preprint arXiv:1810.00826.
- Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., ... & Sun, M. (2020). Graph neural networks: A review of methods and applications. AI open, 1, 57-81.
- Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., & Philip, S. Y. (2020). A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems, 32(1), 4-24.
- Neural Message Passing for Quantum Chemistry J. Gilmer, S.S. Schoenholz, P.F. Riley, O. Vinyals, G.E. Dahl. Proceedings of the 34th International Conference on Machine Learning, Vol 70, pp. 1263--1272. PMLR. 2017.
- Relational inductive biases, deep learning, and graph networks P.W. Battaglia, J.B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, C. Gulcehre, F. Song, A. Ballard, J. Gilmer, G. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, R. Pascanu. 2018.
- Poulovassilis, A., & Levene, M. (1994). A nested-graph model for the representation and manipulation of complex objects. ACM Transactions on Information Systems (TOIS), 12(1), 35-68.
- Zitnik, M., Agrawal, M., & Leskovec, J. (2018). Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics, 34(13), i457-i466.
- Stocker, S., Csanyi, G., Reuter, K., & Margraf, J. T. (2020). Machine learning in chemical reaction space. Nature communications, 11(1), 5505.