IEEE Account

  • Change Username/Password
  • Update Address

Purchase Details

  • Payment Options
  • Order History
  • View Purchased Documents

Profile Information

  • Communications Preferences
  • Profession and Education
  • Technical Interests
  • US & Canada: +1 800 678 4333
  • Worldwide: +1 732 981 0060
  • Contact & Support
  • About IEEE Xplore
  • Accessibility
  • Terms of Use
  • Nondiscrimination Policy
  • Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity. © Copyright 2024 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.

Subscribe to the PwC Newsletter

Join the community, edit social preview.

dynamic graph representation learning

Add a new code entry for this paper

Remove a code repository from this paper, mark the official implementation from paper authors, add a new evaluation result row.

TASK DATASET MODEL METRIC NAME METRIC VALUE GLOBAL RANK REMOVE
  • GENERAL CLASSIFICATION
  • GRAPH EMBEDDING
  • GRAPH REPRESENTATION LEARNING
  • LINK PREDICTION
  • NODE CLASSIFICATION
  • REPRESENTATION LEARNING

Remove a task

dynamic graph representation learning

Add a method

Remove a method, edit datasets, dynamic graph representation learning via self-attention networks.

22 Dec 2018  ·  Aravind Sankar , Yanhong Wu , Liang Gou , Wei zhang , Hao Yang · Edit social preview

Learning latent representations of nodes in graphs is an important and ubiquitous task with widespread applications such as link prediction, node classification, and graph visualization. Previous methods on graph representation learning mainly focus on static graphs, however, many real-world graphs are dynamic and evolve over time. In this paper, we present Dynamic Self-Attention Network (DySAT), a novel neural architecture that operates on dynamic graphs and learns node representations that capture both structural properties and temporal evolutionary patterns. Specifically, DySAT computes node representations by jointly employing self-attention layers along two dimensions: structural neighborhood and temporal dynamics. We conduct link prediction experiments on two classes of graphs: communication networks and bipartite rating networks. Our experimental results show that DySAT has a significant performance gain over several different state-of-the-art graph embedding baselines.

Code Edit Add Remove Mark official

Tasks edit add remove, datasets edit, results from the paper edit, methods edit add remove.

Dyrep: Learning Representations over Dynamic Graphs.

Research areas.

Machine Intelligence

Learn more about how we conduct our research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work.

Philosophy-light-banner

TemporalGAT: Attention-Based Dynamic Graph Representation Learning

  • Conference paper
  • First Online: 06 May 2020
  • Cite this conference paper

dynamic graph representation learning

  • Ahmed Fathy 14 &
  • Kan Li 14  

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 12084))

Included in the following conference series:

  • Pacific-Asia Conference on Knowledge Discovery and Data Mining

8582 Accesses

17 Citations

Learning representations for dynamic graphs is fundamental as it supports numerous graph analytic tasks such as dynamic link prediction, node classification, and visualization. Real-world dynamic graphs are continuously evolved where new nodes and edges are introduced or removed during graph evolution. Most existing dynamic graph representation learning methods focus on modeling dynamic graphs with fixed nodes due to the complexity of modeling dynamic graphs, and therefore, cannot efficiently learn the evolutionary patterns of real-world evolving graphs. Moreover, existing methods generally model the structural information of evolving graphs separately from temporal information. This leads to the loss of important structural and temporal information that could cause the degradation of predictive performance of the model. By employing an innovative neural network architecture based on graph attention networks and temporal convolutions, our framework jointly learns graph representations contemplating evolving graph structure and temporal patterns. We propose a deep attention model to learn low-dimensional feature representations which preserves the graph structure and features among series of graph snapshots over time. Experimental results on multiple real-world dynamic graph datasets show that, our proposed method is competitive against various state-of-the-art methods.

You have full access to this open access chapter,  Download conference paper PDF

Similar content being viewed by others

dynamic graph representation learning

Dynamic Network Embedding in Hyperbolic Space via Self-attention

dynamic graph representation learning

DynGCN: A Dynamic Graph Convolutional Network Based on Spatial-Temporal Modeling

dynamic graph representation learning

TE-DyGE: Temporal Evolution-Enhanced Dynamic Graph Embedding Network

  • Dynamic graph representation learning
  • Graph attention networks
  • Temporal convolutional networks

1 Introduction

Many appealing real-world applications involve data streams that cannot be well represented in a planar structure, but exist in irregular domain. This case applies to knowledge bases [ 35 ], 3D models [ 18 ], social media [ 22 ], and biological networks [ 7 ] which are usually represented by graphs.

In graph representation learning, the key challenge is to learn a low-dimensional representation of the data that is most informative to preserve the structural information among the nodes in graphs. Through graph embedding, we can represent the nodes in a low-dimensional vector form. This paves the way to apply machine learning in graph analysis and data mining tasks easily and efficiently such as node classification [ 11 , 22 ], link prediction [ 7 ], clustering [ 4 ], and visualization [ 30 ].

Recently, there has been significant interest in graph representation learning mainly focuses on static graphs [ 5 , 7 , 8 , 11 , 22 , 29 ] which attracted the attention of researchers due to its extensive usage in numerous real-world applications. However, a wide range of real-world applications are intrinsically dynamic and the underlying graph structure evolves over time and are usually represented as a sequence of graph snapshots over time [ 14 ].

Learning dynamic graph representations is challenging due to the time-varying nature of graph structures, where the graph nodes and edges are in continues evolution. New nodes and edges can be introduced or removed in each time step. Consequently, this requires the learned representations not only to preserve structural information of the graphs, but also to efficiently capture the temporal variations over time.

Recently, novel methods for learning dynamic graph representations have been proposed in literature. Some recent work attempts to learn dynamic graph representation such as [ 10 , 15 , 36 , 37 ], where they mainly apply a temporally regularized weights to enforce the smoothness of node representations from different adjacent time steps. However, these methods generally fail to learn effective representations when graph nodes exhibit substantially distinct evolutionary behaviors over time [ 24 ].

Trivedi et al. [ 27 ] handle temporal reasoning problem in multi-relational knowledge graphs through employing a recurrent neural network. However, their learned temporal representations are limited to modeling first-order proximity between nodes, while ignoring the higher-order proximities among neighborhoods which are essential for preventing the graph structure as explained in [ 25 , 34 ].

Recently, the authors in [ 24 ] propose dynamic graph embedding approach that leverage self-attention networks to learn node representations. This method focus on learning representations that capture structural properties and temporal evolutionary patterns over time. However, this method cannot effectively capture the structural evolution information over time, since it employs structure attention layers to each time step separately and generate node representations, which is followed by temporal attention layers to capture the variations in generated representations.

Recently, attention mechanisms have achieved great success in NLP and sequential learning tasks [ 1 , 31 ]. Attention mechanisms learn a function that aggregates a variable-sized inputs while focusing on the most relevant sequences of the input to make decisions, which makes them unique. An attention mechanism is commonly referred to as self-attention, when it computes the representation of a single sequence.

Veličković et al. [ 29 ] extend the self-attention mechanism and apply it on static graphs by enabling each node to attend over its neighbors. In this paper, we specifically focus on applying graph attention networks (GATs) [ 29 ] because of its effectiveness in addressing the shortcomings of prior methods based on graph convolutions such as [ 8 , 11 ]. GATs allow for assigning different weights to nodes of the same neighborhood by applying multi-head self-attention layers, which enables a leap in model capacity. Additionally, the self-attention mechanism is applied to all graph edges, and thus, it does not depend on direct access to the graph structure or its nodes, which was a limitation of many prior dynamic graph representation learning techniques.

Inspired by this recent work, we present a temporal self-attention neural network architecture to learn node representations on dynamic graphs. Specifically, we apply self-attention along structural neighborhoods over temporal dynamics through leveraging temporal convolutional network (TCN) [ 2 , 20 ]. We learn dynamic node representation by considering the neighborhood in each time step during graph evolution by applying a self-attention strategy without violating the ordering of the graph snapshots.

Overall our paper makes the following contributions:

We present a novel neural architecture named (TemporalGAT) to learn representations on dynamic graphs through integrating GAT, TCN, and a statistical loss function.

We conduct extensive experiments on real-world dynamic graph datasets and compare with state-of-the-art approaches which validate our method.

2 Problem Formulation

In this work, we aim to solve the problem of dynamic graph representation learning. We represent dynamic graph G as a sequence of graph snapshots, \(G_1,G_2,\ldots ,G_\mathcal {T},\) from timestamps 1 to \(\mathcal {T}\) . A graph at specific time t is represented by \(G_t = (V_t,E_t,F_t)\) where \(V_t\) , \(E_t\) and \(F_t\) represent the nodes, edges and features of the graph respectively. The goal of dynamic graph representation learning is to learn effective latent representations for each node in the graph \(v \in V\) at each time step \(t = 1,2,\ldots , \mathcal {T}\) . The learned node representations should efficiently preserve the graph structure for all node \(v \in V\) at any time step t .

3 TemporalGAT Framework

In this section, we present our proposed TemporalGAT framework, as illustrated in Fig.  1 . We propose a novel model architecture to learn representations for dynamic graphs through utilizing GATs and TCNs networks to promote the model ability in capturing temporal evolutionary patterns in a dynamic graph. We employ multi-head graph attentions and TCNs as a special recurrent structure to improve model efficiency. TCNs has proven to be stable and powerful for modeling long-range dependencies as discussed in previous studies [ 2 , 20 ]. In addition, this architecture can take a sequence of any length and map it to an output sequence of specific length which can be very effective in dynamic graphs due to varying size of adjacency and feature matrices.

figure 1

The framework of TemporalGAT.

The input graph snapshot is applied to GAT layer which has dilated causal convolutions to ensure no information leakage from future to past graph snapshots. Formally, for an input vector \(x\in \mathbb {R}^n\) and a filter \(f: \{0,\ldots ,k-1\}\rightarrow \mathbb {R} \) , the dilated convolution operation \(C_d\) on element u of the vector x is defined as:

where d is the dilation factor, k is the filter size, and \(u-d\cdot i\) makes up for the direction of the past information. When using a large dilation factors, the output at the highest level can represent a wider range of inputs, thus effectively expanding the receptive field [ 32 ] of convolution networks. For instance, through applying dilated convolution operations, it is possible to aggregate the input features from previous snapshots towards final snapshot.

The inputs to a single GAT layer are graph snapshots (adjacency matrix) and graph feature or 1-hot encoded vectors for each node. The output is node representations across time that capture both local structural and temporal properties. The self-attention layer in GAT attends over the immediate neighbors of each node by employing self-attention over the node features. The proposed GAT layer is a variant of GAT [ 29 ], with dilated convolutions applied on each graph snapshot:

where \(h_u\) is the learned hidden representations of node u , \(\sigma \) is a non-linear activation function, \(N_u\) represents the immediate neighbors of u , \(W_d\) is the shared transformation weight of dilated convolutions, \(x_v\) is the input representation vector of node v , and \(\alpha _{vu}\) is the coefficient learned by the attention mechanism defined as:

where \(A_{vu}\) is the edge weight of the adjacency matrix between u and v , \(a^T\) is a weight vector parameter of the attention function implemented as feed-forward layer and \(\Vert \) is the concatenation operator. \(\alpha _{vu} \) is based on softmax function over the neighborhood of each node. This is to indicate the importance of node v to node v at the current snapshot. We use residual connections between GAT layers to avoid vanishing gradients and ensure smooth learning of the deep architecture.

Following, we adopt binary cross-entropy loss function to predict the existence of an edge between a pair of nodes using the learned node representations similar to [ 24 ]. The binary cross-entropy loss function for certain node v can be defined as:

where \(\mathcal {T}\) is the number of training snapshots, \(pos^t\) is the set of nodes connected with edges to v at snapshot t , \(neg^t\) is the negative sampling distribution for snapshot t , \( W_{neg}\) is the negative sampling parameter, \(\sigma \) is the sigmoid function and the dot operator represents the inner product operation between the representations of node pair.

4 Experiments

In this section, we conduct extensive experiments to evaluate the performance of our method via link prediction task. We present experiential results of our proposed method against several baselines.

4.1 Datasets

We use real-world dynamic graph datasets for analysis and performance evaluation. An outline of the datasets we use in our experiments is given in Table  1 .

The detailed dataset descriptions are listed as follows:

Enron [ 12 ] and UCI [ 21 ] are online communication network datasets. Enron dataset is constructed by email interactions between employees where the employees represent the nodes and the email communications represent the edges. UCI dataset is an online social network where the messages sent between users represent the edges.

Yelp Footnote 1 is a rating network (Round 11 of the Yelp Dataset Challenge) where the ratings of users and businesses are collected over specific time.

The datasets have multiple graph time steps and were created based on specific interactions in fixed time windows. For more details on the dataset collection and statistics see [ 24 ].

4.2 Experimental Setup

We evaluate the performance of different baselines by conducting link prediction experiment. We learn dynamic graph representations on snapshots \(S=\{1,2,\ldots ,t-1\}\) and use the links of \(t-1\) to predict the links at t graph snapshot. We follow the experiment design by [ 24 ] and classify each node pair into linked and non-linked nodes, and use sampling approach to achieve positive and negative node pairs where we randomly sample 25% of each snapshot nodes for training and use the remaining 75% for testing.

4.3 Parameter Settings

For our method, we train the model using Adam optimizer and adopt dropout regularization to avoid model over-fitting. We trained the model for a maximum of 300 epochs and the best performing model on the validation set, is chosen for link perdition evaluation. For the datasets, we use a 4 TCN blocks, with each GAT layer comprising attention heads computing 32 features, and we concatenate the output features. The output low-dimensional embedding size of the last fully-connected layer is set to 128.

4.4 Baseline Algorithms

We evaluate our method against the several baseline algorithms including static graph representation approaches such as: GAT [ 29 ], Node2Vec [ 7 ], GraphSAGE [ 8 ], graph autoencoders [ 9 ], GCN-AE and GAT-AE as autoencoders for link prediction [ 38 ]. Dynamic graph representation learning including Know-Evolve [ 27 ], DynamicTriad [ 36 ], DynGEM [ 10 ] and DySAT [ 24 ].

4.5 Link Prediction

The task of link prediction is to leverage structural and temporal information up to time step t and predict the existence of an edge between a pair of vertices ( u ,  v ) at time \(t + 1\) .

To evaluate the link prediction performance of each baseline model, we train a logistic regression classifier similar to [ 36 ]. We use Hadmard operator to compute element-wise product of feature representation for an edge using the connected pair of nodes as suggested by [ 7 ]. We repeat the experiment for 10 times and report the average of Area Under the ROC Curve (AUC) score.

We evaluate each baseline at each time step t separately, by training the models up to snapshot t and evaluate the performance at \(t+1\) for each snapshot up to \(\mathcal {T}\) snapshots. We report the averaged micro and macro AUC scores over all time steps for the methods in Table  2 (given in paper [ 24 ]).

From the results, we observe that TemporalGAT outperforms state-of-the-art methods in micro and macro AUC scores. Moreover, the results suggest that GAT using TCN architecture with minimal tuning outperforms graph representation methods, which validates the efficient of TCN in capturing the temporal and structural properties of dynamic graph snapshots.

5 Related Work

5.1 static graph representation learning.

Static graph embedding can be observed as dimensionality reduction approach that maps each node into a low dimensional vector space which preserves the vertex neighborhood proximities. Earlier research work for linear (e.g., PCA) and non-linear (e.g., IsoMap) dimensionality reduction methods have been studied extensively in the literature [ 3 , 23 , 26 ].

To improve large-scale graph embedding scalability, several approaches have been proposed such as [ 6 , 7 , 22 ], which adopt random walks and skip-gram procedure to learn network representations. Tang et al. [ 25 ] designed two loss functions to capture the local and global graph structure.

More recently, network embedding approaches design models that rely on convolutions to achieve good generalizations such as [ 8 , 11 , 19 , 29 ]. These methods usually provide performance gains on network analytic tasks such as node classification and link prediction. However, these approaches are unable to efficiency learn representations for dynamic graphs due to evolving nature.

5.2 Dynamic Graph Representation Learning

Methods for dynamic graphs representation learning are often an extension of static methods with an additional component to model the temporal variation. For instance, in matrix factorization approaches such as [ 3 , 26 ] the purpose is to learn node representations that come from eigenvectors of the graph Laplacian matrix decomposition. DANE [ 16 ] is based on this idea to update the eigenvectors of graph Laplacian matrix over time series.

For the methods based on random walk such as [ 7 , 22 ], the aim is to model the node transition probabilities of random walks as the normalized inner products of the corresponding node representations. In [ 33 ], the authors learn representations through observing the graph changes and incrementally re-sample a few walks in the successive time step.

Another line of works for dynamic graph representation employ temporal regularization that acts as smoothness factor to enforce embedding stability across time steps [ 36 , 37 ]. Recent works learn incremental node representations across time steps [ 10 ], where the authors apply an autoencoder approach that minimizes the reconstruction loss with a distance metric between connected nodes in the embedding space. However, this may not guarantee the ability of model to capture long-term proximities.

Another category of dynamic graph representation learning is point processes that are continuous in time [ 13 , 17 , 28 ]. These approaches model the edge occurrence as a point process and parameterize the intensity function by applying the learned node representations as an input to a neural network.

More recently, [ 24 ] proposed an approach that leverage the most relevant historical contexts through self-attention layers to preserve graph structure and temporal evolution patterns. Unlike this approach, our framework captures the most relevant historical information through applying a temporal self-attention architecture using TCN and GAT layers to learn dynamic representations for real-world data.

6 Conclusion

In this paper, we introduce a novel end-to-end dynamic graph representation learning framework named TemporalGAT. Our framework architecture is based on graph attention networks and temporal convolutional network and operates on dynamic graph-structured data through leveraging self-attention layers over time. Our experiments on various real-world dynamic graph datasets show that the proposed framework is superior to existing graph embedding methods as it achieves significant performance gains over several state-of-the-art static and dynamic graph embedding baselines.

There are several challenges for future work. For instance, learning representations for multi-layer dynamic graphs while incorporating structural and feature information is a promising direction.

https://www.yelp.com/dataset/challenge .

Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: International Conference on Learning Representations (ICLR) (2015)

Google Scholar  

Bai, S., Kolter, J.Z., Koltun, V.: An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271 (2018)

Belkin, M., Niyogi, P.: Laplacian Eigenmaps for dimensionality reduction and data representation. Neural Comput. 15 (6), 1373–1396 (2003)

Article   Google Scholar  

Cao, S., Lu, W., Xu, Q.: Deep neural networks for learning graph representations. In: AAAI, pp. 1145–1152 (2016)

Chen, J., Ma, T., Xiao, C.: FastGCN: fast learning with graph convolutional networks via importance sampling. arXiv preprint arXiv:1801.10247 (2018)

Fathy, A., Li, K.: ComNE: reinforcing network embedding with community learning. In: Gedeon, T., Wong, K.W., Lee, M. (eds.) ICONIP 2019. CCIS, vol. 1142, pp. 397–405. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-36808-1_43

Chapter   Google Scholar  

Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. ACM (2016)

Hamilton, W., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: Advances in Neural Information Processing Systems, pp. 1024–1034 (2017)

Hamilton, W.L., Ying, R., Leskovec, J.: Representation learning on graphs: methods and applications. arXiv preprint arXiv:1709.05584 (2017)

Kamra, N., Goyal, P., He, X., Liu, Y.: DynGEM: deep embedding method for dynamic graphs. In: IJCAI International Workshop on Representation Learning for Graphs (ReLiG) (2017)

Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)

Klimt, B., Yang, Y.: Introducing the Enron corpus. In: CEAS (2004)

Kumar, S., Zhang, X., Leskovec, J.: Learning dynamic embeddings from temporal interactions (2018)

Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1 (1), 2 (2007)

Li, J., Dani, H., Hu, X., Tang, J., Chang, Y., Liu, H.: Attributed network embedding for learning in a dynamic environment. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 2017, pp. 387–396. ACM, New York (2017). https://doi.org/10.1145/3132847.3132919 , http://doi.acm.org/10.1145/3132847.3132919

Li, J., Dani, H., Xia, H., Tang, J., Liu, H.: Attributed network embedding for learning in a dynamic environment (2017)

Nguyen, G.H., Lee, J.B., Rossi, R.A., Ahmed, N.K., Kim, S.: Continuous-time dynamic network embeddings. In: Companion of the The Web Conference 2018, pp. 969–976 (2018)

Nguyen, S.H., Yao, Z., Kolbe, T.H.: Spatio-semantic comparison of large 3D city models in CityGML using a graph database (2017)

Niepert, M., Ahmed, M., Kutzkov, K.: Learning convolutional neural networks for graphs. In: International Conference on Machine Learning, pp. 2014–2023 (2016)

Oord, A.v.d., et al.: WaveNet: a generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016)

Panzarasa, P., Opsahl, T., Carley, K.M.: Patterns and dynamics of users’ behavior and interaction: network analysis of an online community. J. Am. Soc. Inform. Sci. Technol. 60 (5), 911–932 (2009)

Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. ACM (2014)

Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290 (5500), 2323–2326 (2000)

Sankar, A., Wu, Y., Gou, L., Zhang, W., Yang, H.: Dynamic graph representation learning via self-attention networks (2018)

Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: Line: large-scale information network embedding. In: Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077. International World Wide Web Conferences Steering Committee (2015)

Tenenbaum, J.B., De Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290 (5500), 2319–2323 (2000)

Trivedi, R., Dai, H., Wang, Y., Song, L.: Know-evolve: deep temporal reasoning for dynamic knowledge graphs. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 3462–3471 (2017). JMLR.org

Trivedi, R., Farajtbar, M., Biswal, P., Zha, H.: Representation learning over dynamic graphs (2018)

Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=rJXMpikCZ

Wang, D., Cui, P., Zhu, W.: Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1225–1234. ACM (2016)

Yu, A., Dohan, D., Luong, M.T., Zhao, R., Chen, K., Le, Q.: QANet: combining local convolution with global self-attention for reading comprehension (2018)

Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)

Yu, W., Cheng, W., Aggarwal, C.C., Zhang, K., Chen, H., Wang, W.: NetWalk: a flexible deep embedding approach for anomaly detection in dynamic networks. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018, pp. 2672–2681. ACM, New York (2018). https://doi.org/10.1145/3219819.3220024 , http://doi.acm.org/10.1145/3219819.3220024

Zhang, D., Yin, J., Zhu, X., Zhang, C.: Network representation learning: a survey. arXiv preprint arXiv:1801.05852 (2017)

Zhen, W., Zhang, J., Feng, J., Zheng, C.: Knowledge graph embedding by translating on hyperplanes. In: Twenty-Eighth AAAI Conference on Artificial Intelligence (2014)

Zhou, L., Yang, Y., Ren, X., Wu, F., Zhuang, Y.: Dynamic network embedding by modeling triadic closure process. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)

Zhu, L., Dong, G., Yin, J., Steeg, G.V., Galstyan, A.: Scalable temporal latent space inference for link prediction in dynamic social networks. IEEE Trans. Knowl. Data Eng. 28 (10), 2765–2777 (2016)

Zitnik, M., Agrawal, M., Leskovec, J.: Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 34 (13), i457–i466 (2018)

Download references

Acknowledgments

This research was supported by Beijing Natural Science Foundation (No. L181010, 4172054), National Key R & D Program of China (No. 2016YFB0801100), and National Basic Research Program of China (No. 2013CB329605).

Author information

Authors and affiliations.

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 10081, China

Ahmed Fathy & Kan Li

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Kan Li .

Editor information

Editors and affiliations.

School of Information Systems, Singapore Management University, Singapore, Singapore

Hady W. Lauw

Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, Hong Kong

Raymond Chi-Wing Wong

Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece

Alexandros Ntoulas

Ee-Peng Lim

Institute of Data Science, National University of Singapore, Singapore, Singapore

See-Kiong Ng

School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore

Sinno Jialin Pan

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper.

Fathy, A., Li, K. (2020). TemporalGAT: Attention-Based Dynamic Graph Representation Learning. In: Lauw, H., Wong, RW., Ntoulas, A., Lim, EP., Ng, SK., Pan, S. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2020. Lecture Notes in Computer Science(), vol 12084. Springer, Cham. https://doi.org/10.1007/978-3-030-47426-3_32

Download citation

DOI : https://doi.org/10.1007/978-3-030-47426-3_32

Published : 06 May 2020

Publisher Name : Springer, Cham

Print ISBN : 978-3-030-47425-6

Online ISBN : 978-3-030-47426-3

eBook Packages : Computer Science Computer Science (R0)

Share this paper

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Publish with us

Policies and ethics

  • Find a journal
  • Track your research

Dynamic Graph Representation Learning via Self-Attention Networks

Learning latent representations of nodes in graphs is an important and ubiquitous task with widespread applications such as link prediction, node classification, and graph visualization. Previous methods on graph representation learning mainly focus on static graphs, however, many real-world graphs are dynamic and evolve over time. In this paper, we present Dynamic Self-Attention Network (DySAT), a novel neural architecture that operates on dynamic graphs and learns node representations that capture both structural properties and temporal evolutionary patterns. Specifically, DySAT computes node representations by jointly employing self-attention layers along two dimensions: structural neighborhood and temporal dynamics. We conduct link prediction experiments on two classes of graphs: communication networks and bipartite rating networks. Our experimental results show that DySAT has a significant performance gain over several different state-of-the-art graph embedding baselines.

1 Introduction

Learning latent representations (or embeddings) of nodes in graphs has been recognized as a fundamental learning problem due to its widespread use in various domains such as social media  (Perozzi et al., 2014 ) , biology  (Grover & Leskovec, 2016 ) , and knowledge bases  (Wang et al., 2014 ) . The basic idea is to learn a low-dimensional vector for each node, which encodes the structural properties of a node and its neighborhood (and possibly attributes). Such low-dimensional representations can benefit a plethora of graph analytical tasks such as node classification, link prediction, recommendation and graph visualization  (Perozzi et al., 2014 ; Tang et al., 2015 ; Grover & Leskovec, 2016 ; Wang et al., 2016 ; Ying et al., 2018 ; Krishnan et al., 2018 ) .

Previous work on graph representation learning mainly focuses on static graphs, which contain a fixed set of nodes and edges. However, many graphs in real-world applications are intrinsically dynamic, in which graph structures can evolve over time. They are usually represented as a sequence of graph snapshots from different time steps  (Leskovec et al., 2007 ) . Examples include academic co-authorship networks where authors may periodically switch their collaboration behaviors and email communication networks whose structures may change dramatically due to sudden events. In such scenarios, modeling temporal evolutionary patterns is important in accurately predicting node properties and future links.

Learning dynamic node representations is challenging, compared to static settings, due to the complex time-varying graph structures: nodes can emerge and leave, links can appear and disappear, and communities can merge and split. This requires the learned embeddings not only to preserve structural proximity of nodes, but also to jointly capture the temporal dependencies over time. Though some recent work attempts to learn node representations in dynamic graphs, they mainly impose a temporal regularizer to enforce smoothness of the node representations from adjacent snapshots   (Zhu et al., 2016 ; Li et al., 2017 ; Zhou et al., 2018 ) . However, these approaches may fail when nodes exhibit significantly distinct evolutionary behaviors. Trivedi et al. ( 2017 ) employ a recurrent neural architecture for temporal reasoning in multi-relational knowledge graphs. However, their temporal node representations are limited to modeling first-order proximity, while ignoring the structure of higher-order graph neighborhoods.

Attention mechanisms have recently achieved great success in many sequential learning tasks such as machine translation  (Bahdanau et al., 2015 ) and reading comprehension  (Yu et al., 2018 ) . The key underlying principle is to learn a function that aggregates a variable-sized input, while focusing on the parts most relevant to a certain context. When the attention mechanism uses a single sequence as both the inputs and the context, it is often called self-attention . Though attention mechanisms were initially designed to facilitate Recurrent Neural Networks (RNNs) to capture long-term dependencies, recent work by  Vaswani et al. ( 2017 ) demonstrates that a fully self-attentional network itself can achieve state-of-the-art performance in machine translation tasks.   Velickovic et al. ( 2018 ) extend self-attention to graphs by enabling each node to attend over its neighbors, achieving state-of-the-art results for semi-supervised node classification tasks in static graphs.

As dynamic graphs usually include periodical patterns such as recurrent links or communities, attention mechanisms are capable of utilizing information about most relevant historical context, to facilitate future prediction. Inspired by recent work on attention techniques, we present a novel neural architecture named Dynamic Self-Attention Network (DySAT) to learn node representations on dynamic graphs. Specifically, we employ self-attention along two dimensions: structural neighborhoods and temporal dynamics, i.e. , DySAT generates a dynamic representation for a node by considering both its neighbors and historical representations, following a self-attentional strategy. Unlike static graph embedding methods that focus entirely on preserving structural proximity, we learn dynamic node representations that reflect the temporal evolution of graph structure over a varying number of historical snapshots. In contrast to temporal smoothness-based methods, DySAT learns attention weights that capture temporal dependencies at a fine-grained node-level granularity.

We evaluate our framework on the dynamic link prediction task using four benchmarks of different sizes including two email communication networks  (Klimt & Yang, 2004 ; Panzarasa et al., 2009 ) and two bipartite rating networks  (Harper & Konstan, 2016 ) . Our evaluation results show that DySAT achieves significant improvements (3.6% macro-AUC on average) over several state-of-the-art baselines and maintains a more stable performance over different time steps.

2 Related Work

Our framework is related to previous representation learning techniques on static graphs, dynamic graphs, and recent developments in self-attention mechanisms.

Static graph embeddings. Early work on unsupervised graph representation learning exploits the spectral properties of various graph matrix representations, such as Laplacian, etc. to perform dimensionality reduction  (Tenenbaum et al., 2000 ; Belkin & Niyogi, 2001 ) . To improve scalability, some work  (Perozzi et al., 2014 ; Grover & Leskovec, 2016 ) utilizes Skip-gram methods, inspired by their success in Natural Language Processing (NLP). Recently, several graph neural network architectures based on generalizations of convolutions have achieved tremendous success, among which many methods are designed for supervised or semi-supervised learning tasks  (Niepert et al., 2016 ; Defferrard et al., 2016 ; Kipf & Welling, 2017 ; Sankar et al., 2017 ; Velickovic et al., 2018 ) .   Hamilton et al. ( 2017b ) extend graph convolutional methods through trainable neighborhood aggregation functions, to propose a general framework applicable to unsupervised representation learning. However, these methods are not designed to model temporal evolutionary patterns in dynamic graphs.

Dynamic graph embeddings. Most techniques employ temporal smoothness regularization to ensure embedding stability across consecutive time-steps  (Zhu et al., 2016 ) .   Zhou et al. ( 2018 ) additionally use triadic closure  (Kossinets & Watts, 2006 ) as guidance, leading to significant improvements. Neural methods were recently explored in the knowledge graph domain by  Trivedi et al. ( 2017 ) , who employ a recurrent neural architecture for temporal reasoning. However, their model is limited to tracing link evolution, thus limited to capturing first-order proximity.   Goyal et al. ( 2017 ) learn incremental node embeddings through initialization from the previous time steps, however, this may not guarantee the model to capture long-term graph similarity. A few recent works  (Nguyen et al., 2018 ; Zuo et al., 2018 ; Trivedi et al., 2018 ; Ma et al., 2018 ; Kumar et al., 2018 ; Krishnan et al., 2017 ) examine a related setting of temporal graphs with continuous time-stamped links for representation learning, which is however orthogonal to the established problem setup of using dynamic graph snapshots. Li et al. ( 2017 ) learn node embeddings in dynamic attributed graphs by initially training an offline model, followed by incremental updates over time. However, their key focus is online learning to improve efficiency over re-training static models, while our goal is to improve representation quality by exploiting the temporal evolutionary patterns in graph structure. Unlike previous approaches, our framework captures the most relevant historical contexts through a self-attentional architecture, to learn dynamic node representations.

Self-attention mechanisms. Recent advancements in many NLP tasks have demonstrated the superiority of self-attention in achieving state-of-the-art performance  (Vaswani et al., 2017 ; Lin et al., 2017 ; Tan et al., 2018 ; Shen et al., 2018 ; Shaw et al., 2018 ) . In  DySAT, we employ self-attention mechanisms to compute a dynamic node representation by attending over its neighbors and previous historical representations. Our approach of using self-attention over neighbors is closely related to the Graph Attention Network (GAT)  (Velickovic et al., 2018 ) , which employs neighborhood attention for semi-supervised node classification in a static graph. As dynamic graphs usually contain periodical patterns, we extend the self-attention mechanisms over the historical representations of a particular node to capture its temporal evolution behaviors.

3 Problem Definition

4 dynamic self-attention network.

In this section, we first describe the high-level structure of our model. DySAT consists of two major novel components: structural and temporal self-attention layers, which can be utilized to construct arbitrary graph neural architectures through stacking of layers. Similar to existing studies on attention mechanisms, we employ multi-head attentions to improve model capacity and stability.

DySAT  consists of a structural block followed by a temporal block, as illustrated in Figure  1 , where each block may contain multiple stacked layers of the corresponding layer type. The structural block extracts features from the local neighborhood through self-attentional aggregation, to compute intermediate node representations for each snapshot. These representations feed as input to the temporal block, which attends over multiple time steps, capturing temporal variations in the graph.

4.1 Structural self-attention

The input of this layer is a graph snapshot 𝒢 ∈ 𝔾 𝒢 𝔾 {\mathcal{G}}\in{\mathbb{G}} and a set of input node representations { 𝒙 v ∈ ℝ D , ∀ v ∈ 𝒱 } formulae-sequence subscript 𝒙 𝑣 superscript ℝ 𝐷 for-all 𝑣 𝒱 \{{\bm{x}}_{v}\in{\mathbb{R}}^{D},\forall v\in{\mathcal{V}}\} where D 𝐷 D is the input embedding dimension. The input to the initial layer can be set as 1-hot encoded vectors for each node (or attributes if available). The output is a new set of node representations { 𝒛 v ∈ ℝ F , ∀ v ∈ 𝒱 } formulae-sequence subscript 𝒛 𝑣 superscript ℝ 𝐹 for-all 𝑣 𝒱 \{{\bm{z}}_{v}\in{\mathbb{R}}^{F},\forall v\in{\mathcal{V}}\} with F 𝐹 F dimensions that capture local structural properties.

Specifically, the structural self-attention layer attends over the immediate neighbors of a node v 𝑣 v (in snapshot 𝒢 𝒢 {\mathcal{G}} ), by computing attention weights as a function of their input node embeddings. The structural attention layer is a variant of GAT  (Velickovic et al., 2018 ) , applied on a single snapshot:

(1)

where 𝒩 v = { u ∈ 𝒱 : ( u , v ) ∈ ℰ } subscript 𝒩 𝑣 conditional-set 𝑢 𝒱 𝑢 𝑣 ℰ \mathcal{N}_{v}=\{u\in{\mathcal{V}}:(u,v)\in{\mathcal{E}}\} is the set of immediate neighbors of node v 𝑣 v in snapshot 𝒢 𝒢 {\mathcal{G}} ; 𝑾 s ∈ ℝ D × F superscript 𝑾 𝑠 superscript ℝ 𝐷 𝐹 {\bm{W}}^{s}\in{\mathbb{R}}^{D\times F} is a shared weight transformation applied to each node in the graph; 𝒂 ∈ ℝ 2 ​ D 𝒂 superscript ℝ 2 𝐷 {\bm{a}}\in{\mathbb{R}}^{2D} is a weight vector parameterizing the attention function implemented as feed-forward layer; | | || is the concatenation operation and σ ​ ( ⋅ ) 𝜎 ⋅ \sigma(\cdot) is a non-linear activation function. Note that A u ​ v subscript 𝐴 𝑢 𝑣 A_{uv} is the weight of link ( u , v ) 𝑢 𝑣 (u,v) in the current snapshot 𝒢 𝒢 {\mathcal{G}} . The set of learned coefficients α u ​ v subscript 𝛼 𝑢 𝑣 \alpha_{uv} , obtained by a softmax over the neighbors of each node, indicate the importance or contribution of node u 𝑢 u to node v 𝑣 v at the current snapshot. We use a LeakyRELU non-linearity to compute the attention weights, followed by ELU for the output representations. In our experiments, we employ sparse matrices to implement the masked self-attention over neighbors.

4.2 Temporal self-attention

The key objective of the temporal self-attentional layer is to capture the temporal variations in graph structure over multiple time steps. The input representation of node v 𝑣 v at time-step t 𝑡 t , 𝒙 v t subscript superscript 𝒙 𝑡 𝑣 {\bm{x}}^{t}_{v} , constitutes an encoding of the current local structure around v 𝑣 v . We use 𝒙 v t subscript superscript 𝒙 𝑡 𝑣 {\bm{x}}^{t}_{v} as the query to attend over its historical representations ( < t absent 𝑡 <t ), tracing the evolution of the local neighborhood around v 𝑣 v . Thus, temporal self-attention facilitates learning of dependencies between various representations of a node across different time steps.

To compute the output representation of node v 𝑣 v at t 𝑡 t , we use the scaled dot-product form of attention  (Vaswani et al., 2017 ) where the queries, keys, and values are set as the input node representations. The queries, keys, and values are first transformed to a different space by using linear projections matrices 𝑾 q ∈ ℝ D ′ × F ′ , 𝑾 k ∈ ℝ D ′ × F ′ formulae-sequence subscript 𝑾 𝑞 superscript ℝ superscript 𝐷 ′ superscript 𝐹 ′ subscript 𝑾 𝑘 superscript ℝ superscript 𝐷 ′ superscript 𝐹 ′ {\bm{W}}_{q}\in{\mathbb{R}}^{D^{\prime}\times F^{\prime}},{\bm{W}}_{k}\in{\mathbb{R}}^{D^{\prime}\times F^{\prime}} and 𝑾 v ∈ ℝ D ′ × F ′ subscript 𝑾 𝑣 superscript ℝ superscript 𝐷 ′ superscript 𝐹 ′ {\bm{W}}_{v}\in{\mathbb{R}}^{D^{\prime}\times F^{\prime}} respectively. Here, we allow each time-step t 𝑡 t to attend over all time-steps up to and including t 𝑡 t , to prevent leftward information flow and preserve the auto-regressive property. The temporal self-attention is defined as:

(2)

where 𝜷 𝒗 ∈ ℝ T × T subscript 𝜷 𝒗 superscript ℝ 𝑇 𝑇 \bm{\beta_{v}}\in{\mathbb{R}}^{T\times T} is the attention weight matrix obtained by the multiplicative attention function and 𝑴 ∈ ℝ T × T 𝑴 superscript ℝ 𝑇 𝑇 {\bm{M}}\in{\mathbb{R}}^{T\times T} is a mask matrix with each entry M i ​ j ∈ { − ∞ , 0 } subscript 𝑀 𝑖 𝑗 0 M_{ij}\in\{-\infty,0\} . When M i ​ j = − ∞ subscript 𝑀 𝑖 𝑗 M_{ij}=-\infty , the softmax function results in a zero attention weight, i.e. , β v i ​ j = 0 superscript subscript 𝛽 𝑣 𝑖 𝑗 0 \beta_{v}^{ij}=0 , which switches off the attention from time-step i 𝑖 i to j 𝑗 j . To encode the temporal order, we define 𝑴 𝑴 {\bm{M}} as:

Refer to caption

4.3 Multi-Head Attention

We additionally employ multi-head attention  (Vaswani et al., 2017 ) to jointly attend to different subspaces at each input, leading to a leap in model capacity. We use multiple attention heads, followed by concatenation, in both structural and temporal self-attention layers:

Structural multi-head self-attention: (3)
Temporal multi-head self-attention: (4)

where H 𝐻 H is the number of attention heads, 𝒉 v ∈ ℝ F subscript 𝒉 𝑣 superscript ℝ 𝐹 {\bm{h}}_{v}\in{\mathbb{R}}^{F} and 𝑯 v ∈ ℝ T × F ′ subscript 𝑯 𝑣 superscript ℝ 𝑇 superscript 𝐹 ′ {\bm{H}}_{v}\in{\mathbb{R}}^{T\times F^{\prime}} are the outputs of structural and temporal multi-head attentions respectively. Note that while structural attention is applied on a single snapshot, temporal attention operates over multiple time-steps.

4.4 DySAT Architecture

In this section, we present our neural architecture DySAT for Dynamic Graph Representation Learning, that uses the above defined structural and temporal self-attention layers as fundamental modules. As shown in Figure  1 , DySAT has three modules from its top to bottom, (1) structural attention block, (2) temporal attention block, and (3) graph context prediction. The model takes as input a collection of T 𝑇 T graph snapshots, and generates outputs latent node representations at each time step.

subscript superscript 𝒉 𝑇 𝑣 superscript 𝒑 𝑇 \{{\bm{h}}^{1}_{v}+{\bm{p}}^{1},{\bm{h}}^{2}_{v}+{\bm{p}}^{2},\dots,{\bm{h}}^{T}_{v}+{\bm{p}}^{T}\} for node v 𝑣 v across multiple time steps. This block also follows a similar structure with multiple stacked temporal self-attention layers. The outputs of the final layer pass into a position-wise feed-forward layer to give the final node representations { 𝒆 v 1 , 𝒆 v 2 , … , 𝒆 v T } ​ ∀ v ∈ V subscript superscript 𝒆 1 𝑣 subscript superscript 𝒆 2 𝑣 … subscript superscript 𝒆 𝑇 𝑣 for-all 𝑣 𝑉 \{{\bm{e}}^{1}_{v},{\bm{e}}^{2}_{v},\dots,{\bm{e}}^{T}_{v}\}\;\forall v\in V .

Graph context prediction. To ensure that the learned representations capture both structural and temporal information, we define an objective function that preserves the local structure around a node, across multiple time steps. We use the dynamic representations of a node v 𝑣 v at time step t 𝑡 t , 𝒆 v t superscript subscript 𝒆 𝑣 𝑡 {\bm{e}}_{v}^{t} to predict the occurrence of nodes appearing the local neighborhood around v 𝑣 v at t 𝑡 t . In particular, we use a binary cross-entropy loss function at each time step to encourage nodes co-occurring in fixed-length random walks, to have similar representations.

(5)

where σ 𝜎 \sigma is the sigmoid function, < . > <.> denotes the inner product operation, 𝒩 w ​ a ​ l ​ k t ​ ( v ) subscript superscript 𝒩 𝑡 𝑤 𝑎 𝑙 𝑘 𝑣 \mathcal{N}^{t}_{walk}(v) is the set of nodes that co-occur with v 𝑣 v on fixed-length random walks at snapshot t 𝑡 t , P n t subscript superscript 𝑃 𝑡 𝑛 P^{t}_{n} is a negative sampling distribution for snapshot 𝒢 t superscript 𝒢 𝑡 {\mathcal{G}}^{t} , and w n subscript 𝑤 𝑛 w_{n} , negative sampling ratio, is a tunable hyper-parameter to balance the positive and negative samples.

5 Experiments

We evaluate the quality of our learned node representations on the fundamental task of dynamic link prediction. We choose this task since it has been widely used  (Trivedi et al., 2017 ; Goyal et al., 2017 ; Li et al., 2018 ) in evaluating the quality of dynamic node representations to predict the temporal evolution in graph structure.

In our experiments, we compare the performance of DySAT against a variety of static and dynamic graph representation learning baselines. Our experimental results on four publicly available benchmarks indicate that  DySAT  achieves significant performance gains over other methods.

Communication Networks Rating Networks
Dataset Enron UCI Yelp ML-10M
# Nodes 143 1,809 6,569 20,537
# Links 2,347 16,822 95,361 43,760
# Time steps 10 13 12 13

5.1 Datasets

We use four dynamic graph datasets with two communication and bipartite rating networks each.

Communication networks. We consider two publicly available communication network datasets: Enron  (Klimt & Yang, 2004 ) and UCI  (Panzarasa et al., 2009 ) . In Enron, the communication links are email interactions between core employees and the links in UCI represent messages sent between users on an online social network platform.

Rating networks. We use two bipartite rating networks from Yelp 1 1 1 https://www.yelp.com/dataset/challenge and MovieLens  (Harper & Konstan, 2016 ) . In Yelp, the dynamic graph comprises links between two types of nodes, users and businesses, derived from the observed ratings over time. ML-10M consists of a user-tag interaction network where user-tag links connects users with the tags they applied on certain movies.

In each dataset, multiple graph snapshots are created based on the observed interactions in fixed-length time windows. Dataset statistics are shown in Table  1 , while Appendix  H has further details.

5.2 Experimental Setup

𝑡 1 {\mathcal{G}}^{t+1} during evaluation. We compare different models based on their ability to correctly classify each example (node pair) into links and non-links. To further analyze predictive capability, we also evaluate new link prediction, with a focus on new links that appear at each time step, (Appendix  B ).

𝑡 1 {\mathcal{G}}^{t+1} and an equal number of randomly sampled pairs of unconnected nodes (non-links). A held-out validation set (20% links) is used to tune the hyper-parameters across all models, which is later discarded. We randomly sample 25% of the examples for training and use the remaining 75% as our test set. We repeat this for 10 randomized runs and report the average performance in our results.

We follow the strategy recommended by   Grover & Leskovec ( 2016 ) to compute the feature representation for a pair of nodes, using the Hadamard Operator ( 𝒆 u t ⊙ 𝒆 v t direct-product subscript superscript 𝒆 𝑡 𝑢 subscript superscript 𝒆 𝑡 𝑣 {\bm{e}}^{t}_{u}\odot{\bm{e}}^{t}_{v} ), for all methods unless explicitly specified otherwise. The Hadamard operator computes the element-wise product of two vectors and closely mirrors the widely used inner product operation in learning node embeddings. We evaluate the performance of link prediction using Area Under the ROC Curve (AUC) scores  (Grover & Leskovec, 2016 ) .

We implement DySAT in Tensorflow  (Abadi et al., 2016 ) and employ mini-batch gradient descent with Adam optimizer  (Kingma & Ba, 2015 ) for training. For Enron, we use a single layer in both the structural and temporal blocks, with each layer comprising 16 attention heads computing 8 features apiece (for a total of 128 dimensions). In the other datasets, we use two structural self-attentional layers with 16 and 8 heads respectively, each computing 16 features (layer sizes of 256, 128). The model is trained for a maximum of 200 epochs with a batch size of 256 nodes and the best performing model on the validation set, is chosen for evaluation.

Method Enron UCI Yelp ML-10M
Micro-AUC Macro-AUC Micro-AUC Macro-AUC Micro-AUC Macro-AUC Micro-AUC Macro-AUC
node2vec 83.72 83.05 79.99 80.49 67.86 65.34 87.74 87.52
G-SAGE 82.48 81.88 79.15 82.89 60.95 58.56 86.19 89.92
G-SAGE + GAT 72.52 73.34 74.03 79.83 66.15 65.09 83.97 84.93
GCN-AE 81.55 81.71 80.53 83.50 66.71 65.82 85.49 85.74
GAT-AE 75.71 75.97 79.98 81.86 65.92 65.37 87.01 86.75
DynamicTriad 80.26 78.98 77.59 80.28 63.53 62.69 88.71 88.43
DynGEM 67.83 69.72 77.49 79.82 66.02 65.94 73.69 85.96
DynAERNN 72.02 72.01 79.95 83.52 69.54 68.91 87.73 89.47
DySAT 85.71 86.60 81.03 85.81 70.15 69.87 90.82 93.68

5.3 Baseline

We compare the performance of DySAT with several state-of-the-art dynamic graph embedding techniques. In addition, we include several static graph embedding methods in comparison to analyze the gains of using temporal information for dynamic link prediction. To make a fair comparison with static methods, we provide access to the entire history of snapshots by constructing an aggregated graph upto time t 𝑡 t , where the weight of each link is defined as the cumulative weight till t 𝑡 t agnostic of its occurrence times. We use author-provided implementations for all the baselines and set the final embedding dimension d = 128 𝑑 128 d=128 .

We compare against several state-of-the-art unsupervised static embedding methods: node2vec  (Grover & Leskovec, 2016 ) , GraphSAGE  (Hamilton et al., 2017b ) and graph autoencoders  (Hamilton et al., 2017a ) . We experiment with different aggregators in GraphSAGE, namely, GCN, mean-pooling, max-pooling, and LSTM, to report the performance of the best performing aggregator in each dataset. To provide a fair comparison with GAT  (Velickovic et al., 2018 ) , which originally conduct experiments only on node classification, we implement a graph attention layer as an additional aggregator in GraphSAGE, which we denote by GraphSAGE + GAT. We also train GCN and GAT as autoencoders for link prediction along the suggested lines of  (Zitnik et al., 2018 ) , denoted by GCN-AE and GAT-AE respectively. In the dynamic setting, we evaluate DySAT  against the most recent studies on dynamic graph embedding including DynAERNN  (Goyal et al., 2018 ) , DynamicTriad  (Zhou et al., 2018 ) , and DynGEM  (Goyal et al., 2017 ) . The details of hyper-parameter tuning for all methods can be found in Appendix  G .

5.4 Experimental Results

Further, we compare the model performance at each time step (Figure  2 ), to obtain a deep understanding of their temporal behaviors. We fine the performance of DySAT to be relatively more stable than other methods. This contrast is pronounced in the communication networks (Enron and UCI), where we observe drastic drops in performance of static embedding methods at certain time steps.

The runtime per mini-batch of DySAT on ML-10M, using a machine with Nvidia Tesla V100 GPU and 28 CPU cores, is 0.72 seconds. In comparison, a model variant without temporal attention (Appendix  A.1 ) takes 0.51 seconds, which illustrates the relatively low cost of temporal attention.

Refer to caption

6 Discussion

Our experimental results provide several interesting observations and insights to the performance of different graph embedding techniques.

First, we observe that GraphSAGE achieves comparable performance to DynamicTriad across different datasets, despite being trained only on static graphs. One possible explanation may be that GraphSAGE uses trainable neighbor-aggregation functions, while DynamicTriad employs Skip-gram based methods augmented with temporal smoothness. This leads us to conjecture that the combination of structural and temporal modeling with expressive aggregation functions, such as multi-head attention, is responsible for the consistently superior performance of DySAT on dynamic link prediction. We also observe that node2vec achieves consistent performance agnostic of temporal information, which demonstrates the effectiveness of second-order random walk sampling. This observation points to the direction of applying sampling techniques to further improve DySAT.

In DySAT, we employ structural attention layers followed by temporal attention layers. We choose this design because graph structures are not stable over time, which makes directly employing structural attention layers after temporal attention layers infeasible. We also consider another alternative design choice that applies self-attention along the two dimensions of neighbors and time together following the strategy similar to  (Shen et al., 2018 ) . In practice, this would be computationally expensive due to variable number of neighbors per node across multiple snapshots. We leave exploring other architectural design choices based on structural and temporal self-attentions as future work.

In the current setup, we store the adjacency matrix of each snapshot in memory using sparse matrix, which may pose memory challenges when scaling to large graphs. In the future, we plan to explore DySAT with memory-efficient mini-batch training strategy along the lines of GraphSAGE  (Hamilton et al., 2017b ) . Further, we develop an incremental self-attention network (IncSAT) that is efficient in both computation and memory cost as a direct extension of DySAT. Our initial results are promising as reported in Appendix  E , which opens the door to future exploration of self-attentional architectures for incremental (or streaming) graph representation learning. We also evaluate the capability of DySAT on multi-step link prediction or forecasting and observe significant relative improvements of 6% AUC on average over existing methods, as reported in Appendix  C .

7 Conclusion

In this paper, we introduce a novel self-attentional neural network architecture named DySAT to learn node representations in dynamic graphs. Specifically,  DySAT computes dynamic node representations using self-attention over the (1) structural neighborhood and (2) historical node representations, thus effectively captures the temporal evolutionary patterns of graph structures. Our experiment results on various real-world dynamic graph datasets indicate that DySAT achieves significant performance gains over several state-of-the-art static and dynamic graph embedding baselines. Though our experiments are conducted on graphs without node features, DySAT can be easily generalized on feature-rich graphs. Another interesting direction is exploring continuous-time generalization of our framework to incorporate more fine-grained temporal variations.

  • Abadi et al. (2016) Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Gordon Murray, Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2-4, 2016. , pp. 265–283, 2016.
  • Bahdanau et al. (2015) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations (ICLR) , 2015.
  • Belkin & Niyogi (2001) Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems 14 [Neural Information Processing Systems: Natural and Synthetic, NIPS 2001, December 3-8, 2001, Vancouver, British Columbia, Canada] , pp.  585–591, 2001.
  • Defferrard et al. (2016) Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain , pp.  3837–3845, 2016.
  • Gehring et al. (2017) Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017 , pp. 1243–1252, 2017.
  • Goyal et al. (2017) Palash Goyal, Nitin Kamra, Xinran He, and Yan Liu. Dyngem: Deep embedding method for dynamic graphs. In IJCAI International Workshop on Representation Learning for Graphs (ReLiG) , August 2017.
  • Goyal et al. (2018) Palash Goyal, Sujit Rokka Chhetri, and Arquimedes Canedo. dyngraph2vec: Capturing network dynamics using dynamic graph representation learning. arXiv preprint arXiv:1809.02657 , 2018.
  • Grover & Leskovec (2016) Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016 , pp.  855–864, 2016.
  • Hamilton et al. (2017a) William L Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584 , 2017a.
  • Hamilton et al. (2017b) William L. Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA , pp.  1025–1035, 2017b.
  • Harper & Konstan (2016) F Maxwell Harper and Joseph A Konstan. The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TIIS) , 5(4):19, 2016.
  • Kingma & Ba (2015) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR) , 2015.
  • Kipf & Welling (2017) Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference for Learning Representations (ICLR) , 2017.
  • Klimt & Yang (2004) Bryan Klimt and Yiming Yang. Introducing the enron corpus. In CEAS 2004 - First Conference on Email and Anti-Spam, July 30-31, 2004, Mountain View, California, USA , 2004.
  • Kossinets & Watts (2006) Gueorgi Kossinets and Duncan J Watts. Empirical analysis of an evolving social network. science , 311(5757):88–90, 2006.
  • Krishnan et al. (2017) Adit Krishnan, Ashish Sharma, and Hari Sundaram. Improving latent user models in online social media. arXiv preprint arXiv:1711.11124 , 2017.
  • Krishnan et al. (2018) Adit Krishnan, Ashish Sharma, Aravind Sankar, and Hari Sundaram. An adversarial approach to improve long-tail performance in neural collaborative filtering. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management , pp.  1491–1494. ACM, 2018.
  • Kumar et al. (2018) Srijan Kumar, Xikun Zhang, and Jure Leskovec. Learning dynamic embeddings from temporal interactions. arXiv preprint arXiv:1812.02289 , 2018.
  • Leskovec et al. (2007) Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD) , 1(1):2, 2007.
  • Li et al. (2017) Jundong Li, Harsh Dani, Xia Hu, Jiliang Tang, Yi Chang, and Huan Liu. Attributed network embedding for learning in a dynamic environment. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 2017, Singapore, November 06 - 10, 2017 , pp.  387–396, 2017.
  • Li et al. (2018) Jundong Li, Kewei Cheng, Liang Wu, and Huan Liu. Streaming link prediction on dynamic attributed networks. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM 2018, Marina Del Rey, CA, USA, February 5-9, 2018 , pp.  369–377, 2018.
  • Lin et al. (2017) Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. A structured self-attentive sentence embedding. In International Conference on Learning Representations (ICLR) , 2017.
  • Ma et al. (2018) Yao Ma, Ziyi Guo, Zhaochun Ren, Eric Zhao, Jiliang Tang, and Dawei Yin. Dynamic graph neural networks. arXiv preprint arXiv:1810.10627 , 2018.
  • Nguyen et al. (2018) Giang Hoang Nguyen, John Boaz Lee, Ryan A Rossi, Nesreen K Ahmed, Eunyee Koh, and Sungchul Kim. Continuous-time dynamic network embeddings. In 3rd International Workshop on Learning Representations for Big Networks (WWW BigNet) , 2018.
  • Niepert et al. (2016) Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016 , pp. 2014–2023, 2016.
  • Panzarasa et al. (2009) Pietro Panzarasa, Tore Opsahl, and Kathleen M. Carley. Patterns and dynamics of users’ behavior and interaction: Network analysis of an online community. JASIST , 60(5):911–932, 2009.
  • Perozzi et al. (2014) Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: online learning of social representations. In The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA - August 24 - 27, 2014 , pp.  701–710, 2014.
  • Sankar et al. (2017) Aravind Sankar, Xinyang Zhang, and Kevin Chen-Chuan Chang. Motif-based convolutional neural network on graphs. CoRR , abs/1711.05697, 2017.
  • Shaw et al. (2018) Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. Self-attention with relative position representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 2 (Short Papers) , pp.  464–468, 2018.
  • Shen et al. (2018) Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Shirui Pan, and Chengqi Zhang. Disan: Directional self-attention network for rnn/cnn-free language understanding. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018 , 2018.
  • Srivastava et al. (2014) Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research , 15(1):1929–1958, 2014.
  • Tan et al. (2018) Zhixing Tan, Mingxuan Wang, Jun Xie, Yidong Chen, and Xiaodong Shi. Deep semantic role labeling with self-attention. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7 , 2018.
  • Tang et al. (2015) Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, May 18-22, 2015 , pp.  1067–1077, 2015. doi: 10.1145/2736277.2741093 .
  • Tenenbaum et al. (2000) Joshua B Tenenbaum, Vin De Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. science , 290(5500):2319–2323, 2000.
  • Trivedi et al. (2017) Rakshit Trivedi, Hanjun Dai, Yichen Wang, and Le Song. Know-evolve: Deep temporal reasoning for dynamic knowledge graphs. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017 , pp. 3462–3471, 2017.
  • Trivedi et al. (2018) Rakshit Trivedi, Mehrdad Farajtbar, Prasenjeet Biswal, and Hongyuan Zha. Representation learning over dynamic graphs. arXiv preprint arXiv:1803.04051 , 2018.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA , pp.  6000–6010, 2017.
  • Velickovic et al. (2018) Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations (ICLR) , 2018.
  • Wang et al. (2016) Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016 , pp.  1225–1234, 2016. doi: 10.1145/2939672.2939753 .
  • Wang et al. (2014) Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27 -31, 2014, Québec City, Québec, Canada. , pp.  1112–1119, 2014.
  • Ying et al. (2018) Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pp.  974–983. ACM, 2018.
  • Yu et al. (2018) Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V. Le. Qanet: Combining local convolution with global self-attention for reading comprehension. In International Conference on Learning Representations (ICLR) , 2018.
  • Zhou et al. (2018) Le-kui Zhou, Yang Yang, Xiang Ren, Fei Wu, and Yueting Zhuang. Dynamic network embedding by modeling triadic closure process. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7 , 2018.
  • Zhu et al. (2016) Linhong Zhu, Dong Guo, Junming Yin, Greg Ver Steeg, and Aram Galstyan. Scalable temporal latent space inference for link prediction in dynamic social networks. IEEE Trans. Knowl. Data Eng. , 28(10):2765–2777, 2016.
  • Zitnik et al. (2018) Marinka Zitnik, Monica Agrawal, and Jure Leskovec. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics , 34(13):457–466, 2018.
  • Zuo et al. (2018) Yuan Zuo, Guannan Liu, Hao Lin, Jia Guo, Xiaoqian Hu, and Junjie Wu. Embedding temporal network via neighborhood formation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pp.  2857–2866. ACM, 2018.

Appendix A Self-Attention Analysis

Since self-attention plays a critical role in the formulation of DySAT, we conduct exploratory analyses to study (a) effectiveness : how do the different attention layers affect the performance of DySAT? (b) efficiency : how efficient is self-attention compare in comparison to existing recurrent models? and (c) interpretability : are the learnt attention weights visually interpretable for humans?

A.1 Effectiveness of Self-Attention

We evaluate the effectiveness of our self-attentional architecture DySAT in two parts:

Structural and temporal self-attention effectiveness. We present an ablation study on the structural and temporal attentional layers in DySAT. To demonstrate the effectiveness of self-attentional layers, we independently remove the structural and temporal attention blocks from DySAT to create simpler architectures, that are optimized using the same loss function (Eqn.  5 ). Note that the model obtained on removing the temporal attention block is different from static methods since the node embeddings are jointly optimized (using Eqn.  5 ) across all snapshots without any explicit temporal modeling. We use the same hyper-parameter configuration of DySAT on each dataset from the original experiments to train the new model. The performance comparison is shown in Table  3 .

We observe that in some datasets, the structural attention block is able to learn some temporal evolution patterns in graph structure, despite the lack of explicit temporal modeling, while the removal of structural attention leads to a decrease in model performance on most datasets. However, DySAT consistently outperforms the variants with 3% average gain in Macro-AUC, which validates our choice of using structural and temporal self-attentional layers.

Multi-head attention effectiveness. We conduct a sensitivity analysis on the number of attention heads used during multi-head attention. Specifically, we vary the number of structural and temporal heads in DySAT independently in the range { 1 , 2 , 4 , 8 , 16 } 1 2 4 8 16 \{1,2,4,8,16\} , while keeping the layer sizes fixed for fairness. As shown in Figure  3 , we can see that DySAT benefits from multi-head attention on both structural and temporal layers. The performance stabilizes with 8 attention heads, which appears sufficient to capture graph evolution from multiple latent facets.

Refer to caption

Method Enron UCI Yelp ML-10M
Micro-AUC Macro-AUC Micro-AUC Macro-AUC Micro-AUC Macro-AUC Micro-AUC Macro-AUC
Original 85.71 86.60 81.03 85.81 70.15 69.87 90.82 93.68
No Structural 85.21 86.50 74.48 81.24 70.11 67.85 88.56 90.34
No Temporal 84.50 85.68 76.61 79.97 68.34 67.20 89.61 91.10

A.2 Scalability Analysis

In this section, we compare the scalability of DySAT against a state-of-the-art dynamic embedding method DynAERNN, which uses an RNN-based architecture to model dynamic graph evolution. We choose this method for comparison as it is conceptually closest to our self-attentional architecture, as per well-established literature. We compare the model training time of two methods by varying the temporal history window , i.e. , the number of previous snapshots used as temporal history. Figure  4 depicts runtime per epoch of both models on ML-10M, using a machine with Nvidia Tesla V100 GPU and 16 CPU cores.

DySAT achieves significantly lower training times than DynAERNN, e.g. , the runtime per epoch of DySAT is 88 seconds with a window size of 10 on ML-10M, in comparison to 723 seconds for DynAERNN. Self-attention is easily parallelized across both time as well as attention heads, while RNNs suffer due to the inherently sequential nature of back-propagation through time. Our results demonstrate the practical scalability advantage for pure self-attentional architectures over RNN-based methods.

Refer to caption

A.3 Temporal Attention Visualization

We also conduct a qualitative analysis attempting to obtain deeper insights into the distribution of temporal attention weights learned by DySAT on datasets with distinct evolutionary behaviors. In this experiment, we examine the temporal attention coefficients learned at each time step k 𝑘 k , which indicate the relative importance of each historical snapshot in predicting the links at k 𝑘 k . We consider two diverse datasets UCI and Yelp: UCI is an email communication network with periodically recurring user interactions, while Yelp is a rating network where new user-business ratings get added over time. Figure  4 visualizes a heatmap of the learned temporal attention weights on the UCI and Yelp dataset for the first 11 time steps.

In Figure  4 , each row depicts the set of all attention weights over historical time steps t 1 ​ … ​ t k − 1 subscript 𝑡 1 … subscript 𝑡 𝑘 1 t_{1}...t_{k-1} for predicting links at time step k 𝑘 k . We find the attention weights to be biased towards recent snapshots in Yelp, while being more uniformly distributed in UCI. This observation conforms to the nature of graph evolution in these datasets since rating behaviors in Yelp tends to be bursty and correlated with events such restaurant opening, discounted sales, etc., while inter-user communications in UCI typically span longer time intervals. Thus, DySAT is able to learn different distributions of attention weights adaptive to the mechanism of temporal graph evolution across different datasets. While this analysis provides a macro-perspective on the weights learned by temporal attention across different datasets, an appropriate node-level interpretation of these coefficients (such as e.g. ,  (Bahdanau et al., 2015 ) ) requires further domain knowledge about the dataset under study, and is left as future work.

We conduct a qualitative analysis to obtain deeper insights into the distribution of temporal attention weights learned by DySAT. In this experiment, we examine the temporal attention coefficients learned at each time step t 𝑡 t , which indicate the relative importance of each historical snapshot ( < t absent 𝑡 <t ) in predicting the links at t 𝑡 t . We choose the Enron dataset to visualize the mean and standard deviation of temporal attention coefficients, over all the nodes. Figure  4 visualizes a heatmap of the learned temporal attention weights on Enron dataset for the first 10 time steps.

From Figure  4 , we observe that the mean temporal attention weights are mildly biased towards recent snapshots, while the historical snaphots vary in their importance across different time steps. Further, we find that the standard deviation of attention weights across different nodes is generally high and exhibits more variability. Thus, the temporal attention weights are well distributed across historical snapshots, with significant variance across different nodes in the graph. While this analysis attempts to provide a high-level perspective on the weights learned by temporal attention, an appropriate interpretation of these coefficients (as done by e.g. ,  (Bahdanau et al., 2015 ) ) requires further domain knowlege about the dataset under study, and is left as future work.

Appendix B Dynamic New Link Prediction

𝑡 1 {\mathcal{G}}_{t+1} (that are not in 𝒢 t subscript 𝒢 𝑡 {\mathcal{G}}_{t} ) and an equal number of randomly sampled non-links.

𝑡 1 {\mathcal{G}}_{t+1} . From Table  2 ), we find that DySAT achieves consistent relative gains of 3–5% Macro-AUC over the best baselines on dynamic new link prediction as well, thus validating its effectiveness in accurately capturing temporal context for new link prediction.

Method Enron UCI Yelp ML-10M
Micro-AUC Macro-AUC Micro-AUC Macro-AUC Micro-AUC Macro-AUC Micro-AUC Macro-AUC
node2vec 76.92 75.86 73.67 74.76 67.36 65.17 85.22 84.89
G-SAGE 73.92 74.67 76.69 79.41 62.25 58.81 85.23 89.14
G-SAGE + GAT 67.02 68.32 73.18 76.79 66.53 65.45 80.84 82.53
GCN-AE 74.46 74.02 74.76 76.75 66.18 65.77 82.45 82.48
GAT-AE 69.75 69.25 72.52 73.78 66.07 65.91 84.98 84.51
DynamicTriad 69.59 68.77 67.97 71.67 63.76 62.83 84.72 84.32
DynGEM 60.73 62.85 77.49 79.82 66.42 66.84 73.77 83.51
DynAERNN 59.05 59.63 77.72 81.91 74.33 73.46 87.42 88.19
DySAT 78.87 78.58 79.24 83.66 69.46 69.14 89.29 92.65

Appendix C Multi-Step Link Prediction

𝑡 Δ {\mathcal{G}}_{t+\Delta} and an equal number of randomly sampled pairs of unconnected nodes (non-links). We otherwise use the same evaluation setup of training a downstream logistic regression classifier to evaluate link prediction. In this experiment, we exclude the links formed by nodes that newly appear in the future evaluation snapshots, since most methods cannot be easily support updates for new nodes.

Figure.  5 depicts the variation in model performance of different methods over the 6 evaluation snapshots. As expected, we observe a slight decay in performance over time for all the models.  DySATachieves significant performance gains over all other baselines and maintains a highly stable link prediction performance over multiple future time steps. Static embedding methods often exhibit large variations in performance over time steps, while DySAT  achieves a stable consistent performance. The historical context captured by the dynamic node embeddings of DySAT, is one of the most likely reasons for its stable multi-step forecasting performance.

Refer to caption

Appendix D Impact of unseen nodes on Dynamic Link Prediction

In this section, we analyze the sensitivity of different graph representation learning on link prediction for previously unseen nodes that appear newly at time t 𝑡 t .

Refer to caption

𝑡 1 {\mathcal{G}}_{t+1} among the new nodes in 𝒢 t subscript 𝒢 𝑡 {\mathcal{G}}_{t} and corresponding randomly sampled non-links. Since the number of nodes varies significantly across different time steps, we report the performance of each method along with the number of new nodes at each time step, in Figure  6 .

From Figure  6 , we observe that DySAT outperforms other baselines in most datasets, demonstrating the ability to characterize new or previously unseen nodes despite their limited history. Although the temporal attention will focus on the latest representation of a new node v 𝑣 v due to absence of history, the structural embedding of v 𝑣 v recieves backpropagation signals through the temporal attention on neighboring nodes, which indirectly affects the final embedding of v 𝑣 v . We hypothesize that this indirect temporal signal is one of the reasons for DySAT to achieve performance improvements over baselines, albeit not designed to explicitly model historical context for previously unseen nodes.

Refer to caption

Appendix E Incremental Self-Attention Network

In this section, we describe an extension of our dynamic self-attentional architecture to learn incremental node representations. The motivation of incremental learning arises due to the proliferation in sizes of real-world graphs, making it difficult to store multiple snapshots in memory. Thus, the incremental graph representation learning problem imposes the restriction of no access to historical graph snapshots, in contrast to most dynamic graph embedding methods. Specifically, to learn the node embeddings { 𝒆 v t ∈ ℝ d ​ ∀ v ∈ 𝒱 } superscript subscript 𝒆 𝑣 𝑡 superscript ℝ 𝑑 for-all 𝑣 𝒱 \{{\bm{e}}_{v}^{t}\in{\mathbb{R}}^{d}\;\forall v\in{\mathcal{V}}\} at time T 𝑇 T , we require a model to only access to the snapshot 𝒢 T superscript 𝒢 𝑇 {\mathcal{G}}^{T} and a summary of the historical snapshots. For example, DynGEM  (Goyal et al., 2017 ) is an example of an incremental embedding method that uses the embeddings learned at step t − 1 𝑡 1 t-1 , as initialization to learn the embeddings at t 𝑡 t .

We propose an extension of our self-attentional architecture named IncSAT to explore solving the incremental graph representation learning problem. To learn node representations at T 𝑇 T , we first incrementally train multiple models at 1 ≤ t ≤ T 1 𝑡 𝑇 1\leq t\leq T . Unlike the original DySAT where structural self-attention is applied at each snapshot, IncSAT applies the structural block only at the latest graph snapshot 𝒢 t superscript 𝒢 𝑡 {\mathcal{G}}^{t} . We enable incremental learning by storing the intermediate output representations { 𝒉 v T ​ ∀ v ∈ 𝒱 } subscript superscript 𝒉 𝑇 𝑣 for-all 𝑣 𝒱 \{{\bm{h}}^{T}_{v}\;\forall v\in{\mathcal{V}}\} of the structural block. As illustrated in Figure  7 , these intermediate output representations of historical snapshots ( 1 ≤ t < T ) 1 𝑡 𝑇 (1\leq t<T) can be directly loaded from previously saved results at 1 ≤ t < T 1 𝑡 𝑇 1\leq t<T . Thus, the structural information of the previous historical snapshots are summarized in the stored intermediate representations. The temporal self-attention is only applied to the current snapshot 𝒢 T superscript 𝒢 𝑇 {\mathcal{G}}^{T} over the historical representations of each node to compute the final node embeddings { 𝒆 v T ​ ∀ v ∈ 𝒱 } subscript superscript 𝒆 𝑇 𝑣 for-all 𝑣 𝒱 \{{\bm{e}}^{T}_{v}\;\forall v\in{\mathcal{V}}\} at T 𝑇 T , which are trained on random walks sampled from 𝒢 T superscript 𝒢 𝑇 {\mathcal{G}}^{T} .

We evaluate IncSAT using the same experimental setup, with minor modifications in hyper-parameters. We use a dropout rate of 0.4 in both the structural and temporal self-attention layers. From our preliminary experiments, we find that a higher dropout rate in the structural block can facilitate avoiding over-fitting the model to the current graph snapshot. In Table  5 , we report the performance of IncSAT in comparison to DySAT and DynGEM, which is the only one that can support incremental training from our baseline models. The results show that IncSAT achieves comparable performance to DySAT on most datasets while significantly outperforming DynGEM, albeit with minimal hyper-parameter tuning.

Method Enron UCI Yelp ML-10M
Micro-AUC Macro-AUC Micro-AUC Macro-AUC Micro-AUC Macro-AUC Micro-AUC Macro-AUC
DySAT 85.71 86.60 81.03 85.81 70.15 69.87 90.82 93.68
DynGEM 67.83 69.72 77.49 79.82 66.02 65.94 73.69 85.96
IncSAT 84.36 85.43 76.18 85.37 69.54 68.73 80.13 91.14

Appendix F Computational Complexity

𝒱 superscript 𝑇 2 𝐷 𝒱 𝑇 superscript 𝐷 2 superscript subscript 𝑡 1 𝑇 subscript ℰ 𝑡 𝐷 O(|{\mathcal{V}}|T^{2}D+|{\mathcal{V}}|TD^{2}+\sum\limits_{t=1}^{T}|{\mathcal{E}}_{t}|D) . The dominant term is O ​ ( | 𝒱 | ​ T 2 ​ D ) 𝑂 𝒱 superscript 𝑇 2 𝐷 O(|{\mathcal{V}}|T^{2}D) from the temporal self-attention layer, however its computation is fully parallelizable, thus amenable to GPU acceleration. In contrast, RNN-based methods (e.g., DynAERNN  Goyal et al. ( 2018 ) ) have a sequential dependency on previous steps, i.e. , computation at t 𝑡 t must wait from embeddings at t − 1 𝑡 1 t-1 , resulting in O ​ ( T ) 𝑂 𝑇 O(T) cost of sequential sequential operations. Empirically, we find DySAT to be around ten times faster than RNN-based methods with GPUs.

Appendix G Details on Hyper-parameter Settings and Tuning

In DySAT, the objective function (Eqn.  5 ) utilizes positive pairs of nodes co-occurring in fixed-length random walks. We follow the strategy of Deepwalk  (Perozzi et al., 2014 ) to sample walks 10 walks of length 40 per node, each with a context window size of 10. We use 10 negative samples per positive pair, with context distribution ( P n t subscript superscript 𝑃 𝑡 𝑛 P^{t}_{n} ) smoothing over node degrees with a smoothing parameter of 0.75, following  (Perozzi et al., 2014 ; Grover & Leskovec, 2016 ; Hamilton et al., 2017b ) . During training, we apply L 2 subscript 𝐿 2 L_{2} regularization with λ = 5 × 10 − 4 𝜆 5 superscript 10 4 \lambda=5\times 10^{-4} and use dropout rates  (Srivastava et al., 2014 ) of 0.1 and 0.5 in the self-attention layers of the structural and temporal blocks respectively. We use the validation set for tuning the learning rate in the range of { 10 − 4 , 10 − 3 } superscript 10 4 superscript 10 3 \{10^{-4},10^{-3}\} and negative sampling ratio w n subscript 𝑤 𝑛 w_{n} in the range { 0.01 , 0.1 , 1 } 0.01 0.1 1 \{0.01,0.1,1\} .

DynamicTriad  (Zhou et al., 2018 ) was tuned using their two key hyper-parameters determining the effect of smoothness and triadic closure, β 0 subscript 𝛽 0 \beta_{0} and β 1 subscript 𝛽 1 \beta_{1} in the range { 0.01 , 0.1 , 1 , 10 } 0.01 0.1 1 10 \{0.01,0.1,1,10\} , as advised, while using recommended settings otherwise. We use the L 1 subscript 𝐿 1 L_{1} operator ( | 𝒆 u t − 𝒆 u t | subscript superscript 𝒆 𝑡 𝑢 subscript superscript 𝒆 𝑡 𝑢 |{\bm{e}}^{t}_{u}-{\bm{e}}^{t}_{u}| ) instead of Hadamard, as recommended in the paper, which also gives better performance. For DynGEM, we tune the different scaling and regularization hyper-parameters, α ∈ { 10 − 6 , 10 − 5 } 𝛼 superscript 10 6 superscript 10 5 \alpha\in\{10^{-6},10^{-5}\} , β ∈ { 0.1 , 1 , 2 , 5 } 𝛽 0.1 1 2 5 \beta\in\{0.1,1,2,5\} , ν 1 ∈ { 10 − 6 , 10 − 4 } subscript 𝜈 1 superscript 10 6 superscript 10 4 \nu_{1}\in\{10^{-6},10^{-4}\} and ν 2 ∈ { 10 − 6 , 10 − 4 } subscript 𝜈 2 superscript 10 6 superscript 10 4 \nu_{2}\in\{10^{-6},10^{-4}\} , while using other default configurations. DynAERNN was tuned using similar scaling and regularization hyper-parameters, β ∈ { 0.1 , 1 , 2 , 5 } 𝛽 0.1 1 2 5 \beta\in\{0.1,1,2,5\} , ν 1 ∈ { 10 − 6 , 10 − 4 } subscript 𝜈 1 superscript 10 6 superscript 10 4 \nu_{1}\in\{10^{-6},10^{-4}\} and ν 2 ∈ { 10 − 6 , 10 − 4 } subscript 𝜈 2 superscript 10 6 superscript 10 4 \nu_{2}\in\{10^{-6},10^{-4}\} , while using similar layer configurations as DySAT to ensure direct comparability.

Appendix H Additional Dataset Details

In this section, we provide some additional, relevant dataset details. Since dynamic graphs often contain continuous timestamps, we split the data into multiple snapshots using suitable time-windows such each snapshot has an equitable yet reasonable number of interactions (communication/ratings). In each snapshot, the weight of a link is determined by the number of interactions between the corresponding pair of users during that time-period. The pre-processed versions of all datasets will be made publicly available, along with the scripts used for processing the raw data.

Communication Networks. The original un-processed Enron dataset is available at https://www.cs.cmu.edu/~./enron/ . We use only the email communcations that are between Enron employees, i.e. , sent by an Enron employee and have at least one recipient who is an Enron employee. A time-window of 2 months is used to construct 16 snapshots, where the first 5 are used as warm-up (due to sparsity) and the remaining 11 snapshots for evaluation.

The UCI dataset was downloaded from http://networkrepository.com/opsahl_ucsocial.php . This dataset contains private messages sent between users over a span of six months, on an online social network platform at the University of California, Irvine. The snapshots are created using their communication history with a time-window of 10 days. We discard/merge the terminal snapshots if they do not contain sufficient communications.

Rating Networks. We use the Round 11 version of the Yelp Dataset Challenge  https://www.yelp.com/dataset/challenge . To extract a cohesive subset of user-business ratings, we first select all businesses in the state of Arizona (the state with the largest number of ratings) with a selected set of restaurant categories. Further, we filter the data to retain only users and business which have at-least 15 ratings. Finally, we use a time-window of 6 months to extract 12 snapshots in the period of 2009 to 2015.

The ML-10m dynamic user-tag interaction network was downloaded from  http://networkrepository.com/ia-movielens-user2tags-10m.php . This dataset depicts the tagging behavior of MovieLens users, with the tags applied by a user on her rated movies. We use a time-window of 3 months to extract 13 snaphots over the course of 3 years.

ar5iv homepage

dynamic graph representation learning

Frequently Asked Questions

Representation Learning for Dynamic Graphs: A Survey

Seyed Mehran Kazemi, Rishab Goel, Kshitij Jain, Ivan Kobyzev, Akshay Sethi, Peter Forsyth, Pascal Poupart ; 21(70):1−73, 2020.

Graphs arise naturally in many real-world applications including social networks, recommender systems, ontologies, biology, and computational finance. Traditionally, machine learning models for graphs have been mostly designed for static graphs. However, many applications involve evolving graphs. This introduces important challenges for learning and inference since nodes, attributes, and edges change over time. In this survey, we review the recent advances in representation learning for dynamic graphs, including dynamic knowledge graphs. We describe existing models from an encoder-decoder perspective, categorize these encoders and decoders based on the techniques they employ, and analyze the approaches in each category. We also review several prominent applications and widely used datasets and highlight directions for future research.

2020. ( , )

Neural Temporal Walks: Motif-Aware Representation Learning on Continuous-Time Dynamic Graphs

Part of Advances in Neural Information Processing Systems 35 (NeurIPS 2022) Main Conference Track

Ming Jin, Yuan-Fang Li, Shirui Pan

Continuous-time dynamic graphs naturally abstract many real-world systems, such as social and transactional networks. While the research on continuous-time dynamic graph representation learning has made significant advances recently, neither graph topological properties nor temporal dependencies have been well-considered and explicitly modeled in capturing dynamic patterns. In this paper, we introduce a new approach, Neural Temporal Walks (NeurTWs), for representation learning on continuous-time dynamic graphs. By considering not only time constraints but also structural and tree traversal properties, our method conducts spatiotemporal-biased random walks to retrieve a set of representative motifs, enabling temporal nodes to be characterized effectively. With a component based on neural ordinary differential equations, the extracted motifs allow for irregularly-sampled temporal nodes to be embedded explicitly over multiple different interaction time intervals, enabling the effective capture of the underlying spatiotemporal dynamics. To enrich supervision signals, we further design a harder contrastive pretext task for model optimization. Our method demonstrates overwhelming superiority under both transductive and inductive settings on six real-world datasets.

Name Change Policy

Requests for name changes in the electronic proceedings will be accepted with no questions asked. However name changes may cause bibliographic tracking issues. Authors are asked to consider this carefully and discuss it with their co-authors prior to requesting a name change in the electronic proceedings.

Use the "Report an Issue" link to request a name change.

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
  • Open access
  • Published: 21 August 2024

Random memristor-based dynamic graph CNN for efficient point cloud learning at the edge

  • Yifei Yu 1 , 2 ,
  • Shaocong Wang 1 , 2 ,
  • Meng Xu 1 , 2 ,
  • Woyu Zhang 3 , 4 ,
  • Bo Wang 1 , 2 ,
  • Jichang Yang 1 , 2 ,
  • Songqi Wang 1 , 2 ,
  • Yue Zhang 1 , 2 ,
  • Xiaoshan Wu 1 , 2 ,
  • Hegan Chen 1 , 2 ,
  • Dingchen Wang 1 , 2 ,
  • Xi Chen 1 , 2 ,
  • Ning Lin 1 , 2 ,
  • Xiaojuan Qi 1 ,
  • Dashan Shang 3 , 4 &
  • Zhongrui Wang 1 , 2 , 5  

npj Unconventional Computing volume  1 , Article number:  6 ( 2024 ) Cite this article

434 Accesses

Metrics details

  • Engineering
  • Mathematics and computing

The broad integration of 3D sensors into devices like smartphones and AR/VR headsets has led to a surge in 3D data, with point clouds becoming a mainstream representation method. Efficient real-time learning of point cloud data on edge devices is crucial for applications such as autonomous vehicles and embodied AI. Traditional machine learning models on digital processors face limitations, with software challenges like high training complexity, and hardware challenges such as large time and energy overheads due to von Neumann bottleneck. To address this, we propose a software-hardware co-designed random memristor-based dynamic graph CNN (RDGCNN). Software-wise, we transform point cloud into graph, and propose random EdgeConv for efficient hierarchical and topological features extraction. Hardware-wise, leveraging memristor’s intrinsic stochasticity and in-memory computing capability, we achieve significant reductions in training complexity and energy consumption. RDGCNN demonstrates high accuracy and efficiency across various point cloud tasks, paving the way for future edge 3D vision.

Similar content being viewed by others

dynamic graph representation learning

Adaptive deep learning-based neighborhood search method for point cloud

dynamic graph representation learning

A semi-automatic toolbox for markerless effective semantic feature extraction

dynamic graph representation learning

A road surface reconstruction dataset for autonomous driving

Introduction.

With the widespread adoption of 3D sensors in everyday devices, such as mixed reality headsets, automobiles, drones, and even smartphone cameras, there has been an explosive growth in the volume of 3D data. Point clouds 1 have emerged as a mainstream method for representing the shape and geometry of 3D objects through a series of discrete points (Fig. 1a ). The easy accessibility of point cloud data has significantly expanded its applications across a wide range of fields such as augmented and virtual reality (AR/VR), autonomous driving, robotics, and photography. Often operating at the edge, where computing resources and battery life are limited, these application scenarios underscore the importance of accurately, promptly, and efficiently learning of point cloud data.

figure 1

a Point cloud data, represented by a series of discrete points, is widely applied in everyday life with the popularization of 3D sensors in devices like mixed reality headset, automobiles, drones, and smartphone cameras. b Traditional machine learning approaches for handling point clouds: Left: Method of voxelizing point clouds and using 3D convolutional operations for processing; Right: Method of representing point clouds as point sets and directly processing them through neural networks. c The traditional von Neumann architecture of processors, with separate computation and storage units, introduces significant data transfer overhead. d We convert the point cloud into a graph representation, where each point serves as a vertex in the graph, and the edges are determined by the features of the points. e Our random dynamic graph CNN method. Left: The network dynamically updates the vertex features and connections, extracting hierarchical information. Right: The random EdgeConv operation. The central vertex features and neighboring vertex features are concatenated and passed through a random CNN to obtain edge features. The updated central node features are calculated by aggregating the edge features associated with edges emanating from all neighbouring vertices, and new graph connections are computed based on the updated features. f Our memristor-based Computing-In-Memory (CIM) system. Leveraging the in-memory computing capability of memristor, it reduces data transfer overhead and utilizes the intrinsic stochasticity to obtain a random conductance matrix for the physical implementation of random CNN weights.

However, conventional software-hardware pairs face significant challenges. In terms of software, so far there are multiple methods to learn 3D points (Fig. 1b ). The first method involves voxelizing the point cloud data. Voxelization converts the point cloud into a regular grid of 3D voxels, transforming the irregular point cloud into a structured format that can be processed using 3D convolutional operations 2 , 3 . However, voxelization can result in high memory consumption and computational cost, especially with fine-grained voxel grids. The second is to directly process point cloud data 4 , 5 by grouping and sampling within the point cloud to form a point set representation, which allows for the extraction of point cloud features without converting them into a different representation. While these approaches avoid the information loss associated with voxelization, they can still be expensive in training and struggle with capturing fine-grained geometric relationships between points.

In terms of hardware, conventional processors struggle to deliver the energy efficiency required for edge processing point cloud data (Fig. 1c ). Traditional von Neumann architectures feature physically separated processing and memory units. This separation leads to substantial data transfer overheads, known as the von Neumann bottleneck 6 . Moreover, due to the complexity of processing point cloud data, even high-performance GPUs struggle to meet the requirements for real-time processing 7 . This challenge is more pronounced in edge devices, where computational resources are significantly constrained.

To address the aforementioned challenges, we have co-designed software and hardware, random memristor-based dynamic graph CNN (RDGCNN).

Firstly, we transform point cloud data into graph data 8 , 9 , where points serve as vertices of the graph (Fig. 1d ). The edges are computed and dynamically updated based on vertex features, facilitating the flow of information among neighboring nodes. This transformation into a graph naturally accommodates the irregular structure of point clouds, eliminating the need for voxelization and thereby avoiding information loss. Moreover, graph data adeptly captures the geometric structural features of the 3D point cloud.

On the software front, we have developed a random dynamic graph CNN to learn point clouds (Fig. 1e ). Graph Neural Networks 10 (GNNs) update graph node feature representations through information propagation among nodes, capturing both local structures and global topologies within point data. To reduce training cost, we devised a random EdgeConv 9 to create a local neighborhood graph in feature space and perform convolutional operations on the connections between adjacent points. Unlike traditional GNNs, the edges in the point cloud graph dynamically update during embedding. Additionally, the weights of EdgeConv are fixed and random, avoiding the tedious training of conventional GNNs. Through iterative updates of vertex and edge embeddings via random EdgeConv, the feature representations obtained can serve various downstream tasks through a lightweight trainable task-dependent head.

At the hardware level, we leverage the in-memory computing capability of memristor and its intrinsic randomness to physically implement random EdgeConv layers (Fig. 1f ). As a non-volatile memory device 11 , 12 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , each memristor not only embodies the synaptic weight through analog conductance but also computes in-situ utilizing Ohm’s law and Kirchhoff’s current law for multiplication and accumulation (MAC) 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 . This approach significantly reduces the energy and time expenditures associated with traditional von Neumann architectures 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 . Furthermore, the uncertainty in ion migration during the electroforming process in memristor devices can cause variations in conductance values 37 , 38 , 39 . This intrinsic randomness often complicates the precise mapping of weights, deemed a disadvantage. However, we turn this into an advantage by utilizing the intrinsic randomness to realize truly random, large-scale and low-cost hardware random weights.

We implemented our co-design on a 40 nm memristor macro and validated the effectiveness of our approach on three canonical point cloud tasks: classification, part segmentation, and semantic segmentation. On the ModelNet40 40 , we achieved classification accuracy of 89.75%. On ShapeNet 41 , and S3DIS 42 datasets, we achieved mean Intersection over Union 43 (mIoU) of 83.67%, and 46.35%, respectively. Furthermore, the energy consumption of our system is reduced by 54.2%, 39.5%, and 35.3% compared to state-of-the-art (SOTA) GPUs. Moreover, the training complexity is reduced by 96.4%, 53.5%, and 68.1% compared to fully trainable baseline. Our co-design paves the way for future efficient and rapid point cloud applications at the edge.

Software-hardware co-design of RDGCNN

Our RDGCNN algorithm, as shown in Fig. 1d, e , first constructs a graph G  = ( V , E ) for the input point cloud data. Each point in the point cloud becomes a center vertex in the sub-graph, and edges are created to connect each point with its n nearest neighboring points determined by the relationships of the neighbouring points. We use the K-Nearest Neighbors (KNN) algorithm to construct the local graph structure. For each pair of adjacent points, the edge features are defined by a non-linear function h θ ( x i , x j − x i ), where x i represents the feature of the center point for capturing global information, and x j − x i represents the feature difference of the neighbor for extracting local geometric information. The function h θ is parameterized by a CNN with random weights, implemented by random analog memristors, to obtain the edge features e ij . After extracting features for the n nearest neighbors of the center point x i , the edge features are aggregated using a permutation-invariant max pooling function to obtain the updated center point feature x i ′ . After completing the feature update process, the new graph G ′  = ( V ′ , E ′ ) is obtained by recalculating the n -nearest neighbors based on the updated node features. This dynamic construction of G ( l ) = ( V ( l ), E ( l )) is iteratively performed at each layer l , gradually extracting higher-level features and gathering points in semantic space. For downstream tasks, the hierarchical features are concatenated and passed to lightweight trainable task-dependent heads in digital domain, enabling applications such as classification and segmentation.

Inspired by matrix decomposition methods such as Singular Value Decomposition 44 (SVD) and low-rank decomposition 45 , we split a random EdgeConv layer with a k ×k ×c filter size and d filters into two random sub-layers (Fig. 2a ). The first layer has d ′ filters with a k × k × c filter size, while the second layer has d filters with a 1 × 1 × d ′ filter size ( d ′ is much smaller than d ). The first layer compress n×c features to a low-rank intermediate feature subspace of dimension n×d ′ . The second layer then approximates the original output features of dimension n×d . This splitting approach reduces the parameter size to \(\frac{{d}^\prime\times{k}\times{k}\times{c}+{d}^\prime\times{d}}{{d}\times{k}\times{k}\times{c}}\) , and the computational complexity changes from O ( dk 2 c ) to O ( d ′ k 2 c ) +  O ( dd ′ ). By adjusting the subspace dimension d ′ , a tradeoff between parameter size and performance can be achieved.

figure 2

a Schematic of random EdgeConv splitting to reduce parameter count. We split one EdgeConv layer into two sub-layers. b Hardware implementation of in-memory matrix multiplication on memristor macro. Inputs are converted to voltages and fed into bit-lines, carrying vector-matrix multiplication via Ohm’s and Kirchhoff’s laws. Output currents from source lines are accumulated from memristor array and reference resistor output. c Optical photos of the memristor array and cross-sectional Transmission Electron Micrograph (TEM) of a single 1-transistor-1-memristor cell (scale bar: 200 nm and 50 nm). d Physical origin of intrinsic stochasticity in memristor. e Conductance map of memristor sub-array. f Histogram of the conductance. g Retention of memristors.

Figure 2b schematically illustrates the in-memory matrix multiplication realized on a memristor macro. The random weights are realized with the intrinsic stochasticiy provided by memristor array, where each memristor cell is subtracted from a fixed reference resistor (with a conductance of 30  µS ) to obtain positive and negative weights. The inputs are converted to voltages through digital-analogue convertors (DACs) and then fed into the bit-lines. The vector-matrix multiplication is achieved based on Ohm’s law and Kirchhoff’s law. The output currents from the memristors and the reference resistors are accumulated and digitized (see Supplementary Fig. 1 for memristor based hybrid analog-digital computing platform).

Figure 2c shows the chip photograph and transmission electron microscope (TEM) cross-sectional imaging of the memristor array, as well as nanoscale TaN/TaO x /Ta/TiN memristor devices integrated with complementary metal-oxide-semiconductor (CMOS) at the back-end-of-line process, fabricated in a test chip using the 40 nm technology node (see Supplementary Fig. 2 for memristor device characterization). The conductance of the memristor is controlled by the formation of conductive channels based on oxygen vacancies in the TaO x layer. Under the same electrical breakdown conditions, oxygen vacancies are generated in TaO x , which then migrate to form conductive channels and eventually transition to a high-conductance state. The heterogeneity in the position, shape, and vacancy concentration of the conductive channels across different devices leads to a random conductance (i.e., write noise) distribution within the memristor array (Fig. 2d ). Figure 2e shows the conductance map of a 360  ×  488 memristor sub-array. The resistance distribution follows a Gaussian distribution with a mean value around 30.9  µS and standard deviation around 4.9  µS (Fig. 2f ). Figure 2g demonstrates the retention of the memristor devices over 40,000 reading cycles, where the device conductance fluctuations (i.e., read noise) are relatively small compared to write noise.

3D point cloud classification

First, we validated the effectiveness of our co-design on point cloud classification tasks with ModelNet40 40 dataset. ModelNet40 is a widely used dataset for point cloud classification, consisting of 12,311 models from 40 different categories. Each model contains 1024 points, with each point having coordinate information ( x, y, z ). The system is expected to assign the entire point cloud to a specific category, such as recognizing it as a table, chair, laptop, motorbike, etc. Each point cloud model is first transformed into a graph with a selected number of nearest neighbors ( n  = 20). The model architecture is shown in Fig. 3a , where the graph first goes through 4 splitted random EdgeConv layers to extract features. Then, the hierarchical features are concatenated together through residual connections and further aggregated through a fusion layer. Finally, a lightweight trainable classification head is used for predicting the class (see Supplementary Fig. 3 for detailed network structure). During the model training process, the splitted random EdgeConv layers’ weights stay fixed. Only the weights of the classification head need to be updated, resulting in substantially reduced computational complexity compared to the baseline dynamic graph CNN (DGCNN) 9 model with fully trainable EdgeConv layers.

figure 3

a Schematic of RDGCNN for point cloud classification task on ModelNet40 dataset. b Experimental input and feature evolution of a chair model from the ModelNet40 dataset. c Weight distributions of different random EdgeConv layers. d Testset classification results and parameter count comparisons of trainable software DGCNN baseline and our co-design. e Confusion matrix of the classification results of our co-design on ModelNet40, with dominating diagonal elements. f Comparison of training complexity between the fully trainable DGCNN, DGCNN with splitted EdgeConv, and RDGCNN with splitted random EdgeConv (ours). g Comparison of inference energy between GPU, NPU, and our co-design.

Figure 3b illustrates the experimental input and feature evolution of the point cloud graph after random EdgeConv layers, as revealed by the color variations of different nodes in a sample selected from the chair category of ModelNet40. The color variations are calculated based on the distance to a specific node in the feature space. The node features are accumulated from dynamically generated graph neighboring edges, gradually extracting higher-level information such as topological structure. Finally, the classification head predicted the class label. Figure 3c illustrates the random EdgeConv weights following Gaussian distributions in different layers. The first layer does not undergo splitting, while the second to fourth layers employ splitted random EdgeConv. The testset classification results and parameter comparisons are shown in Fig. 3d . The baseline DGCNN model with fully trainable EdgeConv layers on the GPU achieves an accuracy of 91.45%. Meanwhile, our co-design achieves a classification accuracy of 89.75%, with only 1.7% decrease. In terms of trainable parameter count, our co-design exhibits a reduction of 92.5% compared to the trainable model (see Supplementary Fig. 6a for detailed parameter count comparison), making it highly favorable for edge hardware deployments with limited computing resources. Figure 3e presents the confusion matrix of the prediction results of our co-design on ModelNet40, showing dominant diagonal elements for most categories.

In addition to demonstrating its classification performance, we also analyzed the training costs and energy efficiency. We compared the fully trainable software baseline with trainable splitted EdgeConv and splitted random EdgeConv in terms of training costs (as shown in Fig. 3f ). Compared to the software baseline, splitted EdgeConv and splitted random EdgeConv reduced the training workload by 56.6% and 96.4%, respectively. We also compared the inference energy reduction of our co-design relative to the GPU (Nvidia A100) and NPU (Nvidia Jetson Nano). As is shown in Fig. 3g , compared to the GPU and NPU, our co-design reduces the inference energy consumption by 54.2% and 37.3%, respectively, requiring only 9.51 mJ per sample (see Supplementary Table. 1 for detailed energy breakdown).

3D point cloud part segmentation

We extended our co-design approach to the part segmentation task on the ShapeNet 41 part dataset. Compared to classification, part segmentation is a more fine-grained task that involves segmenting different parts of objects, commonly used in fields such as robot grasping and object modeling. For example, given a model of an airplane, the system is expected to identify key components such as the fuselage, wings, tail, and engines. The ShapeNet dataset consists of 16,881 3D shapes from 16 object categories, annotated with a total of 50 parts. As is shown in Fig. 4a , the point cloud data is first processed by a spatial transformation network, followed by three splitted random EdgeConv modules for feature extraction. The hierarchical features are concatenated through residual connections and passed through a fusion layer for aggregation. Subsequently, they are fed into a lightweight segmentation head for point-wise classifications (see Supplementary Fig. 4 for detailed network structure). Figure 4b presents the experimental results showing the feature evolution of points in an airplane model from the ShapeNet dataset. The color of each point represents the distance to the red point in the feature space. It can be observed that in deeper layer feature space, even if there is a significant distance between them in the original input space, semantically similar structures can be captured. For example, the wings gradually converge to the same color in the figure. We measure the performance of segmentation through mIoU and compare it with the trainable software baseline. Our co-design achieved a mIoU of 83.67%, with only an 1.90% decrease (Fig. 4c ). In most categories, our co-design can achieve performance similar to that of the software. In specific categories such as motorbike, due to the imbalance in sample numbers between categories and the analogue computing noise, our co-design shows a more noticeable decline compared to the software baseline (see Supplementary Fig. 7a for accuracy metrics). Compared to the baseline, our co-design has a 89.8% reduction in the number of parameters (Fig. 4d , see Supplementary Fig. 6b for detailed parameter count comparison). Figure 4e visualizes randomly selected part segmentation results using our co-design on ShapeNet. Different colors represent different parts.

figure 4

a Schematic of RDGCNN for the part segmentation task on the ShapeNet dataset. b Experimental feature evolution of points in an airplane model from the ShapeNet dataset. c Comparison of part segmentation results of software trainable DGCNN baseline and our co-design. d Parameter count comparisons of trainable software DGCNN baseline and our co-design. e Representative segmentation results of our co-design. f Comparison of training costs between the fully trainable DGCNN, DGCNN with splitted EdgeConv, and RDGCNN with splitted random EdgeConv (ours). g Comparison of inference energy between GPU, NPU, and our co-design.

We also analyzed the training cost and energy efficiency for the part segmentation task. We compared the fully trainable DGCNN software baseline with splitted trainable EdgeConv and RDGCNN in terms of training complexity (as shown in Fig. 4f ). Compared to the fully trainable DGCNN, splitted EdgeConv and RDGCNN reduced the training cost by 36.8% and 53.5%, respectively. We also compared the inference energy reduction of our co-design relative to the GPU (Nvidia A100) and NPU (Nvidia Jetson Nano). As is shown in Fig. 4g , compared to the GPU and NPU, our co-design reduces the inference energy consumption by 39.5% and 28.7%, respectively, requiring only 64.12 mJ per sample (see Supplementary Table. 1 for detailed energy breakdown).

3D point cloud semantic segmentation

We further validated the effectiveness of our co-design on a more complex semantic segmentation task. Unlike part segmentation, which focuses on parts within a single object, semantic segmentation aims to understand the scene at a higher level by identifying and labeling entire objects. It is pivotal for applications that require scene understanding and contextual awareness, such as autonomous navigation and environmental mapping. We evaluated our co-design’s efficacy on the Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS) 42 for the semantic segmentation task. This dataset consists of 3D scanned point clouds from six indoor areas, totaling 272 rooms. Each point belongs to one of thirteen semantic categories, such as floor, bookshelf, chair, ceiling, beam, and clutter. The network architecture used for this task is similar to the part segmentation model, with the difference being that the model’s output is a probability distribution for the semantic object category for each input point, rather than a category vector. As is shown in Fig. 5a , the point cloud data undergoes feature extraction through three splitted random EdgeConv modules, followed by aggregation of hierarchical features. The aggregated features then enter the lightweight trainable segmentation head to output the probability distribution of point categories (see Supplementary Fig. 5 for detailed network structure).

figure 5

a Schematic of RDGCNN for the semantic segmentation task on S3DIS dataset. b Comparison of semantic segmentation results of each test area in a 6-fold cross-validation between software trainable baseline and our co-design. c , Parameter comparison of trainable software baseline and our co-design. d , Representative semantic segmentation results of our co-design. e , Comparison of training costs between the fully trainable DGCNN, DGCNN with splitted EdgeConv, and RDGCNN with splitted random EdgeConv (ours). f , Comparison of inference energy between GPU, NPU, and our co-design.

To evaluate the performance of our co-design, we use 6-fold cross-validation. As shown in Fig. 5b , compared to the trainable DGCNN baseline, our co-design achieved an overall mIoU of 46.35%, with a 12.73% decrease (see Supplementary Fig. 7b for accuracy metrics). In terms of the number of parameters, our co-design saw a reduction of 89.8% compared to the fully trainable baseline (Fig. 5c , see Supplementary Fig. 6c for detailed parameter count comparison). Figure 5d visualizes the semantic segmentation results of our co-design on S3DIS dataset, where different colors represent different semantic regions. We also analyzed the training overhead and energy efficiency on the semantic segmentation task. We compared the training costs of a fully trainable DGCNN, a DGCNN with trainable splitted EdgeConv, and our RDGCNN (as shown in Fig. 5e ). Compared to the software baseline, DGCNN with trainable splitted EdgeConv and RDGCNN reduced the training complexity by 46.9% and 68.1% respectively. We also compared the inference energy reduction of our co-design relative to a GPU (Nvidia A100) and an NPU (Nvidia Jetson Nano). As is shown in Fig. 5f , compared to the GPU and NPU, our co-design reduced inference energy consumption by 35.3% and 21.9% respectively, requiring only 53.75 mJ per sample (see Supplementary Table. 1 for detailed energy breakdown).

Noise robustness analysis on conductance fluctuation

We further demonstrate the robustness of our co-design to read noise in memristors. Memristor primarily has two types of noise: write noise due to programming stochasticity and read noise due to fluctuation in charge transport. In our co-design, we utilize the former as a source for random weights to physically implement random EdgeConv. The latter, caused by thermal fluctuations or Random Telegraph Noise (RTN), leads to stochasticity in charge movement, resulting in slight changes in conductance over time (Fig. 6a ). Figure 6b shows the conductance fluctuations of memristor after 10,000 reads. As the number of reads increases, the fluctuations slightly increase, but the overall fluctuation remains around 1.5% (the amplitude is determined by the standard deviation divided by the mean). We simulated various levels of read noise (Fig. 6c ) from 1.5% to 9%, and modeled the impact of this noise on network performance across three tasks. Figure 6d, e, f respectively show the impact of different levels of noise on performance in point cloud classification, part segmentation, and semantic segmentation tasks. As the level of disturbance increases, the performance of the network gradually decreases. In the classification task, the model maintains a relatively good classification accuracy (over 80%) when the disturbance amplitude is within 4.5%. In the part segmentation task, the model is more robust to noise, still achieving an overall mIoU of about 80% even under a 6% fluctuation amplitude. For semantic segmentation, the model is more sensitive to read noise compared with classification and part segmentation, attaining in less than 20% mIoU when noise level exceeds 4.5% (see Supplementary Fig. 8 for more results with smaller noise fluctuation levels).

figure 6

a Physical origin of conductance fluctuation in memristor. b Experimental read noise of memristor over 10,000 cycles. c , Illustration of simulated conductance fluctuation of different levels. d Classification accuracy on ModelNet40 under various conductance fluctuation levels. e Part segmentation performance on ShapeNet under various conductance fluctuation levels. f Semantic segmentation performance on S3DIS under various conductance fluctuation levels.

In this research, we introduce a novel hardware-software co-designed system using RDGCNN and memristor for efficient and affordable learning of point cloud data, suited for edge applications like mixed reality, autonomous vehicles, and embodied AI. In this work, we implemented RDGCNN on a 40 nm fully integrated memristor array, which performed point cloud classification, part segmentation, and semantic segmentation tasks. Compared to state-of-the-art digital hardware, our co-design delivers energy consumption (training complexity) reduction of 54.2%, 39.5%, and 35.3% (96.4%, 53.5%, and 68.1%) on these three representative tasks, while achieving high classification accuracy and segmentation mIoU comparable to the software. Our co-design not only reduces the energy consumption substantially but also minimizes the training overhead, making it significantly more efficient and economical for real-time, high-performance edge applications. This innovative approach leverages the inherent randomness of memristor to enhance processing capabilities and paves the way for future edge computing in handling complex 3D datasets.

Fabrication of random memristor chips

The memristor array was fabricated using 40 nm technology and features a 1T1R configuration. Each cell is located between the metal 4 and metal 5 layers in the backend-of-line process, with a structure comprising a bottom electrode (BE), a top electrode (TE), and a transition-metal oxide dielectric layer. The BE via was created using photolithography and etching techniques, filled with TaN through physical vapor deposition, and topped with a 10 nm TaN buffer layer. Subsequently, a 5 nm Ta layer was deposited and then oxidized to form an 8 nm TaO x dielectric layer. Finally, a 3 nm Ta layer and a 40 nm TiN layer were sequentially added via physical vapor deposition to construct the TE. The remaining interconnection metals were added using a standard logic process. BE connections were shared by cells in the same row, while TE connections were shared by cells in the same column, forming a 512 × 512 crossbar array. The 40 nm memristor chip exhibited high yield and strong endurance after being post-annealed at 400 ◦ C for 30 minutes in a vacuum.

The hybrid analog-digital computing system

The hybrid analog-digital computing system includes a 40 nm memristor computing-in-memory chip and a Xilinx ZYNQ system-on-chip (SoC) mounted on a printed circuit board (PCB). This setup delivers parallel 64-way analog voltage inputs, produced by an 8-channel digital-to-analog converter (DAC80508, TEXAS INSTRUMENTS) with 16-bit resolution, covering a range from 0 V to 5 V. For signal collection, the convergence current is converted to voltages using trans-impedance amplifiers (OPA4322-Q1, TEXAS INSTRUMENTS) and read with a 14-bit resolution analog-to-digital converter (ADS8324, TEXAS INSTRUMENTS). The system integrates both analog and digital conversions onboard. During vector-matrix multiplications, a DC voltage is applied to the bit lines of the RRAM chip via a 4-channel analog multiplexer (CD4051B, TEXAS INSTRUMENTS) and controlled by an 8-bit shift register (SN74HC595, TEXAS INSTRUMENTS). The result current from the source line is converted to voltages and transferred to the Xilinx SoC for further processing.

Training details of classification on ModelNet40

The detailed network architecture used for the classification task is shown in Supplementary Fig. 3 . In our model, we employ four random EdgeConv layers to capture geometric features, each supported by three splitted convolutional layers with dimensions of 32, 32, 64, and 128, with rank of 12. Following the computation in each EdgeConv layer, we update the graph using the new features to guide the next layer. For all EdgeConv layers, we set the nearest neighbor count, k, to 20. Multi-scale features are harvested through shortcut connections, and an additional splitted random 1D convolutional layer with a size of 1024 and rank of 12 is used to integrate these features into a 512-dimensional point cloud by merging outputs from earlier layers. Global features of the point cloud are extracted via global max/sum pooling and then transformed through two subsequent fully-connected layers sized 512 and 256. The last two layers include dropout at a 50% keep rate and feature LeakyReLU activation and batch normalization. We employ Stochastic Gradient Descent (SGD) with an initial learning rate of 0.1, gradually decreasing it to 0.001 through cosine annealing. The momentum setting for batch normalization is maintained at 0.9. Our batch size is set at 32, with the momentum at 0.9. We train the model for 250 epochs.

Training details of part segmentation on ShapeNet

The detailed network structure is depicted in Supplementary Fig. 4 . Following a spatial transformer network, three EdgeConv modules are implemented. The first module include a splitted convolutional layer with dimension of 64 and rank of 12. The following two modules each consist of two splitted convolutional layers with dimension of 64 and rank of 12. Information from these layers is concatenated using a splitted random 1D convolutional layer withs dimension of 1024 and rank of 12. The neighbour count k is set to 40. Shortcut connections integrate outputs from all EdgeConv layers as local feature descriptors. Subsequently, the pointwise features are processed through three shared fully-connected layers with dimensions of 256, 256, and 128. Similar to our classification network, batch normalization, dropout, and ReLU are incorporated throughout this configuration. We adopt the same training setting as the classification task. We train the model for 200 epochs.

Training details of semantic segmentation on S3DIS

The network structure used for semantic segmentation is similar to classification model, as is shown in Supplementary Fig. 5 . Three EdgeConv modules are used. The first module include a splitted convolutional layer with dimension of 64 and rank of 12. The following two modules each consist of two splitted convolutional layers with dimension of 64 and rank of 12. Information from these layers is concatenated using a splitted random 1D convolutional layer with dimension of 1024 and rank of 12. The neightbour count k is set to 20. Shortcut connections integrate outputs from all EdgeConv layers as local feature descriptors. Subsequently, the pointwise features are processed through a shared fully-connected layers with dimensions of 256. Similar to our part segmentation network, batch normalization, dropout, and ReLU are incorporated throughout this configuration. We adopt the same training setting as the classification task. We train the model for 100 epochs for each test area.

Data availability

The ModelNet40 dataset, ShapeNet dataset, and S3DIS dataset are publicly available. All other measured data are available from the corresponding author upon reasonable request.

Code availability

The code used in this study is available from the corresponding author upon reasonable request.

Rusu, R. B. & Cousins, S. 3d is here: Point cloud library (pcl). In 2011 IEEE international conference on robotics and automation , 1–4 (IEEE, 2011).

Zhou, Y. & Tuzel, O. Voxelnet: End-to-end learning for point cloud based 3d object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition , 4490–4499 (2018).

Maturana, D. & Scherer, S. Voxnet: A 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS) , 922–928 (IEEE, 2015).

Qi, C. R., Su, H., Mo, K. & Guibas, L. J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition , 652–660 (2017).

Qi, C. R., Yi, L., Su, H. & Guibas, L. J. Pointnet + +: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the 31st International Conference on Neural Information Processing Systems , 5105-5114 (2017).

Zidan, M. A., Strachan, J. P. & Lu, W. D. The future of electronics based on memristive systems. Nat. electronics 1 , 22–29 (2018).

Article   Google Scholar  

Lin, Y., Zhang, Z., Tang, H., Wang, H. & Han, S. Pointacc: Efficient point cloud accelerator. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture , 449–461 (2021).

Shi, W. & Rajkumar, R. Point-gnn: Graph neural network for 3d object detection in a point cloud. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , 1711–1719 (2020).

Wang, Y. et al. Dynamic graph cnn for learning on point clouds. ACM Transactions on Graph. (tog) 38 , 1–12 (2019).

Google Scholar  

Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M. & Monfardini, G. The graph neural network model. IEEE Transac. Neural Netw. 20 , 61–80 (2008).

Wang, Z. et al. Memristors with diffusive dynamics as synaptic emulators for neuromorphic computing. Nat. materials 16 , 101–108 (2017).

Article   ADS   CAS   PubMed   Google Scholar  

Kumar, S., Wang, X., Strachan, J. P., Yang, Y. & Lu, W. D. Dynamical memristors for higher-complexity neuromorphic computing. Nat. Rev. Mater. 7 , 575–591 (2022).

Article   ADS   Google Scholar  

Liu, K. et al. An optoelectronic synapse based on α -in2se3 with controllable temporal dynamics for multimode and multiscale reservoir computing. Nat. Electron. 5 , 761–773 (2022).

Article   CAS   Google Scholar  

Zhang, H.-T. et al. Reconfigurable perovskite nickelate electronics for artificial intelligence. Science 375 , 533–539 (2022).

Najem, J. S. et al. Dynamical nonlinear memory capacitance in biomimetic membranes. Nat. Commun. 10 , 3239 (2019).

Article   ADS   PubMed   PubMed Central   Google Scholar  

Najem, J. S. et al. Memristive ion channel-doped biomembranes as synaptic mimics. ACS nano 12 , 4702–4711 (2018).

Article   CAS   PubMed   Google Scholar  

Yang, L. et al. Self-selective memristor-enabled in-memory search for highly efficient data mining. InfoMat 5 , e12416 (2023).

Zhou, Y. et al. Computational event-driven vision sensors for in-sensor spiking neural networks. Nat. Electron. 6 , 870–878 (2023).

Chen, J. et al. Optoelectronic graded neurons for bioinspired in-sensor motion perception. Nat. Nanotechnol. 18 , 882–888 (2023).

Duan, Q. et al. Spiking neurons with spatiotemporal dynamics and gain modulation for monolithically integrated memristive neural networks. Nat. Commun. 11 , 3399 (2020).

Article   ADS   CAS   PubMed   PubMed Central   Google Scholar  

Wang, Z. et al. Fully memristive neural networks for pattern classification with unsupervised learning. Nat. Electron. 1 , 137–145 (2018).

Ielmini, D. & Wong, H.-S. P. In-memory computing with resistive switching devices. Nat. electronics 1 , 333–343 (2018).

Yang, H. et al. Mixed-precision partial differential equation solver design based on nonvolatile memory. IEEE Transactions on Electron Devices 69 , 3708–3715 (2022).

Ambrogio, S. et al. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature 558 , 60–67 (2018).

Sun, Z., Pedretti, G., Bricalli, A. & Ielmini, D. One-step regression and classification with cross-point resistive memory arrays. Sci. advances 6 , eaay2378 (2020).

Article   ADS   CAS   Google Scholar  

Wang, S. et al. In-memory analog solution of compressed sensing recovery in one step. Sci. Adv 9 , eadj2908 (2023).

Article   PubMed   PubMed Central   Google Scholar  

Li, J. et al. Sparse matrix multiplication in a record-low power self-rectifying memristor array for scientific computing. Sci. Adv. 9 , eadf7474 (2023).

Yao, P. et al. Fully hardware-implemented memristor convolutional neural network. Nature 577 , 641–646 (2020).

Yuan, R. et al. A neuromorphic physiological signal processing system based on vo2 memristor for next-generation human-machine interface. Nat. Commun. 14 , 3695 (2023).

Moon, J. et al. Temporal data classification and forecasting using a memristor-based reservoir computing system. Nat. Electron. 2 , 480–487 (2019).

Zhang, W. et al. Edge learning using a fully integrated neuro-inspired memristor chip. Science 381 , 1205–1211 (2023).

Sun, Z. et al. A full spectrum of computing-in-memory technologies. Nat. Electron. 6 , 823–835 (2023).

Wen, T.-H. et al. Fusion of memristor and digital compute-in-memory processing for energy-efficient edge computing. Science 384 , 325–332 (2024).

Prezioso, M. et al. Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521 , 61–64 (2015).

Song, L., Qian, X., Li, H. & Chen, Y. Pipelayer: A pipelined reram-based accelerator for deep learning. In 2017 IEEE international symposium on high performance computer architecture (HPCA) , 541–552 (IEEE, 2017).

Huang, Y. et al. Memristor-based hardware accelerators for artificial intelligence. Nat. Rev. Electr. Eng . 1 , 286–299 (2024).

Cai, F. et al. Power-efficient combinatorial optimization using intrinsic noise in memristor hopfield neural networks. Nat. Electron. 3 , 409–418 (2020).

Wang, S. et al. Echo state graph neural networks with analogue random resistive memory arrays. Nat. Mach. Intell. 5 , 104–113 (2023).

Wang, S. et al. Convolutional echo-state network with random memristors for spatiotemporal signal classification. Adv. Intell. Syst. 4 , 2200027 (2022).

Wu, Z. et al. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition , 1912–1920 (2015).

Chang, A. X. et al. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012 (2015).

Armeni, I. et al. 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE conference on computer vision and pattern recognition , 1534–1543 (2016).

Rezatofighi, H. et al. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , 658–666 (2019).

Tai, C. et al. Convolutional neural networks with low-rank regularization. In 4th International Conference on Learning Representations, ICLR 2016 (2016).

Zhang, X., Zou, J., Ming, X., He, K. & Sun, J. Efficient and accurate approximations of nonlinear convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and pattern Recognition , 1984–1992 (2015).

Download references

Acknowledgements

This research is supported by the National Key R&D Program of China (Grant No. 2022YFB3608300), the National Natural Science Foundation of China (Grant Nos. 62122004, 62374181), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB44000000), Beijing Natural Science Foundation (Grant No. Z210006), Hong Kong Research Grant Council (Grant Nos. 27206321, 17205922, 17212923). This research is also partially supported by ACCESS – AI Chip Center for Emerging Smart Systems, sponsored by Innovation and Technology Fund (ITF), Hong Kong SAR.

Author information

Authors and affiliations.

Department of Electrical and Electronic Engineering, the University of Hong Kong, Hong Kong, China

Yifei Yu, Shaocong Wang, Meng Xu, Bo Wang, Jichang Yang, Songqi Wang, Yue Zhang, Xiaoshan Wu, Hegan Chen, Dingchen Wang, Xi Chen, Ning Lin, Xiaojuan Qi & Zhongrui Wang

ACCESS – AI Chip Center for Emerging Smart Systems, InnoHK Centers, Hong Kong Science Park, Hong Kong, China

Yifei Yu, Shaocong Wang, Meng Xu, Bo Wang, Jichang Yang, Songqi Wang, Yue Zhang, Xiaoshan Wu, Hegan Chen, Dingchen Wang, Xi Chen, Ning Lin & Zhongrui Wang

Key Lab of Fabrication Technologies for Integrated Circuits, Institute of Microelectronics, Chinese Academy of Sciences, Beijing, 100029, China

Woyu Zhang & Dashan Shang

University of Chinese Academy of Sciences, Beijing, 100049, China

School of Microelectronics, Southern University of Science and Technology, Shenzhen, 518055, China

Zhongrui Wang

You can also search for this author in PubMed   Google Scholar

Contributions

Z.W. and Y.Y. conceived the work. Z.W., Y.Y., S.C.W., W.Z. contributed to the design and development of the models, software and the hardware experiments. Z.W., Y.Y., S.C.W., M.X., J.Y., and B.W. interpreted, analysed and presented the experimental results. Z.W., D.S., and Y.Y. wrote the manuscript. All authors discussed the results and implications and commented on the manuscript at all stages.

Corresponding authors

Correspondence to Dashan Shang or Zhongrui Wang .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information, rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

Reprints and permissions

About this article

Cite this article.

Yu, Y., Wang, S., Xu, M. et al. Random memristor-based dynamic graph CNN for efficient point cloud learning at the edge. npj Unconv. Comput. 1 , 6 (2024). https://doi.org/10.1038/s44335-024-00006-0

Download citation

Received : 21 May 2024

Accepted : 07 August 2024

Published : 21 August 2024

DOI : https://doi.org/10.1038/s44335-024-00006-0

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

Sign up for the Nature Briefing: AI and Robotics newsletter — what matters in AI and robotics research, free to your inbox weekly.

dynamic graph representation learning

  • DOI: 10.3390/fi16080295
  • Corpus ID: 271903907

Dynamic Graph Representation Learning for Passenger Behavior Prediction

  • Mingxuan Xie , Tao Zou , +2 authors Runhe Huang
  • Published 17 August 2024
  • Computer Science, Engineering

Figures and Tables from this paper

figure 1

35 References

From link prediction to forecasting: information loss in batch-based temporal graph learning, towards better dynamic graph learning: new architecture and unified library, do we really need complicated model architectures for temporal networks, dyted: disentangled representation learning for discrete-time dynamic graph, roland: graph learning framework for dynamic graphs, towards better evaluation for dynamic link prediction, short-term trajectory prediction for individual metro passengers integrating diverse mobility patterns with adaptive location-awareness, tcl: transformer-based dynamic graph modelling via contrastive learning, mlp-mixer: an all-mlp architecture for vision, a survey on embedding dynamic graphs, related papers.

Showing 1 through 3 of 0 Related Papers

SLADE: Detecting Dynamic Anomalies in Edge Streams without Labels via Self-Supervised Learning

New citation alert added.

This alert has been successfully added and will be sent to:

You will be notified whenever a record that you have chosen has been cited.

To manage your alert preferences, click on the button below.

New Citation Alert!

Please log in to your account

Information & Contributors

Bibliometrics & citations, view options, supplemental material, index terms.

Information systems

Information systems applications

Data mining

World Wide Web

Web applications

Social networks

Recommendations

A semantically driven self-supervised algorithm for detecting anomalies in image sets.

Humans can readily detect when an image does not belong in a set by comparing semantic information between images to derive meaning and relationships from the colors, shapes, sizes, locations, and textures within them. Current self-supervised ...

  • Self-supervised algorithm to learn and model semantic representations of image sets.
  • Method to allow anomaly detection models to learn targeted invariances to image sets.
  • Self-supervised anomaly detection algorithm requiring only ...

Self-supervised anomaly detection in computer vision and beyond: A survey and outlook

Anomaly detection (AD) plays a crucial role in various domains, including cybersecurity, finance, and healthcare, by identifying patterns or events that deviate from normal behavior. In recent years, significant progress has been made in this ...

  • We present a cohesive overview of self-supervised methods for anomaly detection.
  • We categorize existing self-supervised anomaly detection algorithms. their proxy tasks.
  • We conduct a comprehensive performance comparison.

Task-Oriented Self-supervised Learning for Anomaly Detection in Electroencephalography

Accurate automated analysis of electroencephalography (EEG) would largely help clinicians effectively monitor and diagnose patients with various brain diseases. Compared to supervised learning with labelled disease EEG data which can train a model ...

Information

Published in.

cover image ACM Conferences

  • General Chairs:

Northeastern University, USA

CENTAI / Eurecat, Italy

  • SIGMOD: ACM Special Interest Group on Management of Data
  • SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data

Association for Computing Machinery

New York, NY, United States

Publication History

Permissions, check for updates, author tags.

  • anomaly detection
  • edge stream
  • self-supervised learning
  • Research-article

Funding Sources

  • Institute of Information & Communications Technology Planning & Evaluation

Acceptance Rates

Contributors, other metrics, bibliometrics, article metrics.

  • 0 Total Citations
  • 0 Total Downloads
  • Downloads (Last 12 months) 0
  • Downloads (Last 6 weeks) 0

View options

View or Download as a PDF file.

View online with eReader .

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Share this publication link.

Copying failed.

Share on social media

Affiliations, export citations.

  • Please download or close your previous search result export first before starting a new bulk export. Preview is not available. By clicking download, a status dialog will open to start the export process. The process may take a few minutes but once it finishes a file will be downloadable from your browser. You may continue to browse the DL while the export process is in progress. Download
  • Download citation
  • Copy citation

We are preparing your search results for download ...

We will inform you here when the file is ready.

Your file of search results citations is now ready.

Your search export query has expired. Please try again.

IMAGES

  1. Graph Representation Learning

    dynamic graph representation learning

  2. Dynamic Graph Representation Learning via Self-Attention Networks

    dynamic graph representation learning

  3. Illustration of graph representation learning input and output

    dynamic graph representation learning

  4. graph-representation-learning · GitHub Topics · GitHub

    dynamic graph representation learning

  5. Dynamic Graph Neural Networks

    dynamic graph representation learning

  6. Graph Structure Learning for Robust Graph Neural Networks

    dynamic graph representation learning

COMMENTS

  1. Dynamic Graph Representation Learning with Neural Networks: A Survey

    View a PDF of the paper titled Dynamic Graph Representation Learning with Neural Networks: A Survey, by Leshanshui Yang and 2 other authors. In recent years, Dynamic Graph (DG) representations have been increasingly used for modeling dynamic systems due to their ability to integrate both topological and temporal information in a compact ...

  2. Dynamic Graph Representation Learning With Neural Networks: A Survey

    In recent years, Dynamic Graph (DG) representations have been increasingly used for modeling dynamic systems due to their ability to integrate both topological and temporal information in a compact representation. Dynamic graphs efficiently handle applications such as social network prediction, recommender systems, traffic forecasting, or electroencephalography analysis, which cannot be ...

  3. PDF Representation Learning for Dynamic Graphs: A Survey

    This paper reviews recent advances in representation learning for dynamic graphs, including dynamic knowledge graphs. It categorizes and analyzes techniques that encode and decode temporal aspects of graphs into embeddings and predictions.

  4. Dynamic Graph Representation Learning via Self-Attention Networks

    Learning latent representations of nodes in graphs is an important and ubiquitous task with widespread applications such as link prediction, node classification, and graph visualization. Previous methods on graph representation learning mainly focus on static graphs, however, many real-world graphs are dynamic and evolve over time.

  5. TemporalGAT: Attention-Based Dynamic Graph Representation Learning

    Learning representations for dynamic graphs is fundamental as it supports numerous graph analytic tasks such as dynamic link prediction, node classification, and visualization. Real-world dynamic graphs are continuously evolved where new nodes and edges are introduced or removed during graph evolution. Most existing dynamic graph representation ...

  6. Dynamic Graph Representation Learning Via Graph

    Dynamic graph representation learning is an important task with widespread ap-plications. Previous methods on dynamic graph learning are usually sensitive to noisy graph information such as missing or spurious connections, which can yield degenerated performance and generalization. To overcome this challenge,

  7. Dyrep: Learning Representations over Dynamic Graphs

    Representation Learning over graph structured data has received significant attention recently due to its ubiquitous applicability. However, most advancements have been made in static graph settings while efforts for jointly learning dynamic of the graph and dynamic on the graph are still in an infant stage.

  8. TemporalGAT: Attention-Based Dynamic Graph Representation Learning

    Most existing dynamic graph representation learning methods focus on modeling dynamic graphs with fixed nodes due to the complexity of modeling dynamic graphs, and therefore, cannot efficiently learn the evolutionary patterns of real-world evolving graphs. Moreover, existing methods generally model the structural information of evolving graphs ...

  9. Dynamic Graph Representation Learning via Self-Attention Networks

    Learning latent representations of nodes in graphs is an important and ubiquitous task with widespread applications such as link prediction, node classification, and graph visualization. Previous methods on graph representation learning mainly focus on static graphs, however, many real-world graphs are dynamic and evolve over time. In this paper, we present Dynamic Self-Attention Network ...

  10. A dynamic graph representation learning based on temporal graph

    In this paper, we propose a novel graph neural network framework, called a temporal graph transformer (TGT), that learns dynamic node representation from a sequence of time-stamped events. TGT consists of three main modules, i.e., an update module, an aggregation module, and a propagation module.

  11. Dynamic Graph Representation Learning via Self-Attention Networks

    Learning latent representations of nodes in graphs is an important and ubiquitous task with widespread applications such as link prediction, node classification, and graph visualization. Previous methods on graph representation learning mainly focus on static graphs, however, many real-world graphs are dynamic and evolve over time.

  12. PDF DySAT: Deep Neural Representation Learning on Dynamic Graphs via Self

    We formally define the problem of dynamic graph representation learning. A dynamic graph is defined as a series of observed static graph snapshots, G = {G1,...,GT}whereT is the number of time steps. Each snapshot Gt = (V,Et)is a weighted undirected graph with a shared node set V, a link set Et, and a weighted adjacency matrixAt at time stept ...

  13. Multi-relational dynamic graph representation learning

    Depending on the treatment of time, dynamic graph representation learning methods can be broadly classified into two categories. One is the discrete dynamic graph neural network method, which treats the dynamic graph as a series of graph snapshots. The structural information of each graph snapshot can first be encoded using static graph neural ...

  14. RDGSL: Dynamic Graph Representation Learning with Structure Learning

    Temporal Graph Networks (TGNs) have shown remarkable performance in learning representation for continuous-time dynamic graphs. However, real-world dynamic graphs typically contain diverse and intricate noise. Noise can significantly degrade the quality of representation generation, impeding the effectiveness of TGNs in downstream tasks. Though structure learning is widely applied to mitigate ...

  15. RDGSL: Dynamic Graph Representation Learning with Structure Learning

    Temporal Graph Networks (TGNs) have shown remarkable performance in learning representation for continuous-time dynamic graphs. However, real-world dynamic graphs typically contain diverse and intricate noise. Noise can significantly degrade the quality of representation generation, impeding the effectiveness of TGNs in downstream tasks.

  16. Representation learning for dynamic graphs: a survey

    In this survey, we review the recent advances in representation learning for dynamic graphs, including dynamic knowledge graphs. We describe existing models from an encoder-decoder perspective, categorize these encoders and decoders based on the techniques they employ, and analyze the approaches in each category. ...

  17. Representation Learning for Dynamic Graphs: A Survey

    In this survey, we review the recent advances in representation learning for dynamic graphs, including dynamic knowledge graphs. We describe existing models from an encoder-decoder perspective, categorize these encoders and decoders based on the techniques they employ, and analyze the approaches in each category. We also review several ...

  18. dyngraph2vec: Capturing network dynamics using dynamic graph

    Learning graph representations is a fundamental task aimed at capturing various properties of graphs in vector space. The most recent methods learn such representations for static networks. However, real-world networks evolve over time and have varying dynamics. Capturing such evolution is key to predicting the properties of unseen networks.

  19. Representation Learning for Dynamic Graphs: A Survey

    This introduces important challenges for learning and inference since nodes, attributes, and edges change over time. In this survey, we review the recent advances in representation learning for dynamic graphs, including dynamic knowledge graphs. We describe existing models from an encoder-decoder perspective, categorize these encoders and ...

  20. [PDF] Dynamic Graph Representation Learning With Neural Networks: A

    A review of the problems and models related to dynamic graph learning, including guidelines for DGNN design and optimization, and review public datasets for evaluating model performance on various tasks, along with the corresponding publications are provided. In recent years, Dynamic Graph (DG) representations have been increasingly used for modeling dynamic systems due to their ability to ...

  21. Neural Temporal Walks: Motif-Aware Representation Learning on

    Continuous-time dynamic graphs naturally abstract many real-world systems, such as social and transactional networks. While the research on continuous-time dynamic graph representation learning has made significant advances recently, neither graph topological properties nor temporal dependencies have been well-considered and explicitly modeled ...

  22. [2112.08733] Self-Supervised Dynamic Graph Representation Learning via

    Self-supervised learning on graphs has recently drawn a lot of attention due to its independence from labels and its robustness in representation. Current studies on this topic mainly use static information such as graph structures but cannot well capture dynamic information such as timestamps of edges. Realistic graphs are often dynamic, which means the interaction between nodes occurs at a ...

  23. Random memristor-based dynamic graph CNN for efficient point cloud

    d We convert the point cloud into a graph representation, ... Wang, Y. et al. Dynamic graph cnn for learning on point clouds. ACM Transactions on Graph. (tog) 38, 1-12 (2019).

  24. Self-supervised Representation Learning on Dynamic Graphs

    Graph representation learning has now become the de facto standard when dealing with graph-structured data. Using powerful tools from deep learning and graph neural networks, recent works have applied graph representation learning to time-evolving dynamic graphs and showed promising results.

  25. [PDF] Dynamic Graph Representation Learning for Passenger Behavior

    A novel graph neural network approach, called TCL, is proposed, which deals with the dynamically-evolving graph in a continuous-time fashion and enables effective dynamic node representation learning that captures both the temporal and topology information.

  26. PDF A arXiv:1812.09430v2 [cs.LG] 15 Jun 2019

    As dynamic graphs usually contain periodical patterns, we extend the self-attention mechanisms over the historical representations of a particular node to capture its temporal evolution behaviors. 3 PROBLEM DEFINITION In this work, we address the problem of dynamic graph representation learning. A dynamic graph

  27. SLADE: Detecting Dynamic Anomalies in Edge Streams without Labels via

    Representation learning for dynamic graphs: A survey. Journal of Machine Learning Research, Vol. 21, 1 (2020), 2648--2720. Digital Library. Google Scholar [16] Shima Khoshraftar, Aijun An, and Nastaran Babanejad. 2022. Temporal graph representation learning via maximal cliques. In Big Data.

  28. Disentangled Generative Graph Representation Learning

    Recently, generative graph models have shown promising results in learning graph representations through self-supervised methods. However, most existing generative graph representation learning (GRL) approaches rely on random masking across the entire graph, which overlooks the entanglement of learned representations. This oversight results in non-robustness and a lack of explainability ...