Graph condensation reduces the size of a graph while maintaining model performance, addressing the scalability challenge that large datasets pose for training GNNs. Existing methods often rely on bi-level optimization, which requires repeated GNN training and limits scalability.
This paper proposes Graph Condensation via Gaussian Process (GCGP) — a computationally efficient method that leverages a Gaussian Process (GP) to estimate predictions from input nodes without iterative GNN training.
Key innovations:
- A covariance function aggregates local neighborhoods to capture complex node dependencies.
- Concrete random variables approximate binary adjacency matrices in a differentiable form, enabling gradient-based optimization of discrete graph structures.
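To illustrate the second point, below is a minimal sketch of a binary Concrete (Gumbel-Sigmoid) relaxation of adjacency entries; the function and variable names are illustrative and not the repository's API.

```python
import torch

def sample_soft_adjacency(logits: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Draw a differentiable relaxation of a binary adjacency matrix.

    Each entry follows a binary Concrete distribution: as temperature -> 0,
    samples approach hard {0, 1} edges, while gradients still reach the
    logits through the reparameterization.
    """
    u = torch.rand_like(logits)                 # Uniform(0, 1) noise
    noise = torch.log(u) - torch.log1p(-u)      # Logistic noise
    return torch.sigmoid((logits + noise) / temperature)

# Toy usage: edge logits of a 5-node synthetic graph are ordinary
# parameters, so any loss on the sampled adjacency can update them.
logits = torch.zeros(5, 5, requires_grad=True)
adj = sample_soft_adjacency(logits)
adj = (adj + adj.T) / 2                         # keep the graph undirected
adj.sum().backward()                            # gradients flow to the logits
print(logits.grad.shape)                        # torch.Size([5, 5])
```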
Figure 1: Graph condensation condenses a large graph into a small synthetic graph that preserves downstream GNN performance.
Conventional graph condensation methods use a bi-level optimization framework:
- Inner loop: Train a GNN on the condensed graph.
- Outer loop: Update the condensed graph based on performance loss.
This is computationally expensive due to repeated GNN training.
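For concreteness, here is a schematic of that bi-level loop; it is a toy sketch with a linear model standing in for the GNN, not the implementation of any particular method. Each outer update retrains the inner model from scratch and differentiates through the unrolled training, which is what makes the approach expensive.

```python
import torch

# Toy stand-ins: (X, y) is the original data, (X_syn, y_syn) the condensed set.
X, y = torch.randn(1000, 16), torch.randn(1000, 1)
X_syn = torch.randn(50, 16, requires_grad=True)
y_syn = torch.randn(50, 1)

opt = torch.optim.Adam([X_syn], lr=1e-2)
for outer_step in range(100):                    # outer loop: update condensed data
    W = torch.zeros(16, 1, requires_grad=True)   # fresh inner model every outer step
    for inner_step in range(20):                 # inner loop: train on condensed data
        inner_loss = ((X_syn @ W - y_syn) ** 2).mean()
        (g,) = torch.autograd.grad(inner_loss, W, create_graph=True)
        W = W - 0.1 * g                          # unrolled SGD step (kept in the graph)
    outer_loss = ((X @ W - y) ** 2).mean()       # performance on the original data
    opt.zero_grad()
    outer_loss.backward()                        # backprop through all inner steps
    opt.step()
```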
GCGP replaces iterative GNN training with a Gaussian Process, treating the condensed synthetic graph as the GP's observation set and predicting labels on the original graph in closed form.
Figure 2: The GCGP workflow includes:
- Using the condensed graph $G^{\mathcal{S}}$ as GP observations.
- Predicting node labels in the original graph $G$.
- Optimizing the condensed graph by minimizing the discrepancy between predictions and ground-truth labels.
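The following is a minimal sketch of this workflow under simplifying assumptions: a linear kernel on $k$-hop-aggregated features stands in for the paper's covariance function, the condensed adjacency is fixed to self-loops (as with `--learn_A 0`), and all names are illustrative rather than the repository's API.

```python
import torch

def aggregate(X: torch.Tensor, A: torch.Tensor, k: int) -> torch.Tensor:
    """k rounds of row-normalized neighborhood aggregation with self-loops."""
    A_hat = A + torch.eye(A.size(0))
    A_hat = A_hat / A_hat.sum(dim=1, keepdim=True)
    for _ in range(k):
        X = A_hat @ X
    return X

def gp_posterior_mean(H_syn, Y_syn, H_full, ridge):
    """Closed-form GP posterior mean with a linear kernel: no GNN training."""
    K_ss = H_syn @ H_syn.T                          # covariance among observations
    K_fs = H_full @ H_syn.T                         # full-graph/condensed cross-covariance
    alpha = torch.linalg.solve(K_ss + ridge * torch.eye(K_ss.size(0)), Y_syn)
    return K_fs @ alpha

# Toy original graph and a learnable condensed node set.
n, m, d, c = 200, 20, 16, 3
A = (torch.rand(n, n) < 0.05).float()
A = ((A + A.T) > 0).float()                         # undirected toy graph
X, Y = torch.randn(n, d), torch.eye(c)[torch.randint(c, (n,))]
X_syn = torch.randn(m, d, requires_grad=True)       # condensed features (learned)
Y_syn = torch.eye(c)[torch.randint(c, (m,))]        # condensed labels (fixed here)

H = aggregate(X, A, k=2)                            # propagate the original graph once
opt = torch.optim.Adam([X_syn], lr=1e-2)
for step in range(200):
    pred = gp_posterior_mean(X_syn, Y_syn, H, ridge=0.5)
    loss = ((pred - Y) ** 2).mean()                 # discrepancy to ground-truth labels
    opt.zero_grad()
    loss.backward()                                 # no inner GNN training anywhere
    opt.step()
```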
Dependencies:
```
python=3.8.20
ogb=1.3.6
pytorch=1.12.1
pyg=2.5.2
numpy=1.24.3
```
💡 Tip: Install `ogb` first to avoid CUDA device recognition issues.
To set up the environment, run:
```bash
conda env create -f environment.yml
```
Navigate to the `gcgp` folder:
```bash
cd gcgp
```
Run GCGP on a dataset (e.g., `Cora`):
```bash
python main.py --dataset Cora --cond_ratio 0.5 --ridge 0.5 --k 4 --epochs 200 --learn_A 0
```
To reproduce all results:
```bash
sh run.sh
```
- Outputs will be saved in `./gcgp/outputs/`
- Final results are collected in `./gcgp/results.csv` via `results.py`
For generalization experiments:
```bash
sh run_generalization.sh
```
- Outputs: `./gcgp/outputs_generalization/`
- Results: `./gcgp/results_generalization.csv`
For efficiency/time evaluation:
```bash
sh run_time.sh
```
- Outputs: `./gcgp/outputs_time/`
Go to the `gcgp_ogb` folder:
```bash
cd gcgp_ogb
```
Run GCGP:
```bash
python main.py --dataset ogbn-arxiv --cond_size 90 --ridge 5 --k 2 --epochs 200 --learn_A 0
```
To reproduce all results:
```bash
sh run.sh
```
- Outputs: `./gcgp_ogb/outputs/`
- Results: `./gcgp_ogb/results.csv`
For time analysis:
```bash
sh run_time.sh
```
- Outputs: `./gcgp_ogb/outputs_time/`
Navigate to the `gcgp_reddit` folder:
```bash
cd gcgp_reddit
```
Run GCGP:
```bash
python main.py --dataset Reddit --cond_size 77 --ridge 0.1 --k 2 --epochs 270 --learn_A 0
```
To reproduce all results:
```bash
sh run.sh
```
- Outputs: `./gcgp_reddit/outputs/`
- Results: `./gcgp_reddit/results.csv`
For training time evaluation:
```bash
sh run_time.sh
```
- Outputs: `./gcgp_reddit/outputs_time/`
If you find our paper or code useful, please cite:
```bibtex
@article{wang2025efficient,
  title={Efficient Graph Condensation via Gaussian Process},
  author={Wang, Lin and Li, Qing},
  journal={arXiv preprint arXiv:2501.02565},
  year={2025}
}
```
MIT License © 2025 WANG Lin