Graph condensation reduces the size of a graph while maintaining model performance, addressing the scalability challenge that large datasets pose for training GNNs. Existing methods often rely on bi-level optimization, which requires repeated GNN training and limits scalability.
This paper proposes Graph Condensation via Gaussian Process (GCGP) — a computationally efficient method that leverages a Gaussian Process (GP) to estimate predictions from input nodes without iterative GNN training.
Key innovations:
- A covariance function aggregates local neighborhoods to capture complex node dependencies.
- Concrete random variables approximate binary adjacency matrices in a differentiable form, enabling gradient-based optimization of discrete graph structures.
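To illustrate the second point, below is a minimal sketch of a binary Concrete (Gumbel-Sigmoid) relaxation of adjacency entries; the function and variable names are illustrative and not the repository's API.

```python
import torch

def sample_soft_adjacency(logits: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Draw a differentiable relaxation of a binary adjacency matrix.

    Each entry follows a binary Concrete distribution: as temperature -> 0,
    samples approach hard {0, 1} edges, while gradients still reach the
    logits through the reparameterization.
    """
    u = torch.rand_like(logits)                 # Uniform(0, 1) noise
    noise = torch.log(u) - torch.log1p(-u)      # Logistic noise
    return torch.sigmoid((logits + noise) / temperature)

# Toy usage: edge logits of a 5-node synthetic graph are ordinary
# parameters, so any loss on the sampled adjacency can update them.
logits = torch.zeros(5, 5, requires_grad=True)
adj = sample_soft_adjacency(logits)
adj = (adj + adj.T) / 2                         # keep the graph undirected
adj.sum().backward()                            # gradients flow to the logits
print(logits.grad.shape)                        # torch.Size([5, 5])
```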
Figure 1: Graph condensation condenses a large graph into a small synthetic graph that preserves downstream GNN performance.
Conventional graph condensation methods use a bi-level optimization framework:
- Inner loop: Train a GNN on the condensed graph.
- Outer loop: Update the condensed graph based on performance loss.
This is computationally expensive due to repeated GNN training.
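For concreteness, here is a schematic of that bi-level loop; it is a toy sketch with a linear model standing in for the GNN, not the implementation of any particular method. Each outer update retrains the inner model from scratch and differentiates through the unrolled training, which is what makes the approach expensive.

```python
import torch

# Toy stand-ins: (X, y) is the original data, (X_syn, y_syn) the condensed set.
X, y = torch.randn(1000, 16), torch.randn(1000, 1)
X_syn = torch.randn(50, 16, requires_grad=True)
y_syn = torch.randn(50, 1)

opt = torch.optim.Adam([X_syn], lr=1e-2)
for outer_step in range(100):                    # outer loop: update condensed data
    W = torch.zeros(16, 1, requires_grad=True)   # fresh inner model every outer step
    for inner_step in range(20):                 # inner loop: train on condensed data
        inner_loss = ((X_syn @ W - y_syn) ** 2).mean()
        (g,) = torch.autograd.grad(inner_loss, W, create_graph=True)
        W = W - 0.1 * g                          # unrolled SGD step (kept in the graph)
    outer_loss = ((X @ W - y) ** 2).mean()       # performance on the original data
    opt.zero_grad()
    outer_loss.backward()                        # backprop through all inner steps
    opt.step()
```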
GCGP replaces iterative GNN training with a Gaussian Process, treating the condensed synthetic graph as the GP's observation set and predicting labels on the original graph in closed form.
Figure 2: The GCGP workflow includes:
- Using the condensed graph $G^{\mathcal{S}}$ as GP observations.
- Predicting node labels in the original graph $G$.
- Optimizing the condensed graph by minimizing the discrepancy between predictions and ground-truth labels.
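The following is a minimal sketch of this workflow under simplifying assumptions: a linear kernel on $k$-hop-aggregated features stands in for the paper's covariance function, the condensed adjacency is fixed to self-loops (as with `--learn_A 0`), and all names are illustrative rather than the repository's API.

```python
import torch

def aggregate(X: torch.Tensor, A: torch.Tensor, k: int) -> torch.Tensor:
    """k rounds of row-normalized neighborhood aggregation with self-loops."""
    A_hat = A + torch.eye(A.size(0))
    A_hat = A_hat / A_hat.sum(dim=1, keepdim=True)
    for _ in range(k):
        X = A_hat @ X
    return X

def gp_posterior_mean(H_syn, Y_syn, H_full, ridge):
    """Closed-form GP posterior mean with a linear kernel: no GNN training."""
    K_ss = H_syn @ H_syn.T                          # covariance among observations
    K_fs = H_full @ H_syn.T                         # full-graph/condensed cross-covariance
    alpha = torch.linalg.solve(K_ss + ridge * torch.eye(K_ss.size(0)), Y_syn)
    return K_fs @ alpha

# Toy original graph and a learnable condensed node set.
n, m, d, c = 200, 20, 16, 3
A = (torch.rand(n, n) < 0.05).float()
A = ((A + A.T) > 0).float()                         # undirected toy graph
X, Y = torch.randn(n, d), torch.eye(c)[torch.randint(c, (n,))]
X_syn = torch.randn(m, d, requires_grad=True)       # condensed features (learned)
Y_syn = torch.eye(c)[torch.randint(c, (m,))]        # condensed labels (fixed here)

H = aggregate(X, A, k=2)                            # propagate the original graph once
opt = torch.optim.Adam([X_syn], lr=1e-2)
for step in range(200):
    pred = gp_posterior_mean(X_syn, Y_syn, H, ridge=0.5)
    loss = ((pred - Y) ** 2).mean()                 # discrepancy to ground-truth labels
    opt.zero_grad()
    loss.backward()                                 # no inner GNN training anywhere
    opt.step()
```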
Dependencies:
```
python=3.8.20
ogb=1.3.6
pytorch=1.12.1
pyg=2.5.2
numpy=1.24.3
```
💡 Tip: Install `ogb` first to avoid CUDA device recognition issues.
To set up the environment, run:
```bash
conda env create -f environment.yml
```
Navigate to the `gcgp` folder:
```bash
cd gcgp
```
Run GCGP on a dataset (e.g., `Cora`):
```bash
python main.py --dataset Cora --cond_ratio 0.5 --ridge 0.5 --k 4 --epochs 200 --learn_A 0
```
To reproduce all results:
```bash
sh run.sh
```
- Outputs will be saved in `./gcgp/outputs/`
- Final results are collected in `./gcgp/results.csv` via `results.py`
For generalization experiments:
```bash
sh run_generalization.sh
```
- Outputs: `./gcgp/outputs_generalization/`
- Results: `./gcgp/results_generalization.csv`
For efficiency/time evaluation:
```bash
sh run_time.sh
```
- Outputs: `./gcgp/outputs_time/`
Go to the `gcgp_ogb` folder:
```bash
cd gcgp_ogb
```
Run GCGP:
```bash
python main.py --dataset ogbn-arxiv --cond_size 90 --ridge 5 --k 2 --epochs 200 --learn_A 0
```
To reproduce all results:
```bash
sh run.sh
```
- Outputs: `./gcgp_ogb/outputs/`
- Results: `./gcgp_ogb/results.csv`
For time analysis:
```bash
sh run_time.sh
```
- Outputs: `./gcgp_ogb/outputs_time/`
Navigate to the `gcgp_reddit` folder:
```bash
cd gcgp_reddit
```
Run GCGP:
```bash
python main.py --dataset Reddit --cond_size 77 --ridge 0.1 --k 2 --epochs 270 --learn_A 0
```
To reproduce all results:
```bash
sh run.sh
```
- Outputs: `./gcgp_reddit/outputs/`
- Results: `./gcgp_reddit/results.csv`
For training time evaluation:
```bash
sh run_time.sh
```
- Outputs: `./gcgp_reddit/outputs_time/`
If you find our paper or code useful, please cite:
```bibtex
@article{wang2025efficient,
  title={Efficient Graph Condensation via Gaussian Process},
  author={Wang, Lin and Li, Qing},
  journal={arXiv preprint arXiv:2501.02565},
  year={2025}
}
```
MIT License © 2025 WANG Lin