A sophisticated LLVM compiler pass that implements memory fencing optimization for concurrent programs under different relaxed memory models. This project provides automated fence insertion and optimization techniques for Total Store Ordering (TSO) and Partial Store Ordering (PSO) memory models.
Sequentially-Consistent (SC) fences enforce a happens-before relationship between operations preceding and following the fence, guaranteeing that all operations before the fence complete prior to executing those after it, both at hardware and compiler levels. Fences are inserted during the LLVM IR stage, allowing the backend to convert them to target architecture instructions or ignore them when unnecessary (e.g., SC fences are essentially ignored on x86).
This project implements:
- ๐ง Fence Insertion Passes: Automatically insert memory fences to ensure program correctness under TSO and PSO memory models
- โก Fence Optimization: Use max-flow min-cut algorithms to remove redundant fences while maintaining correctness
- ๐งช Litmus Test Suite: Comprehensive testing with classical concurrent programming patterns
| Memory Model | Description | Use Case |
|---|---|---|
| TSO (Total Store Ordering) | Allows load-load, load-store, and store-store reordering, but prohibits store-load reordering | x86/x64 architectures |
| PSO (Partial Store Ordering) | More relaxed than TSO, allows store-store reordering | SPARC architectures |
The passes insert fences based on consecutive atomic memory operation pairs. Key insight: Fences are only introduced between two relaxed atomic memory operations, as C11 memory ordering for other operations already provides the required guarantees.
| Op1 | Order1 | Op2 | Order2 | TSO | PSO |
|---|---|---|---|---|---|
| R | RLX | R | RLX | R-Fence-R | R-Fence-R |
| R | RLX | W | RLX | R-Fence-W | R-Fence-W |
| W | RLX | R | RLX | R-W | R-W |
| W | RLX | W | RLX | W-Fence-W | W-W |
| R | ACQ | * | * | R-Op2 | R-Op2 |
| * | * | W | REL | Op1-W | Op1-W |
| W | REL | R | ACQ | W-R | W-R |
| * | SEQ_CST | * | SEQ_CST | Op1-Op2 | Op1-Op2 |
| * | ACQ_REL | * | ACQ_REL | Op1-Op2 | Op1-Op2 |
Note: The main difference between TSO and PSO is that PSO allows Write-Write reordering while TSO does not.
The project consists of three main LLVM passes:
๐ fencing/
โโโ ๐ง FenceTSO.cpp # TSO fence insertion pass
โโโ ๐ง FencePSO.cpp # PSO fence insertion pass
โโโ โก FenceOptimization.cpp # Max-flow based fence optimization
โโโ ๐ FencingPasses.h # Pass interface definitions
Fence Insertion Algorithm: Both TSO and PSO passes traverse each function starting from the initial basic block, following all possible control flows while tracking the last atomic memory operation. When a consecutive operation pair that is "too relaxed" appears, the algorithm inserts two fences per operation pair: one after the initial operation and one before the second operation.
Fence Optimization Algorithm:
- Graph Construction: Build a special graph by removing fence instructions and inserting nodes before/after their locations
- Network Flow: Create source/sink connections to form a network flow graph
- Min-Cut Calculation: Apply Ford-Fulkerson max-flow algorithm to find the minimum cut
- Optimal Placement: Insert fences only at min-cut locations for optimal performance
This approach ensures minimal fence overhead while maintaining program correctness.
The project includes implementations of classical concurrent programming patterns used to validate fence synthesis:
- MP (Message Passing): Producer-consumer synchronization pattern
- SB (Store Buffering): Symmetric store-load pattern testing write-read reordering
- LB (Load Buffering): Circular dependency pattern testing read-write reordering
- IRIW (Independent Reads of Independent Writes): Multi-reader consistency pattern
- Branch Control Flow: Tests fence insertion across conditional branches
Each test covers different consecutive memory operation types (read-write, write-write, write-read) with various memory ordering combinations to ensure comprehensive validation of both TSO and PSO fence synthesis methods.
- LLVM 14+ with development headers
- CMake 3.12+
- C++17 compatible compiler
- llvm-lit for running tests
# Configure with CMake
cmake -DLLVM_LIT=$(which lit) -DLLVM_DIR=/usr/lib64/cmake/llvm -S . -B build
# Build the passes
cmake --build build
# Navigate to build directory
cd build && make๐ก Tip: Adjust
LLVM_DIRto point to your LLVM installation's cmake directory.
opt -load-pass-plugin ../build/fencing/FencingPass.so -S -passes=fence-tso input.llopt -load-pass-plugin ../build/fencing/FencingPass.so -S -passes=fence-pso input.llopt -load-pass-plugin ../build/fencing/FencingPass.so -S -passes=fence-tso,fence-opt input.llRun the comprehensive test suite:
# Run all tests
make test
# Run LLVM lit tests specifically
make run-lit-testsThis project demonstrates several important concepts:
- ๐ Relaxed Memory Models: Understanding how modern processors reorder memory operations
- ๐ก๏ธ Correctness Preservation: Ensuring concurrent programs behave as expected
- ๐ Optimization Theory: Applying graph algorithms (max-flow min-cut) to compiler optimization
- ๐๏ธ LLVM Development: Building production-quality compiler passes
Authors: Mihnea Bernevig, Yigit Colakoglu
This implementation builds upon established research in memory fencing optimization:
- Sequential Consistency: Ensuring program order is preserved across threads
- Fence Insertion Algorithms: Systematic approaches to placing memory barriers
- Min-Cut Optimization: Based on "Partially redundant fence elimination for x86, arm, and power processors" (Morisset & Zappa Nardelli, 2017)
- Relaxed Memory Models: Understanding hardware-level memory reordering behaviors
The project demonstrates practical application of graph theory (max-flow min-cut) to compiler optimization, bridging theoretical computer science with systems programming.
cs4560_fencing/
โโโ ๐ fencing/ # Core LLVM passes
โโโ ๐ litmus_tests/ # C++ concurrent test programs
โโโ ๐ tests/ # LLVM IR test cases
โโโ ๐ CMakeLists.txt # Build configuration
โโโ ๐ README.md # This file
- Morisset, R., & Zappa Nardelli, F. (2017). Partially redundant fence elimination for x86, arm, and power processors. In Proceedings of the 26th International Conference on Compiler Construction (CC 2017) (pp. 1โ10). Austin, TX, USA: Association for Computing Machinery. https://doi.org/10.1145/3033019.3033021