Speedup graph traversal functions #1596

ricardoV94 · 2025-08-28T17:49:45Z

Graph traversal / toposort is used widely by our rewrites.

This PR rewrites most methods for performance, by ignoring the DRY principle and hoisting checks out of hot loops.
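To make the "hoist checks out of hot loops" idea concrete, here is a minimal sketch on a toy graph. The `Node` class and both function names are hypothetical and are not PyTensor's API; the only point is that the `blockers is None` test moves out of the loop, at the cost of duplicating the loop body.

```python
class Node:
    """Toy graph node; `inputs` are the nodes it depends on (assumed shape)."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = tuple(inputs)


def ancestors_checked(root, blockers=None):
    """One loop; the optional-argument check runs on every iteration."""
    seen, stack, out = set(), [root], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        out.append(node)
        if blockers is not None and node in blockers:
            continue  # do not expand past a blocker
        stack.extend(node.inputs)
    return out


def ancestors_hoisted(root, blockers=None):
    """Check `blockers` once, then run a specialised (duplicated) loop."""
    seen, stack, out = set(), [root], []
    if blockers is None:
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            out.append(node)
            stack.extend(node.inputs)
    else:
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            out.append(node)
            if node not in blockers:
                stack.extend(node.inputs)
    return out
```

Both variants return the same nodes in the same order; the second simply trades repetition for fewer per-iteration branches.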

It further converts existing methods to generators, to avoid allocating large intermediate containers when iterating one item at a time suffices, or when we end up copying immediately to another container (like a deque).
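A rough illustration of why a generator helps, again on a toy graph (the `Node` class and `iter_ancestors` are hypothetical names, not the functions in this PR): the consumer can fill its own container directly or stop early, and no intermediate list is ever built.

```python
from collections import deque
from itertools import islice


class Node:
    """Toy graph node (assumed shape, not PyTensor's Variable/Apply)."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = tuple(inputs)


def iter_ancestors(root):
    """Yield each reachable node once, lazily, in depth-first order."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        yield node
        stack.extend(node.inputs)


x = Node("x")
y = Node("y")
s = Node("s", [x, y])
z = Node("z", [s, x])

# Feed a deque directly, with no intermediate list.
queue = deque(iter_ancestors(z))

# Or stop early: only the first two nodes are ever produced.
first_two = list(islice(iter_ancestors(z), 2))
```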

MAJOR CHANGE: We no longer reverse the inputs as we iterate, which substantially changes the order of the returned variables. Instead of a left-recursive depth-first search we now do a right-recursive depth-first search. This may be a bit confusing, in that something like ancestors(pt.add(x, y)) will yield y, x in that order.

Hopefully people aren't relying on the order of the inputs when they retrieve them with these helpers.
But if there is a need for it, we can easily add a left_recursive boolean flag to ancestors and toposort (and every function that uses them) to restore the old behavior. Note that walk and walk_toposort are completely flexible in the order of iteration, since the expansion comes from a user-provided function (where we are the ones defining that function, we could also add an option to reverse during expansion).
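The ordering difference can be reproduced with a small stack-based walk. This is a sketch under assumptions: `toy_walk` and `Node` are hypothetical stand-ins, and the real walk may differ in its details. Pushing the inputs as-is means the last input is popped first; reversing them during expansion restores the left-first order.

```python
class Node:
    """Toy graph node (assumed shape)."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = tuple(inputs)


def toy_walk(roots, expand):
    """Depth-first walk; `expand(node)` returns the nodes to visit next."""
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        yield node
        stack.extend(expand(node))


x = Node("x")
y = Node("y")
out = Node("add", [x, y])

# Inputs pushed as-is: the last input is popped first (right-recursive).
right_first = [n.name for n in toy_walk([out], lambda n: n.inputs)]

# Inputs reversed during expansion: the first input is popped first (left-recursive).
left_first = [n.name for n in toy_walk([out], lambda n: reversed(n.inputs))]
```

Here `right_first` is `["add", "y", "x"]` and `left_first` is `["add", "x", "y"]`.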

Iterating over the ancestors is now ~3x faster and toposort ~2x faster than before. Furthermore, several rewrites don't really need to be applied in topological order, so I added a new "dfs" order to the WalkingGraphRewriter and used that instead. This should speed up iteration over the nodes by ~5-6x (on top of what is saved by avoiding the full list allocation / copy).
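To show why a plain depth-first order is cheaper than a topological one, here is a toy comparison. The names `toy_toposort` and `toy_dfs` are hypothetical and this is not how WalkingGraphRewriter is implemented; it only illustrates that a toposort must revisit each node after its inputs are done, while a dfs can emit a node the first time it is seen.

```python
class Node:
    """Toy graph node (assumed shape)."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = tuple(inputs)


def toy_toposort(outputs):
    """Post-order DFS: a node is emitted only after all of its inputs."""
    order, done = [], set()
    stack = [(o, False) for o in outputs]
    while stack:
        node, expanded = stack.pop()
        if node in done:
            continue
        if expanded:
            done.add(node)
            order.append(node)
        else:
            # Revisit the node once everything pushed above it is finished.
            stack.append((node, True))
            stack.extend((i, False) for i in node.inputs if i not in done)
    return order


def toy_dfs(outputs):
    """Pre-order DFS: emit each node on first visit, no ordering guarantee."""
    seen = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        yield node
        stack.extend(node.inputs)
```

Both visit the same set of nodes on an acyclic graph; only `toy_toposort` guarantees that inputs come before the nodes that use them.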

Some benchmarks

Before:
------------------------------------------------------------------------------------------------------------- benchmark: 6 tests ------------------------------------------------------------------------------------------------------------
Name (time in us)                                                       Min                 Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_traversal_benchmark[variable_ancestors]                        29.5750 (1.0)       98.3650 (1.0)       31.5303 (1.0)       2.4287 (1.0)       30.8580 (1.0)       0.6220 (1.0)     1480;1694       31.7156 (1.0)       12797           1
test_traversal_benchmark[variable_ancestors_with_blockers]          44.8940 (1.52)     116.6280 (1.19)      48.1869 (1.53)      4.2825 (1.76)      46.8980 (1.52)      1.2020 (1.93)    1037;1676       20.7525 (0.65)      10435           1
test_traversal_benchmark[toposort]                                 137.1270 (4.64)     397.3450 (4.04)     149.1138 (4.73)     12.3319 (5.08)     146.1640 (4.74)      3.6467 (5.86)      434;613        6.7063 (0.21)       5703           1
test_traversal_benchmark[toposort_with_blockers]                   154.1990 (5.21)     322.4950 (3.28)     173.2635 (5.50)     18.8500 (7.76)     166.2820 (5.39)      9.7030 (15.60)     465;609        5.7716 (0.18)       5079           1
test_traversal_benchmark[toposort_with_orderings]                  207.3880 (7.01)     375.4040 (3.82)     222.3302 (7.05)     17.9360 (7.39)     217.7890 (7.06)      7.6143 (12.24)     200;279        4.4978 (0.14)       3009           1
test_traversal_benchmark[toposort_with_orderings_and_blockers]     221.5450 (7.49)     415.8600 (4.23)     245.4770 (7.79)     20.4803 (8.43)     239.0880 (7.75)     23.7995 (38.26)      262;61        4.0737 (0.13)       2563           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


After changing traversal routines (apply ancestors is new):
------------------------------------------------------------------------------------------------------------- benchmark: 8 tests ------------------------------------------------------------------------------------------------------------
Name (time in us)                                                       Min                 Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_traversal_benchmark[variable_ancestors]                        10.6100 (1.0)       36.1580 (1.0)       11.9244 (1.0)       1.6538 (1.0)       11.0910 (1.0)       1.7940 (14.83)    1266;892       83.8620 (1.0)       21596           1
test_traversal_benchmark[apply_ancestors]                           17.4430 (1.64)      72.4750 (2.00)      18.3884 (1.54)      2.4648 (1.49)      17.7030 (1.60)      0.1210 (1.0)     1936;4478       54.3821 (0.65)      32271           1
test_traversal_benchmark[variable_ancestors_with_blockers]          17.7830 (1.68)      65.8540 (1.82)      20.0402 (1.68)      3.8615 (2.33)      18.5650 (1.67)      2.0540 (16.98)   1491;1552       49.8996 (0.60)      16987           1
test_traversal_benchmark[apply_ancestors_with_blockers)]            25.8390 (2.44)      87.4140 (2.42)      27.7306 (2.33)      4.1631 (2.52)      26.3895 (2.38)      0.3710 (3.07)    1659;4314       36.0612 (0.43)      21328           1
test_traversal_benchmark[toposort]                                  54.9020 (5.17)     136.0450 (3.76)      61.9132 (5.19)      7.4951 (4.53)      58.2490 (5.25)     10.4115 (86.05)     689;144       16.1516 (0.19)       7325           1
test_traversal_benchmark[toposort_with_blockers]                    71.9340 (6.78)     201.2770 (5.57)      83.2297 (6.98)      5.5022 (3.33)      82.2340 (7.41)      3.0885 (25.52)   1471;1221       12.0149 (0.14)       8553           1
test_traversal_benchmark[toposort_with_orderings]                  153.3570 (14.45)    434.7250 (12.02)    165.1855 (13.85)    12.5754 (7.60)     162.0940 (14.61)     6.4120 (52.99)     334;386        6.0538 (0.07)       3550           1
test_traversal_benchmark[toposort_with_orderings_and_blockers]     171.6720 (16.18)    339.6570 (9.39)     201.5370 (16.90)     9.6037 (5.81)     200.6060 (18.09)     8.4960 (70.22)     537;228        4.9619 (0.06)       3061           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


After removing owner indirection:
------------------------------------------------------------------------------------------------------------- benchmark: 8 tests ------------------------------------------------------------------------------------------------------------
Name (time in us)                                                       Min                 Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_traversal_benchmark[variable_ancestors]                         9.3170 (1.0)       77.9560 (1.73)       9.9001 (1.0)       1.2099 (1.0)        9.5780 (1.0)       0.1310 (1.0)     1514;4325      101.0088 (1.0)       28276           1
test_traversal_benchmark[apply_ancestors]                           14.2560 (1.53)      45.1540 (1.0)       15.4488 (1.56)      1.5920 (1.32)      14.6270 (1.53)      1.7330 (13.23)   2085;1396       64.7298 (0.64)      38612           1
test_traversal_benchmark[variable_ancestors_with_blockers]          16.6910 (1.79)      61.2450 (1.36)      18.5778 (1.88)      2.3510 (1.94)      17.7085 (1.85)      1.8030 (13.76)   1688;1446       53.8277 (0.53)      22572           1
test_traversal_benchmark[apply_ancestors_with_blockers)]            21.8810 (2.35)      95.5690 (2.12)      23.5970 (2.38)      2.7487 (2.27)      22.7330 (2.37)      0.6910 (5.27)     938;3050       42.3782 (0.42)      14023           1
test_traversal_benchmark[toposort]                                  49.1120 (5.27)     182.4520 (4.04)      54.0905 (5.46)      5.9141 (4.89)      52.1170 (5.44)      1.9440 (14.84)   1775;2523       18.4875 (0.18)      13817           1
test_traversal_benchmark[toposort_with_blockers]                    58.0290 (6.23)     132.3280 (2.93)      68.5105 (6.92)      7.3542 (6.08)      69.8710 (7.29)     10.1190 (77.24)    4523;258       14.5963 (0.14)      11774           1
test_traversal_benchmark[toposort_with_orderings]                  146.1340 (15.68)    269.2550 (5.96)     163.9253 (16.56)    13.4994 (11.16)    157.9760 (16.49)    20.8965 (159.52)    1000;40        6.1003 (0.06)       4335           1
test_traversal_benchmark[toposort_with_orderings_and_blockers]     167.0730 (17.93)    300.7940 (6.66)     183.5841 (18.54)    12.7613 (10.55)    178.1140 (18.60)    17.9130 (136.74)     555;47        5.4471 (0.05)       3111           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This PR also moves the traversal functionality to its own file, which I think is more manageable. The old import locations still work, but emit FutureWarnings.
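One common way to keep old import paths working while warning about them is a module-level `__getattr__` (PEP 562). The sketch below is an assumption about the general pattern, not a copy of what this PR does; the module names `toy.basic` and `toy.traversal` and the helper `make_deprecated_getattr` are hypothetical.

```python
import types
import warnings


def make_deprecated_getattr(old_name, new_module, moved):
    """Build a module-level __getattr__ that forwards moved names with a warning."""

    def __getattr__(name):
        if name in moved:
            warnings.warn(
                f"Importing {name} from {old_name} is deprecated; "
                f"import it from {new_module.__name__} instead.",
                FutureWarning,
                stacklevel=2,
            )
            return getattr(new_module, name)
        raise AttributeError(f"module {old_name!r} has no attribute {name!r}")

    return __getattr__


# Stand-ins for the new and old modules (hypothetical names).
new_mod = types.ModuleType("toy.traversal")
new_mod.ancestors = lambda graphs: iter(graphs)

old_mod = types.ModuleType("toy.basic")
old_mod.__getattr__ = make_deprecated_getattr("toy.basic", new_mod, {"ancestors"})
```

Accessing `old_mod.ancestors` returns the function from the new module and emits a FutureWarning; any other missing name still raises AttributeError.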

Another breaking change is that io_toposort and general_toposort no longer accept the following special keyword arguments:

* clients in io_toposort and general_toposort, which had a single internal use in one scan rewrite. That use can be avoided by requesting the clients from the inner FunctionGraph (which did not exist back in the day).
* compute_deps_cache and compute_deps in general_toposort, which io_toposort used to remove one level of callable nesting. The gain is minor now that other things have been cleaned up, and it was likely also only used internally.

Also remove the .owner access indirection (behind a property), which adds extra cost for a very commonly accessed attribute. You can see in the benchmarks above that the effect is visible there, but it should go beyond traversal functions.
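The cost of a property can be measured in isolation with a toy pair of classes. These classes are hypothetical and do not reproduce how PyTensor defined `owner`; the sketch only contrasts a property that forwards to a private attribute with a plain instance attribute.

```python
import timeit


class VarWithProperty:
    """`owner` goes through a property call on every access."""

    def __init__(self, owner):
        self._owner = owner

    @property
    def owner(self):
        return self._owner


class VarWithAttribute:
    """`owner` is a plain instance attribute."""

    def __init__(self, owner):
        self.owner = owner


def time_access(var, number=200_000):
    """Return the seconds spent reading `var.owner` `number` times."""
    return timeit.timeit(lambda: var.owner, number=number)
```

Calling `time_access(VarWithProperty(None))` and `time_access(VarWithAttribute(None))` on the same machine gives a rough sense of the per-access overhead; the exact ratio depends on the Python version.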

Also clean up FunctionGraph methods to reduce attribute accesses, single-line function calls, and expensive checks.

codecov · 2025-08-28T18:20:59Z

Codecov Report

❌ Patch coverage is 90.34853% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.70%. Comparing base (801845c) to head (7353874).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pytensor/graph/traversal.py | 89.10% | 13 Missing and 9 partials ⚠️ |
| pytensor/graph/basic.py | 71.42% | 3 Missing and 1 partial ⚠️ |
| pytensor/graph/fg.py | 94.28% | 0 Missing and 2 partials ⚠️ |
| pytensor/graph/op.py | 93.75% | 1 Missing and 1 partial ⚠️ |
| pytensor/graph/rewriting/basic.py | 71.42% | 1 Missing and 1 partial ⚠️ |
| pytensor/tensor/blas.py | 87.50% | 2 Missing ⚠️ |
| pytensor/graph/features.py | 75.00% | 1 Missing ⚠️ |
| pytensor/printing.py | 66.66% | 1 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (90.34%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1596      +/-   ##
==========================================
- Coverage   81.71%   81.70%   -0.02%     
==========================================
  Files         230      231       +1     
  Lines       52931    52922       -9     
  Branches     9403     9384      -19     
==========================================
- Hits        43255    43238      -17     
- Misses       7245     7252       +7     
- Partials     2431     2432       +1

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| pytensor/compile/builders.py | `88.66% <100.00%> (+0.03%)` | ⬆️ |
| pytensor/compile/debugmode.py | `61.59% <100.00%> (+0.03%)` | ⬆️ |
| pytensor/compile/function/types.py | `80.73% <100.00%> (+0.02%)` | ⬆️ |
| pytensor/d3viz/formatting.py | `11.45% <100.00%> (+0.46%)` | ⬆️ |
| pytensor/graph/replace.py | `84.37% <100.00%> (+0.16%)` | ⬆️ |
| pytensor/graph/rewriting/utils.py | `100.00% <100.00%> (ø)` | |
| pytensor/ifelse.py | `52.40% <100.00%> (+0.13%)` | ⬆️ |
| pytensor/link/c/basic.py | `87.76% <100.00%> (+0.03%)` | ⬆️ |
| pytensor/scalar/basic.py | `80.57% <100.00%> (+<0.01%)` | ⬆️ |
| pytensor/scan/basic.py | `84.38% <100.00%> (+0.03%)` | ⬆️ |

... and 29 more


review-notebook-app · 2025-08-30T02:38:35Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

Copilot

Pull Request Overview

This PR optimizes graph traversal and topological sorting functions across PyTensor to improve performance, especially for rewrite operations. The main changes involve moving graph traversal functionality to a dedicated module and implementing more efficient algorithms.

Key changes:

* Rewrote traversal functions for 3x faster ancestors iteration and 2x faster topological sorting
* Converted methods to generators to reduce memory allocation
* Added a new "dfs" order to WalkingGraphRewriter for 5-6x iteration speedup
* Moved traversal functionality from pytensor.graph.basic to pytensor.graph.traversal

Reviewed Changes

Copilot reviewed 65 out of 65 changed files in this pull request and generated no comments.

Show a summary per file

| File | Description |
| --- | --- |
| `pytensor/graph/traversal.py` | New module containing optimized graph traversal algorithms and functions |
| `pytensor/graph/basic.py` | Removed traversal functions, retained core graph structures |
| `pytensor/graph/rewriting/basic.py` | Added "dfs" order option and updated imports for traversal functions |
| Multiple test files | Updated imports to use traversal functions from new module |
| Multiple rewriting modules | Changed from `in2out`/`out2in` to `dfs_rewriter` for better performance |

Comments suppressed due to low confidence (7)

pytensor/graph/fg.py:1

The error message pattern has changed from 'not reversible' to 'not iterable', but this might indicate a behavioral change in how the validation works. Verify that this change in error message is intentional and reflects the actual validation logic.

"""A container for specifying and manipulating a graph with distinct inputs and outputs."""



ricardoV94 force-pushed the improve_rewrites branch from dbb4551 to dcbcbd9 on August 28, 2025 17:55

ricardoV94 changed the title ~~Reduce cost of singleton WalkingGraphRewriters~~ Allow cheaper bfs/dfs order in WalkingGraphRewriter Aug 28, 2025

ricardoV94 force-pushed the improve_rewrites branch 2 times, most recently from 20d88e5 to c42ba61 on August 29, 2025 01:32

ricardoV94 changed the title ~~Allow cheaper bfs/dfs order in WalkingGraphRewriter~~ Speedup internal usages of toposort Aug 29, 2025

ricardoV94 commented on pytensor/graph/basic.py Aug 29, 2025 (outdated, resolved)

ricardoV94 force-pushed the improve_rewrites branch from c42ba61 to 27ede5a on August 30, 2025 02:38

ricardoV94 force-pushed the improve_rewrites branch from 27ede5a to 262bd26 on August 30, 2025 02:56

ricardoV94 added the maintenance, graph rewriting, and performance labels Aug 30, 2025

ricardoV94 force-pushed the improve_rewrites branch 8 times, most recently from 3666a4b to afdfd7b on September 1, 2025 00:54

ricardoV94 changed the title ~~Speedup internal usages of toposort~~ Speedup graph traversal functions Sep 1, 2025

ricardoV94 added the major label Sep 1, 2025

ricardoV94 force-pushed the improve_rewrites branch 7 times, most recently from deeba35 to a2d80ed on September 2, 2025 12:10

ricardoV94 force-pushed the improve_rewrites branch from a2d80ed to 5ec8e59 on September 2, 2025 13:50

ricardoV94 marked this pull request as ready for review September 2, 2025 13:50

ricardoV94 requested review from Copilot and jessegrabowski and removed request for Copilot September 2, 2025 13:50

Copilot AI reviewed Sep 2, 2025


ricardoV94 force-pushed the improve_rewrites branch from 5ec8e59 to 276e5a9 on September 3, 2025 09:00

ricardoV94 added 11 commits September 3, 2025 11:02

Scipy is not optional

7d57cfb

Note failing scan rewrite

ab837c4

Remove unnecessary checks and unused variable in Scan rewrites

907eef2

Remove unused function replace_nominals_with_dummies

85aa39c

Move view_roots to the only file where it is used

afb5afa

Move io_connection_pattern to graph/op.py

413df99

Move graph traversal functions into their own file

2e300cb

Faster graph traversal functions

d580e0a

* Avoid reversing inputs as we traverse graph
* Simplify io_toposort without ordering (and refactor into its own function)
* Removes client side-effect on previous toposort functions
* Remove duplicated logic across methods

Speedup FunctionGraph methods

d1af6d2

Replace uses of in2out and out2in by a depth-first search rewriter

8398704

Speedup variable owner access

7353874

ricardoV94 force-pushed the improve_rewrites branch from 276e5a9 to 7353874 on September 3, 2025 09:02
