@ricardoV94 ricardoV94 commented Aug 28, 2025

Graph traversal / toposort is used widely by our rewrites.

This PR rewrites most methods for performance, by ignoring the DRY principle and hoisting checks out of hot loops.
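As a rough illustration of what hoisting a check out of a hot loop can look like, here is a minimal sketch on a toy graph stored as a dict. The names `ancestors`, `inputs_of`, `blockers`, and `toy_graph` are illustrative assumptions and do not reproduce PyTensor's actual signatures. The `blockers is None` test runs once, at the cost of writing the loop twice:

```python
def ancestors(roots, inputs_of, blockers=None):
    """Yield every node reachable from roots exactly once, depth-first.

    Toy sketch: the optional-argument check is done once, outside the loop,
    so the common no-blockers path does no per-node branching on it.
    """
    seen = set()
    stack = list(roots)
    if blockers is None:
        # Fast path: no blocker test per node.
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            yield node
            stack.extend(inputs_of(node))
    else:
        # Duplicated loop (not DRY) that additionally stops at blockers.
        blocked = set(blockers)
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            yield node
            if node not in blocked:
                stack.extend(inputs_of(node))


# Hypothetical toy graph: each key maps to the list of its inputs.
toy_graph = {"out": ["a", "b"], "a": ["x"], "b": ["x", "y"], "x": [], "y": []}
all_nodes = list(ancestors(["out"], toy_graph.__getitem__))
stopped_at_b = list(ancestors(["out"], toy_graph.__getitem__, blockers=["b"]))
```

With `blockers=["b"]` the traversal still yields `b` itself but does not expand its inputs, so `y` is never reached in this toy graph.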

It further converts existing methods to generators, to avoid allocating large intermediate containers when iterating one item at a time suffices, or when we end up copying immediately to another container (like a deque).
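The benefit of a generator here can be sketched with a toy traversal; this is an assumption-laden illustration, not the PR's code, and `iter_ancestors`, `chain`, and `counting_inputs_of` are hypothetical names. A consumer that stops early never pays for the rest of the graph, and a consumer that wants a deque can fill it directly without an intermediate list:

```python
from collections import deque


def iter_ancestors(root, inputs_of):
    """Lazily yield each node reachable from root once, depth-first."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        yield node
        stack.extend(inputs_of(node))


# Hypothetical chain graph: n0 -> n1 -> ... -> n999.
chain = {f"n{i}": [f"n{i + 1}"] for i in range(999)}
chain["n999"] = []

expanded = []  # records which nodes had their inputs requested


def counting_inputs_of(node):
    expanded.append(node)
    return chain[node]


# Early exit: the generator is abandoned as soon as "n3" is seen.
found = any(node == "n3" for node in iter_ancestors("n0", counting_inputs_of))

# Direct hand-off to another container, with no intermediate list.
queue = deque(iter_ancestors("n0", chain.__getitem__))
```

In the early-exit case only the first few nodes are ever expanded, whereas a list-returning version would walk all 1000 nodes before the search could begin.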

MAJOR CHANGE: We no longer reverse the inputs as we iterate, which substantially changes the order of the returned variables. Instead of a left-recursive depth-first search we now do a right-recursive depth-first search. This may be a bit confusing, in that something like ancestors(pt.add(x, y)) will yield y before x.
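The order change follows from how a stack-based depth-first search works, which the toy sketch below tries to show. `dfs` and its `reverse_inputs` flag are hypothetical and only mimic the idea, not the PR's implementation. Because the stack pops the last pushed item first, pushing inputs as-is visits the rightmost input first, and getting the left-first order requires the extra reversal that was dropped:

```python
def dfs(root, inputs_of, reverse_inputs):
    """Stack-based depth-first traversal over a toy graph."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        yield node
        children = inputs_of(node)
        # Reversing costs extra work per node; skipping it flips the order.
        stack.extend(reversed(children) if reverse_inputs else children)


# Hypothetical graph for out = add(x, y).
graph = {"out": ["x", "y"], "x": [], "y": []}

left_first = list(dfs("out", graph.__getitem__, reverse_inputs=True))
right_first = list(dfs("out", graph.__getitem__, reverse_inputs=False))
# left_first  -> ['out', 'x', 'y']
# right_first -> ['out', 'y', 'x']
```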

Hopefully people aren't relying on the order of the inputs when they retrieve them with these helpers.
But if there is a need for it, we can easily add a left_recursive boolean flag to ancestors and toposort (and every function that uses those) to restore the old behavior. Note that walk and walk_toposort are completely flexible in the order of iteration, since the expansion comes from a user-provided function (where we are the ones defining that function, we could also add the option to reverse during expansion).
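To illustrate why an expansion callback leaves the order up to the caller, here is a simplified stand-in for a walk-style helper. The `walk` below is a hypothetical toy with an assumed `(nodes, expand)` signature, not PyTensor's function. The caller recovers a left-first order simply by reversing inside `expand`:

```python
def walk(nodes, expand):
    """Toy depth-first walk: visit each node once, children given by expand."""
    seen = set()
    stack = list(nodes)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        yield node
        children = expand(node)
        if children:
            stack.extend(children)


# Hypothetical graph: each key maps to the list of its inputs.
graph = {"out": ["a", "b"], "a": ["x"], "b": ["x", "y"], "x": [], "y": []}


def expand_as_is(node):
    return graph[node]


def expand_reversed(node):
    # Reversing here makes the leftmost input the first one popped.
    return graph[node][::-1]


right_first = list(walk(["out"], expand_as_is))
left_first = list(walk(["out"], expand_reversed))
# right_first -> ['out', 'b', 'y', 'x', 'a']
# left_first  -> ['out', 'a', 'x', 'b', 'y']
```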

Iterating over the ancestors is now ~3x faster and toposort ~2x faster than before. Furthermore, several rewrites don't really need to be applied in topological order, so I added a new "dfs" order to the WalkingGraphRewriter and used that instead. This should speed up iteration over the nodes by ~5-6x (on top of what's saved by avoiding the full list allocation / copy).
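The difference between the two orders can be sketched on a toy graph. Both functions below are illustrative assumptions rather than the code behind WalkingGraphRewriter. A plain depth-first walk yields a node as soon as it is popped, while a topological sort has to hold each node back until all of its inputs have been emitted, which needs extra bookkeeping:

```python
def dfs_order(outputs, inputs_of):
    """Visit each node once; no guarantee that inputs come before clients."""
    seen = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        yield node
        stack.extend(inputs_of(node))


def toposort(outputs, inputs_of):
    """Iterative post-order DFS: a node is emitted only after its inputs.

    Assumes the graph is acyclic.
    """
    done = set()
    order = []
    stack = [(out, False) for out in outputs]
    while stack:
        node, inputs_pushed = stack.pop()
        if node in done:
            continue
        if inputs_pushed:
            # All inputs were handled above this entry on the stack.
            done.add(node)
            order.append(node)
        else:
            # Revisit this node after its inputs have been processed.
            stack.append((node, True))
            stack.extend((i, False) for i in inputs_of(node) if i not in done)
    return order


# Hypothetical graph: each key maps to the list of its inputs.
graph = {"out": ["a", "b"], "a": ["x"], "b": ["a"], "x": []}

cheap_order = list(dfs_order(["out"], graph.__getitem__))
sorted_order = toposort(["out"], graph.__getitem__)
```

In this example `cheap_order` emits `b` before its input `a`, which is fine for a rewrite that only needs to see every node once, while `sorted_order` always places inputs first.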

Some benchmarks
Before:
------------------------------------------------------------------------------------------------------------- benchmark: 6 tests ------------------------------------------------------------------------------------------------------------
Name (time in us)                                                       Min                 Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_traversal_benchmark[variable_ancestors]                        29.5750 (1.0)       98.3650 (1.0)       31.5303 (1.0)       2.4287 (1.0)       30.8580 (1.0)       0.6220 (1.0)     1480;1694       31.7156 (1.0)       12797           1
test_traversal_benchmark[variable_ancestors_with_blockers]          44.8940 (1.52)     116.6280 (1.19)      48.1869 (1.53)      4.2825 (1.76)      46.8980 (1.52)      1.2020 (1.93)    1037;1676       20.7525 (0.65)      10435           1
test_traversal_benchmark[toposort]                                 137.1270 (4.64)     397.3450 (4.04)     149.1138 (4.73)     12.3319 (5.08)     146.1640 (4.74)      3.6467 (5.86)      434;613        6.7063 (0.21)       5703           1
test_traversal_benchmark[toposort_with_blockers]                   154.1990 (5.21)     322.4950 (3.28)     173.2635 (5.50)     18.8500 (7.76)     166.2820 (5.39)      9.7030 (15.60)     465;609        5.7716 (0.18)       5079           1
test_traversal_benchmark[toposort_with_orderings]                  207.3880 (7.01)     375.4040 (3.82)     222.3302 (7.05)     17.9360 (7.39)     217.7890 (7.06)      7.6143 (12.24)     200;279        4.4978 (0.14)       3009           1
test_traversal_benchmark[toposort_with_orderings_and_blockers]     221.5450 (7.49)     415.8600 (4.23)     245.4770 (7.79)     20.4803 (8.43)     239.0880 (7.75)     23.7995 (38.26)      262;61        4.0737 (0.13)       2563           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


After changing traversal routines (apply ancestors is new):
------------------------------------------------------------------------------------------------------------- benchmark: 8 tests ------------------------------------------------------------------------------------------------------------
Name (time in us)                                                       Min                 Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_traversal_benchmark[variable_ancestors]                        10.6100 (1.0)       36.1580 (1.0)       11.9244 (1.0)       1.6538 (1.0)       11.0910 (1.0)       1.7940 (14.83)    1266;892       83.8620 (1.0)       21596           1
test_traversal_benchmark[apply_ancestors]                           17.4430 (1.64)      72.4750 (2.00)      18.3884 (1.54)      2.4648 (1.49)      17.7030 (1.60)      0.1210 (1.0)     1936;4478       54.3821 (0.65)      32271           1
test_traversal_benchmark[variable_ancestors_with_blockers]          17.7830 (1.68)      65.8540 (1.82)      20.0402 (1.68)      3.8615 (2.33)      18.5650 (1.67)      2.0540 (16.98)   1491;1552       49.8996 (0.60)      16987           1
test_traversal_benchmark[apply_ancestors_with_blockers)]            25.8390 (2.44)      87.4140 (2.42)      27.7306 (2.33)      4.1631 (2.52)      26.3895 (2.38)      0.3710 (3.07)    1659;4314       36.0612 (0.43)      21328           1
test_traversal_benchmark[toposort]                                  54.9020 (5.17)     136.0450 (3.76)      61.9132 (5.19)      7.4951 (4.53)      58.2490 (5.25)     10.4115 (86.05)     689;144       16.1516 (0.19)       7325           1
test_traversal_benchmark[toposort_with_blockers]                    71.9340 (6.78)     201.2770 (5.57)      83.2297 (6.98)      5.5022 (3.33)      82.2340 (7.41)      3.0885 (25.52)   1471;1221       12.0149 (0.14)       8553           1
test_traversal_benchmark[toposort_with_orderings]                  153.3570 (14.45)    434.7250 (12.02)    165.1855 (13.85)    12.5754 (7.60)     162.0940 (14.61)     6.4120 (52.99)     334;386        6.0538 (0.07)       3550           1
test_traversal_benchmark[toposort_with_orderings_and_blockers]     171.6720 (16.18)    339.6570 (9.39)     201.5370 (16.90)     9.6037 (5.81)     200.6060 (18.09)     8.4960 (70.22)     537;228        4.9619 (0.06)       3061           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


After removing owner indirection:
------------------------------------------------------------------------------------------------------------- benchmark: 8 tests ------------------------------------------------------------------------------------------------------------
Name (time in us)                                                       Min                 Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_traversal_benchmark[variable_ancestors]                         9.3170 (1.0)       77.9560 (1.73)       9.9001 (1.0)       1.2099 (1.0)        9.5780 (1.0)       0.1310 (1.0)     1514;4325      101.0088 (1.0)       28276           1
test_traversal_benchmark[apply_ancestors]                           14.2560 (1.53)      45.1540 (1.0)       15.4488 (1.56)      1.5920 (1.32)      14.6270 (1.53)      1.7330 (13.23)   2085;1396       64.7298 (0.64)      38612           1
test_traversal_benchmark[variable_ancestors_with_blockers]          16.6910 (1.79)      61.2450 (1.36)      18.5778 (1.88)      2.3510 (1.94)      17.7085 (1.85)      1.8030 (13.76)   1688;1446       53.8277 (0.53)      22572           1
test_traversal_benchmark[apply_ancestors_with_blockers)]            21.8810 (2.35)      95.5690 (2.12)      23.5970 (2.38)      2.7487 (2.27)      22.7330 (2.37)      0.6910 (5.27)     938;3050       42.3782 (0.42)      14023           1
test_traversal_benchmark[toposort]                                  49.1120 (5.27)     182.4520 (4.04)      54.0905 (5.46)      5.9141 (4.89)      52.1170 (5.44)      1.9440 (14.84)   1775;2523       18.4875 (0.18)      13817           1
test_traversal_benchmark[toposort_with_blockers]                    58.0290 (6.23)     132.3280 (2.93)      68.5105 (6.92)      7.3542 (6.08)      69.8710 (7.29)     10.1190 (77.24)    4523;258       14.5963 (0.14)      11774           1
test_traversal_benchmark[toposort_with_orderings]                  146.1340 (15.68)    269.2550 (5.96)     163.9253 (16.56)    13.4994 (11.16)    157.9760 (16.49)    20.8965 (159.52)    1000;40        6.1003 (0.06)       4335           1
test_traversal_benchmark[toposort_with_orderings_and_blockers]     167.0730 (17.93)    300.7940 (6.66)     183.5841 (18.54)    12.7613 (10.55)    178.1140 (18.60)    17.9130 (136.74)     555;47        5.4471 (0.05)       3111           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


This PR also moves the traversal functionality to its own file, which I think is more manageable. Some backward-compatible imports remain, and they emit FutureWarnings.
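One common way to keep an old import path working while warning about it is a module-level `__getattr__` (PEP 562). The sketch below is only an assumption about how such a shim might look; the PR text does not say how its backward-compatible imports are implemented. It uses throwaway in-memory modules named `demo_basic` and `demo_traversal` so that it runs on its own:

```python
import types
import warnings

# Stand-in for the new home of the moved function.
new_module = types.ModuleType("demo_traversal")


def ancestors(graphs):
    """Placeholder for the moved function."""
    return list(graphs)


new_module.ancestors = ancestors

# Stand-in for the old module that used to define it.
old_module = types.ModuleType("demo_basic")
_MOVED_NAMES = frozenset({"ancestors"})


def _deprecated_getattr(name):
    # Called only when normal attribute lookup on the old module fails.
    if name in _MOVED_NAMES:
        warnings.warn(
            f"{name} has moved from demo_basic to demo_traversal; "
            "please update the import.",
            FutureWarning,
            stacklevel=2,
        )
        return getattr(new_module, name)
    raise AttributeError(f"module 'demo_basic' has no attribute {name!r}")


# In a real module file this would be `def __getattr__(name): ...` at top level.
old_module.__getattr__ = _deprecated_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    moved = old_module.ancestors  # still works, but records a FutureWarning
```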

Another breaking change is that io_toposort and general_toposort no longer accept the following special keyword arguments:

  • clients in io_toposort and general_toposort, which had a single internal use in one scan rewrite that can be avoided by just requesting them from the inner FunctionGraph (this didn't use to exist back in the day).

  • compute_deps_cache and compute_deps in general_toposort, which io_toposort made use of to reduce one level of callable nesting. This is really minor now that we cleaned up other things and it was likely also only an internal usage.


Also remove the .owner access indirection (behind a property), which adds extra cost to a very commonly accessed attribute. The effect is visible in the benchmarks above, but it should extend beyond the traversal functions.
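The kind of indirection being removed can be sketched with two toy classes; these are hypothetical stand-ins, not PyTensor's Variable. Reading `owner` through a property runs a Python-level function call on every access, while a plain instance attribute is a direct lookup. The helper lets anyone measure the gap on their own machine; no timings are claimed here:

```python
import timeit


class VarWithProperty:
    """Toy variable whose owner is exposed through a property."""

    def __init__(self, owner):
        self._owner = owner

    @property
    def owner(self):
        # Every read of .owner runs this function.
        return self._owner


class VarWithAttribute:
    """Toy variable whose owner is a plain instance attribute."""

    def __init__(self, owner):
        self.owner = owner


def time_owner_access(var, number=200_000):
    """Return the seconds taken for `number` reads of var.owner."""
    return timeit.timeit(lambda: var.owner, number=number)


if __name__ == "__main__":
    print("property :", time_owner_access(VarWithProperty("node")))
    print("attribute:", time_owner_access(VarWithAttribute("node")))
```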

Also clean up FunctionGraph methods to reduce attribute accesses, single-line functions, and expensive checks.

@ricardoV94 ricardoV94 changed the title Reduce cost of singleton WalkingGraphRewriters Allow cheaper bfs/dfs order in WalkingGraphRewriter Aug 28, 2025
codecov bot commented Aug 28, 2025

Codecov Report

❌ Patch coverage is 90.34853% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.70%. Comparing base (801845c) to head (7353874).

Files with missing lines             Patch %   Lines
pytensor/graph/traversal.py          89.10%    13 Missing and 9 partials ⚠️
pytensor/graph/basic.py              71.42%    3 Missing and 1 partial ⚠️
pytensor/graph/fg.py                 94.28%    0 Missing and 2 partials ⚠️
pytensor/graph/op.py                 93.75%    1 Missing and 1 partial ⚠️
pytensor/graph/rewriting/basic.py    71.42%    1 Missing and 1 partial ⚠️
pytensor/tensor/blas.py              87.50%    2 Missing ⚠️
pytensor/graph/features.py           75.00%    1 Missing ⚠️
pytensor/printing.py                 66.66%    1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (90.34%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1596      +/-   ##
==========================================
- Coverage   81.71%   81.70%   -0.02%     
==========================================
  Files         230      231       +1     
  Lines       52931    52922       -9     
  Branches     9403     9384      -19     
==========================================
- Hits        43255    43238      -17     
- Misses       7245     7252       +7     
- Partials     2431     2432       +1     
Files with missing lines             Coverage Δ
pytensor/compile/builders.py         88.66% <100.00%> (+0.03%) ⬆️
pytensor/compile/debugmode.py        61.59% <100.00%> (+0.03%) ⬆️
pytensor/compile/function/types.py   80.73% <100.00%> (+0.02%) ⬆️
pytensor/d3viz/formatting.py         11.45% <100.00%> (+0.46%) ⬆️
pytensor/graph/replace.py            84.37% <100.00%> (+0.16%) ⬆️
pytensor/graph/rewriting/utils.py    100.00% <100.00%> (ø)
pytensor/ifelse.py                   52.40% <100.00%> (+0.13%) ⬆️
pytensor/link/c/basic.py             87.76% <100.00%> (+0.03%) ⬆️
pytensor/scalar/basic.py             80.57% <100.00%> (+<0.01%) ⬆️
pytensor/scan/basic.py               84.38% <100.00%> (+0.03%) ⬆️
... and 29 more
@ricardoV94 ricardoV94 force-pushed the improve_rewrites branch 2 times, most recently from 20d88e5 to c42ba61 Compare August 29, 2025 01:32
@ricardoV94 ricardoV94 changed the title Allow cheaper bfs/dfs order in WalkingGraphRewriter Speedup internal usages of toposort Aug 29, 2025
@ricardoV94 ricardoV94 force-pushed the improve_rewrites branch 8 times, most recently from 3666a4b to afdfd7b Compare September 1, 2025 00:54
@ricardoV94 ricardoV94 changed the title Speedup internal usages of toposort Speedup graph traversal functions Sep 1, 2025
@ricardoV94 ricardoV94 force-pushed the improve_rewrites branch 7 times, most recently from deeba35 to a2d80ed Compare September 2, 2025 12:10
@ricardoV94 ricardoV94 marked this pull request as ready for review September 2, 2025 13:50
@ricardoV94 ricardoV94 requested review from Copilot and jessegrabowski and removed request for Copilot September 2, 2025 13:50
@Copilot Copilot AI left a comment

Pull Request Overview

This PR optimizes graph traversal and topological sorting functions across PyTensor to improve performance, especially for rewrite operations. The main changes involve moving graph traversal functionality to a dedicated module and implementing more efficient algorithms.

Key changes:

  • Rewrote traversal functions for 3x faster ancestors iteration and 2x faster topological sorting
  • Converted methods to generators to reduce memory allocation
  • Added a new "dfs" order to WalkingGraphRewriter for 5-6x iteration speedup
  • Moved traversal functionality from pytensor.graph.basic to pytensor.graph.traversal

Reviewed Changes

Copilot reviewed 65 out of 65 changed files in this pull request and generated no comments.

File                                 Description
pytensor/graph/traversal.py          New module containing optimized graph traversal algorithms and functions
pytensor/graph/basic.py              Removed traversal functions, retained core graph structures
pytensor/graph/rewriting/basic.py    Added "dfs" order option and updated imports for traversal functions
Multiple test files                  Updated imports to use traversal functions from new module
Multiple rewriting modules           Changed from in2out/out2in to dfs_rewriter for better performance
Comments suppressed due to low confidence (7)

pytensor/graph/fg.py:1

  • The error message pattern has changed from 'not reversible' to 'not iterable', but this might indicate a behavioral change in how the validation works. Verify that this change in error message is intentional and reflects the actual validation logic.
"""A container for specifying and manipulating a graph with distinct inputs and outputs."""
