Skip to content

Conversation

@holiman
Copy link
Contributor

@holiman holiman commented Oct 23, 2024

Looking at the cpu profile of a burntpix benchmark, I noticed that a lot of time was spent in gas-used, in the interpreter loop. It's an actual call (not inlined), which explicitly wants to be ignored by tracing ("tracing.GasChangeIgnored"), so it can be safely and simply inlined.

The other change is in pushX. These also do a call to common.RightPadBytes. I replaced that by a doing a corresponding Lsh on the u256 if needed. Note: it's needed only to make the stack output look right, for fuzzers. It technically doesn't matter what we put there: if code ends on a pushdata immediate, nothing will consume the stack element. We could just as well just ignore it, if we didn't care about fuzzers (which I do).

Seems quite a lot faster on burntpix, according to my runs.

This PR:

EVM gas used:    5642735088
execution time:  34.84609475s
allocations:     915683
allocated bytes: 175334088
EVM gas used:    5642735088
execution time:  36.671958278s
allocations:     915701
allocated bytes: 175340528

Master

EVM gas used:    5642735088
execution time:  49.349209526s
allocations:     915684
allocated bytes: 175333368
EVM gas used:    5642735088
execution time:  46.581006598s
allocations:     915681
allocated bytes: 175330728

@holiman
Copy link
Contributor Author

holiman commented Oct 23, 2024

Btw if we want this, we can also remove tracing.GasChangeIgnored. cc @s1na ?

@holiman holiman force-pushed the speedy_interpreter_loop branch from 63d2422 to 8c582ab Compare October 23, 2024 11:42
@holiman
Copy link
Contributor Author

holiman commented Oct 23, 2024

Added a benchmark

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/vm/runtime
cpu: 12th Gen Intel(R) Core(TM) i7-1270P
                        │ loop.master  │            loop.master.2             │
                        │    sec/op    │    sec/op     vs base                │
SimpleLoop/loop2-100M-8   846.1m ± 15%   645.2m ± 33%  -23.74% (p=0.020 n=20)

                        │  loop.master  │          loop.master.2          │
                        │     B/op      │     B/op       vs base          │
SimpleLoop/loop2-100M-8   6.365Ki ± 13%   6.570Ki ± 17%  ~ (p=0.640 n=20)

                        │ loop.master │        loop.master.2         │
                        │  allocs/op  │ allocs/op   vs base          │
SimpleLoop/loop2-100M-8    46.00 ± 4%   46.00 ± 4%  ~ (p=0.576 n=20)

a := new(uint256.Int).SetBytes(scope.Contract.Code[start:end])

// Missing bytes: pushByteSize - len(pushData)
if missing := pushByteSize - (end - start); missing > 0 {
Copy link
Contributor

@jwasinger jwasinger Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we could remove this if from the hot-loop and adjust the tracers correspondingly, it's (understandably) a micro optimization that doesn't bring measurable performance benefits:

                                       │ bench_if_removed.txt │      bench_if_not_removed.txt      │                                                                                                                
                                       │        sec/op        │   sec/op     vs base               │                                                                                                                
SimpleLoop/staticcall-identity-100M-12            162.9m ± 0%   160.7m ± 2%  -1.35% (p=0.009 n=10)                                                                                                                  
SimpleLoop/call-identity-100M-12                  183.9m ± 2%   180.8m ± 1%  -1.64% (p=0.001 n=10)
SimpleLoop/loop-100M-12                           182.0m ± 1%   182.0m ± 0%       ~ (p=0.853 n=10)
SimpleLoop/loop2-100M-12                          200.7m ± 0%   208.0m ± 5%  +3.68% (p=0.000 n=10)
SimpleLoop/call-nonexist-100M-12                  496.8m ± 2%   508.8m ± 9%  +2.41% (p=0.011 n=10)
SimpleLoop/call-EOA-100M-12                       142.3m ± 0%   141.1m ± 5%       ~ (p=0.143 n=10)
SimpleLoop/call-reverting-100M-12                 323.7m ± 0%   324.9m ± 1%       ~ (p=0.123 n=10)
geomean                                           220.1m        220.9m       +0.37%

@fjl fjl added this to the 1.14.12 milestone Oct 30, 2024
@fjl fjl merged commit 25bc077 into ethereum:master Oct 30, 2024
3 checks passed
holiman added a commit that referenced this pull request Nov 19, 2024
Looking at the cpu profile of a burntpix benchmark, I noticed that a lot
of time was spent in gas-used, in the interpreter loop. It's an actual
call (not inlined), which explicitly wants to be ignored by tracing
("tracing.GasChangeIgnored"), so it can be safely and simply inlined.

The other change is in `pushX`. These also do a call to
`common.RightPadBytes`. I replaced that by a doing a corresponding `Lsh`
on the `u256` if needed. Note: it's needed only to make the stack output
look right, for fuzzers. It technically doesn't matter what we put
there: if code ends on a pushdata immediate, nothing will consume the
stack element. We could just as well just ignore it, if we didn't care
about fuzzers (which I do).

Seems quite a lot faster on burntpix, according to my runs. 

This PR:
```
EVM gas used:    5642735088
execution time:  34.84609475s
allocations:     915683
allocated bytes: 175334088
```
```
EVM gas used:    5642735088
execution time:  36.671958278s
allocations:     915701
allocated bytes: 175340528
```

Master
```
EVM gas used:    5642735088
execution time:  49.349209526s
allocations:     915684
allocated bytes: 175333368
```
```
EVM gas used:    5642735088
execution time:  46.581006598s
allocations:     915681
allocated bytes: 175330728
```

---------

Co-authored-by: Sina M <[email protected]>
Co-authored-by: Felix Lange <[email protected]>
jakub-freebit pushed a commit to fblch/go-ethereum that referenced this pull request Jul 3, 2025
Looking at the cpu profile of a burntpix benchmark, I noticed that a lot
of time was spent in gas-used, in the interpreter loop. It's an actual
call (not inlined), which explicitly wants to be ignored by tracing
("tracing.GasChangeIgnored"), so it can be safely and simply inlined.

The other change is in `pushX`. These also do a call to
`common.RightPadBytes`. I replaced that by a doing a corresponding `Lsh`
on the `u256` if needed. Note: it's needed only to make the stack output
look right, for fuzzers. It technically doesn't matter what we put
there: if code ends on a pushdata immediate, nothing will consume the
stack element. We could just as well just ignore it, if we didn't care
about fuzzers (which I do).

Seems quite a lot faster on burntpix, according to my runs. 

This PR:
```
EVM gas used:    5642735088
execution time:  34.84609475s
allocations:     915683
allocated bytes: 175334088
```
```
EVM gas used:    5642735088
execution time:  36.671958278s
allocations:     915701
allocated bytes: 175340528
```

Master
```
EVM gas used:    5642735088
execution time:  49.349209526s
allocations:     915684
allocated bytes: 175333368
```
```
EVM gas used:    5642735088
execution time:  46.581006598s
allocations:     915681
allocated bytes: 175330728
```

---------

Co-authored-by: Sina M <[email protected]>
Co-authored-by: Felix Lange <[email protected]>
gballet pushed a commit to gballet/go-ethereum that referenced this pull request Sep 11, 2025
Looking at the cpu profile of a burntpix benchmark, I noticed that a lot
of time was spent in gas-used, in the interpreter loop. It's an actual
call (not inlined), which explicitly wants to be ignored by tracing
("tracing.GasChangeIgnored"), so it can be safely and simply inlined.

The other change is in `pushX`. These also do a call to
`common.RightPadBytes`. I replaced that by a doing a corresponding `Lsh`
on the `u256` if needed. Note: it's needed only to make the stack output
look right, for fuzzers. It technically doesn't matter what we put
there: if code ends on a pushdata immediate, nothing will consume the
stack element. We could just as well just ignore it, if we didn't care
about fuzzers (which I do).

Seems quite a lot faster on burntpix, according to my runs. 

This PR:
```
EVM gas used:    5642735088
execution time:  34.84609475s
allocations:     915683
allocated bytes: 175334088
```
```
EVM gas used:    5642735088
execution time:  36.671958278s
allocations:     915701
allocated bytes: 175340528
```

Master
```
EVM gas used:    5642735088
execution time:  49.349209526s
allocations:     915684
allocated bytes: 175333368
```
```
EVM gas used:    5642735088
execution time:  46.581006598s
allocations:     915681
allocated bytes: 175330728
```

---------

Co-authored-by: Sina M <[email protected]>
Co-authored-by: Felix Lange <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants