Skip to content

Conversation

Dylan-DPC-zz
Copy link

Successful merges:

Failed merges:

r? @ghost
@rustbot modify labels: rollup

Create a similar rollup

mark-i-m and others added 9 commits April 27, 2021 21:20
On arm64 we have seen on several databases that ISB (instruction synchronization
barrier) is better to use than yield in a spin loop.  The yield instruction is a
nop.  The isb instruction puts the processor to sleep for some short time.  isb
is a good equivalent to the pause instruction on x86.

Below is an experiment that shows the effects of yield and isb on Arm64 and the
time of a pause instruction on x86 Intel processors.  The micro-benchmarks use
https://github.com/google/benchmark.git

$ cat a.cc
static void BM_scalar_increment(benchmark::State& state) {
  int i = 0;
  for (auto _ : state)
    benchmark::DoNotOptimize(i++);
}
BENCHMARK(BM_scalar_increment);
static void BM_yield(benchmark::State& state) {
  for (auto _ : state)
    asm volatile("yield"::);
}
BENCHMARK(BM_yield);
static void BM_isb(benchmark::State& state) {
  for (auto _ : state)
    asm volatile("isb"::);
}
BENCHMARK(BM_isb);
BENCHMARK_MAIN();

$ g++ -o run a.cc -O2 -lbenchmark -lpthread
$ ./run

--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------

AWS Graviton2 (Neoverse-N1) processor:
BM_scalar_increment      0.485 ns        0.485 ns   1000000000
BM_yield                 0.400 ns        0.400 ns   1000000000
BM_isb                    13.2 ns         13.2 ns     52993304

AWS Graviton (A-72) processor:
BM_scalar_increment      0.897 ns        0.874 ns    801558633
BM_yield                 0.877 ns        0.875 ns    800002377
BM_isb                    13.0 ns         12.7 ns     55169412

Apple Arm64 M1 processor:
BM_scalar_increment      0.315 ns        0.315 ns   1000000000
BM_yield                 0.313 ns        0.313 ns   1000000000
BM_isb                    9.06 ns         9.06 ns     77259282

static void BM_pause(benchmark::State& state) {
  for (auto _ : state)
    asm volatile("pause"::);
}
BENCHMARK(BM_pause);

Intel Skylake processor:
BM_scalar_increment      0.295 ns        0.295 ns   1000000000
BM_pause                  41.7 ns         41.7 ns     16780553

Tested on Graviton2 aarch64-linux with `./x.py test`.
Reuse `sys::unix::cmath` on other platforms

Reuse `sys::unix::cmath` on all non-`windows` platforms.

`unix` is chosen as the canonical location instead of `unsupported` or `common` because `unsupported` doesn't make sense semantically and `common` is reserved for code that is supported on all platforms. Also `unix` is already the home of some non-`windows` code that is technically not exclusive to `unix` like `unix::path`.
Be stricter about rejecting LLVM reserved registers in asm!

LLVM will silently produce incorrect code if these registers are used as operands.

cc ````@rust-lang/wg-inline-asm````
[Arm64] use isb instruction instead of yield in spin loops

On arm64 we have seen on several databases that ISB (instruction synchronization
barrier) is better to use than yield in a spin loop.  The yield instruction is a
nop.  The isb instruction puts the processor to sleep for some short time.  isb
is a good equivalent to the pause instruction on x86.

Below is an experiment that shows the effects of yield and isb on Arm64 and the
time of a pause instruction on x86 Intel processors.  The micro-benchmarks use
https://github.com/google/benchmark.git

```
$ cat a.cc
static void BM_scalar_increment(benchmark::State& state) {
  int i = 0;
  for (auto _ : state)
    benchmark::DoNotOptimize(i++);
}
BENCHMARK(BM_scalar_increment);
static void BM_yield(benchmark::State& state) {
  for (auto _ : state)
    asm volatile("yield"::);
}
BENCHMARK(BM_yield);
static void BM_isb(benchmark::State& state) {
  for (auto _ : state)
    asm volatile("isb"::);
}
BENCHMARK(BM_isb);
BENCHMARK_MAIN();

$ g++ -o run a.cc -O2 -lbenchmark -lpthread
$ ./run

--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------

AWS Graviton2 (Neoverse-N1) processor:
BM_scalar_increment      0.485 ns        0.485 ns   1000000000
BM_yield                 0.400 ns        0.400 ns   1000000000
BM_isb                    13.2 ns         13.2 ns     52993304

AWS Graviton (A-72) processor:
BM_scalar_increment      0.897 ns        0.874 ns    801558633
BM_yield                 0.877 ns        0.875 ns    800002377
BM_isb                    13.0 ns         12.7 ns     55169412

Apple Arm64 M1 processor:
BM_scalar_increment      0.315 ns        0.315 ns   1000000000
BM_yield                 0.313 ns        0.313 ns   1000000000
BM_isb                    9.06 ns         9.06 ns     77259282
```

```
static void BM_pause(benchmark::State& state) {
  for (auto _ : state)
    asm volatile("pause"::);
}
BENCHMARK(BM_pause);

Intel Skylake processor:
BM_scalar_increment      0.295 ns        0.295 ns   1000000000
BM_pause                  41.7 ns         41.7 ns     16780553
```

Tested on Graviton2 aarch64-linux with `./x.py test`.
@rustbot rustbot added the rollup A PR which is a rollup label Apr 30, 2021
@Dylan-DPC-zz
Copy link
Author

@bors r+ rollup=never p=5

@bors
Copy link
Collaborator

bors commented Apr 30, 2021

📌 Commit 7a1462f has been approved by Dylan-DPC

@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Apr 30, 2021
@bors
Copy link
Collaborator

bors commented Apr 30, 2021

⌛ Testing commit 7a1462f with merge 4a07567b863f1b1bf0d2d7341f366ce6778ba8e4...

@rust-log-analyzer
Copy link
Collaborator

The job dist-various-2 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
[RUSTC-TIMING] gimli test:false 5.402
[RUSTC-TIMING] object test:false 10.260
warning: dropping unsupported crate type `dylib` for target `x86_64-fortanix-unknown-sgx`

error: invalid register `rbx`: rbx is used internally by LLVM and cannot be used as an operand for inline asm
  --> library/std/src/sys/sgx/ext/arch.rs:37:13
   |
37 |             in("rbx") request,


error: invalid register `rbx`: rbx is used internally by LLVM and cannot be used as an operand for inline asm
  --> library/std/src/sys/sgx/ext/arch.rs:65:13
   |
65 |             in("rbx") targetinfo,

error: aborting due to 2 previous errors; 1 warning emitted

[RUSTC-TIMING] std test:false 1.859
[RUSTC-TIMING] std test:false 1.859
error: could not compile `std`

To learn more, run the command again with --verbose.
command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-fortanix-unknown-sgx" "-Zbinary-dep-depinfo" "-j" "16" "--release" "--locked" "--color" "always" "--features" "panic-unwind backtrace compiler-builtins-c" "--manifest-path" "/checkout/library/test/Cargo.toml" "--message-format" "json-render-diagnostics"
failed to run: /checkout/obj/build/bootstrap/debug/bootstrap dist --host= --target x86_64-fuchsia,aarch64-fuchsia,wasm32-unknown-unknown,wasm32-wasi,sparcv9-sun-solaris,x86_64-pc-solaris,x86_64-unknown-linux-gnux32,x86_64-fortanix-unknown-sgx,nvptx64-nvidia-cuda,armv7-unknown-linux-gnueabi,armv7-unknown-linux-musleabi,i686-unknown-freebsd
Build completed unsuccessfully in 0:24:28

@bors
Copy link
Collaborator

bors commented Apr 30, 2021

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Apr 30, 2021
@jackh726 jackh726 closed this Apr 30, 2021
@Dylan-DPC-zz Dylan-DPC-zz deleted the rollup-6182psa branch April 30, 2021 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rollup A PR which is a rollup S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants