Commit 9cb1ea6
This simplifies the `copyto_unaliased!` implementation where the source
and destination have different `IndexStyle`s, and limits the `@inbounds`
to only the indexing operation. In particular, the iteration over
`eachindex(dest)` is not marked as `@inbounds` anymore. This seems to
help with performance when the destination uses Cartesian indexing.
Reduced implementation of the branch:
```julia
# Proposed: iterate over both index sets in lockstep; `@inbounds` covers only the indexing.
function copyto_proposed!(dest, src)
    axes(dest) == axes(src) || throw(ArgumentError("incompatible sizes"))
    iterdest, itersrc = eachindex(dest), eachindex(src)
    for (destind, srcind) in zip(iterdest, itersrc)
        @inbounds dest[destind] = src[srcind]
    end
    dest
end

# Current: manual iteration over `iterdest`, with `@inbounds` on the entire loop.
function copyto_current!(dest, src)
    axes(dest) == axes(src) || throw(ArgumentError("incompatible sizes"))
    iterdest, itersrc = eachindex(dest), eachindex(src)
    ret = iterate(iterdest)
    @inbounds for a in src
        idx, state = ret::NTuple{2,Any}
        dest[idx] = a
        ret = iterate(iterdest, state)
    end
    dest
end

# Current structure, but with `@inbounds` limited to the indexing operation.
function copyto_current_limitinbounds!(dest, src)
    axes(dest) == axes(src) || throw(ArgumentError("incompatible sizes"))
    iterdest, itersrc = eachindex(dest), eachindex(src)
    ret = iterate(iterdest)
    for isrc in itersrc
        idx, state = ret::NTuple{2,Any}
        @inbounds dest[idx] = src[isrc]
        ret = iterate(iterdest, state)
    end
    dest
end
```
```julia
julia> using BenchmarkTools

julia> a = zeros(40000,4000); b = rand(size(a)...);

julia> av = view(a, UnitRange.(axes(a))...);

julia> @btime copyto_current!($av, $b);
  617.704 ms (0 allocations: 0 bytes)

julia> @btime copyto_current_limitinbounds!($av, $b);
  304.146 ms (0 allocations: 0 bytes)

julia> @btime copyto_proposed!($av, $b);
  240.217 ms (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 1.12.0-DEV.1260
Commit 4a4ca9c (2024-09-28 01:49 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_EDITOR = subl
```
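The benchmark builds `av` from `UnitRange` indices rather than colons, which, as far as I understand, is what makes the view use Cartesian indexing while the plain `Array` stays linear. A minimal sketch on a small array to check that assumption (the sizes and the `style_*` variable names are illustrative only):

```julia
a = zeros(4, 3)
b = rand(size(a)...)
av = view(a, UnitRange.(axes(a))...)  # same construction as the benchmark, smaller size

style_src = IndexStyle(b)    # index style of the plain Array
style_dest = IndexStyle(av)  # index style of the view

# The two arrays hand out different kinds of index iterators,
# which is the mixed-IndexStyle branch this commit touches.
src_indices = eachindex(b)
dest_indices = eachindex(av)
```

Here `style_src` should be `IndexLinear()` and `style_dest` should be `IndexCartesian()`, so `eachindex` yields integers for `b` and `CartesianIndex` values for `av`.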
I'm not quite certain why the proposed implementation here
(`copyto_proposed!`) is even faster than
`copyto_current_limitinbounds!`. In any case, `copyto_proposed!` is
easier to read, so I'm not complaining.
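As a sanity check on the `zip`-based loop, here is a small self-contained sketch. `copy_via_zip!` is a hypothetical name that mirrors `copyto_proposed!` above; it is not the code in Base. It relies on `eachindex` visiting both arrays in the same column-major order, so each pair produced by `zip` refers to the same element position.

```julia
# Hypothetical stand-in for `copyto_proposed!`, repeated here so the example is self-contained.
function copy_via_zip!(dest, src)
    axes(dest) == axes(src) || throw(ArgumentError("incompatible sizes"))
    for (destind, srcind) in zip(eachindex(dest), eachindex(src))
        # Only the indexing is marked `@inbounds`; the iteration itself is not.
        @inbounds dest[destind] = src[srcind]
    end
    return dest
end

src = reshape(collect(1.0:12.0), 4, 3)           # linearly indexed source
dest = zeros(4, 3)
destview = view(dest, UnitRange.(axes(dest))...)  # Cartesian-indexed destination

copy_via_zip!(destview, src)
matches = dest == src  # the parent array now holds the copied values
```

The axes check up front is what justifies the `@inbounds` on the indexing: once the axes agree, every index from either iterator is valid for its own array.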
This fixes #53158
(cherry picked from commit 06e7b9d)
1 parent 23b7de6 commit 9cb1ea6
1 file changed, with 2 additions and 5 deletions: original lines 1099-1103 are replaced by new lines 1099-1100.