Skip to content

Conversation

@natemcintosh
Copy link
Contributor

I finally learned how to make a pull request, based on this issue

Here I copied in the code that rdeits originally wrote here

I also updated the tests to use testsets, which provides a slightly nicer user interface when testing.

I also ran the Julia formatter on the files I edited, which made some whitespace changes. Unfortunately github's differ isn't the best at figuring out what whitespace changed, so it might look a little confusing

@codecov
Copy link

codecov bot commented Jul 29, 2022

Codecov Report

Base: 96.85% // Head: 96.97% // Increases project coverage by +0.12% 🎉

Coverage data is based on head (8852542) compared to base (d1b633b).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #122      +/-   ##
==========================================
+ Coverage   96.85%   96.97%   +0.12%     
==========================================
  Files           7        7              
  Lines         700      728      +28     
==========================================
+ Hits          678      706      +28     
  Misses         22       22              
Impacted Files Coverage Δ
src/permutations.jl 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@natemcintosh
Copy link
Contributor Author

Before

@benchmark foreach(identity, Combinatorics.permutations(1:2000, 2))
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 10.268 s (30.58% GC) to evaluate,
 with a memory estimate of 60.47 GiB, over 11994001 allocations.

and after

@benchmark foreach(identity, Combinatorics.permutations(1:2000, 2))
BenchmarkTools.Trial: 39 samples with 1 evaluation.
 Range (min  max):  128.003 ms  132.626 ms  ┊ GC (min  max): 8.80%  9.58%
 Time  (median):     130.014 ms               ┊ GC (median):    9.60%
 Time  (mean ± σ):   130.239 ms ±   1.225 ms  ┊ GC (mean ± σ):  9.38% ± 0.41%

  █              █▃█   ▃  ▃  ▃              █   █                
  █▁▁▁▁▁▇▁▁▁▁▁▁▁▁███▁▇▁█▁▇█▇▇█▁▁▇▁▁▁▇▁▁▇▁▁▁▇█▇▁▁█▇▇▁▇▁▇▁▁▇▁▁▁▁▇ ▁
  128 ms           Histogram: frequency by time          133 ms <

 Memory estimate: 427.03 MiB, allocs estimate: 7996001.

@bkamins
Copy link
Contributor

bkamins commented Aug 1, 2022

CC @mschauer @rdeits

Co-authored-by: Bogumił Kamiński <[email protected]>
@bkamins
Copy link
Contributor

bkamins commented Aug 8, 2022

Does anyone know of a place I can more run 1.0 to check out what's happening?

I normally just download binaries from https://julialang.org/downloads/oldreleases/ and install it manually when needed.

@bkamins
Copy link
Contributor

bkamins commented Aug 25, 2022

OK - looks good. Now someone not involved in development of the code should review it. (thank you in advance)

@natemcintosh
Copy link
Contributor Author

Would it be worth posting on discourse to find someone willing to review? I can definitely do that

@mschauer
Copy link
Member

mschauer commented Oct 3, 2022

Often it is also helpful if you go on record and state if you think yourself that this is ready to be merged and let loose on the world.

@natemcintosh
Copy link
Contributor Author

Hi @mschauer, good idea. I have made a thorough review of it (now over a month since I last looked at it), as well as spent more time playing around with the new version. I believe the new version is reliable, much faster (the reason for this pull request), and ready for public use.

@bkamins
Copy link
Contributor

bkamins commented Oct 3, 2022

@mschauer - can you please review it? I was involved in the implementation so a second set of eyes is required.
(or who else from JuliaMath could be recommended?)

Base.eltype(::Type{Permutations{T}}) where {T} = Vector{eltype(T)}
function has_repeats(state::Vector{Int})
for outer in eachindex(state)
for inner in (outer+1):lastindex(state)
Copy link
Member

@jishnub jishnub Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inner loop can be marked @inbounds safely, I think. Actually, so can the outer I guess

julia> v = [1:10000;];

julia> @btime has_repeats($v);
  31.500 ms (0 allocations: 0 bytes)

julia> function has_repeats(state::Vector{Int})
           @inbounds for outer in eachindex(state)
               for inner in (outer+1):lastindex(state)
                   if state[outer] == state[inner]
                       return true
                   end
               end
           end
           return false
       end
has_repeats (generic function with 1 method)

julia> @btime has_repeats($v);
  21.080 ms (0 allocations: 0 bytes)

Admittedly, this is unnecessary if the compiler can infer this automatically, but apparently it doesn't do a good job on some platforms.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inner can be marked @inbounds safely because of the type restriction in the signature. It might be good to note that in a comment in case that type restriction is ever loosened.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the input! I've added the inbounds, and a comment explaining why it is safe

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Now someone from JuliaMath needs to do a final review, merge, and release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mschauer, would you be willing to do the final review, merge, and release? I have reviewed it myself, and believe it is ready for the main branch.

from palday and jishnub
@ararslan
Copy link
Member

Why change the name of the type from Permutations to PermutationIterator? The new name is inconsistent with how other types in this package are named, e.g. Combinations and MultiSetCombinations. Also, though not exported under the old name, other packages may be defining methods for that type.

@natemcintosh
Copy link
Contributor Author

Why change the name of the type from Permutations to PermutationIterator? The new name is inconsistent with how other types in this package are named, e.g. Combinations and MultiSetCombinations. Also, though not exported under the old name, other packages may be defining methods for that type.

Good point. I'll fix that

Copy link
Member

@ararslan ararslan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work here 🙂


function Base.length(p::Permutations)
length(p.data) < p.length && return 0
return Int(prod(big(length(p.data) - p.length + 1):big(length(p.data))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why convert to BigInt then back to Int? If the product exceeds typemax(Int), the conversion will error anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Would you suggest leaving it as a BigInt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. What is the return type of length(::Combinations)? I'd probably opt to make it consistent with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look like length(::Combinations) returns an Int64. Should the length function be worried about overflow in these cases? It's not too hard create a case that overflows and returns an error with Int64 for combinations and for permutations. But perhaps that's outside the scope of this PR.

I'm ok with having length(::Permutations) not converting to BigInt in the first place if we think that's most consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My recommendation would be to have it be consistent with Combinations for now and open an issue to discuss whether the length function should return BigInt. Then both can be changed at once later.

Copy link
Member

@ararslan ararslan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work and thanks for the contribution!

@ararslan ararslan merged commit 8a7a8ac into JuliaMath:master Feb 1, 2023
@natemcintosh natemcintosh mentioned this pull request Aug 23, 2023
@thchr
Copy link
Contributor

thchr commented Nov 7, 2024

This made permutations type-unstable, returning a Vector{T} or Permutations{T} depending on the value of t in permutations(a::T, t). Was that necessary for the performance gains?

Cf.:

julia> @code_warntype permutations([1,2,3])
MethodInstance for permutations(::Vector{Int64})
  from permutations(a) @ Combinatorics ~/.julia/dev/Combinatorics/src/permutations.jl:73
Arguments
  #self#::Core.Const(Combinatorics.permutations)
  a::Vector{Int64}
Body::Union{Combinatorics.Permutations{Vector{Int64}}, Vector{Vector{Int64}}}
1%1 = Combinatorics.permutations::Core.Const(Combinatorics.permutations)
│   %2 = Combinatorics.length::Core.Const(length)
│   %3 = (%2)(a)::Int64%4 = (%1)(a, %3)::Union{Combinatorics.Permutations{Vector{Int64}}, Vector{Vector{Int64}}}
└──      return %4

@natemcintosh
Copy link
Contributor Author

Hi @thchr, it's been a while since I worked on this, so my memory is a little fuzzy. If I remember correctly, it returns Vector{T} when t was 0 or 1. I think this made it faster, but I don't remember for sure.

It might be worth running some tests to remind us if that was indeed the case.

@thchr
Copy link
Contributor

thchr commented Nov 7, 2024

I think the main problem with that approach - although it may be faster for specific cases of t=0 and t=1 - is that it makes the general case type-unstable.

So, if you test "the whole" thing (calling permutations and then iterating over it), it becomes slower since Julia doesn't know whether to iterate over a Vector{T} or a Permutations{T} ahead of time.

E.g., before this PR:

julia> f(xs) = collect(permutations(xs))
julia> @btime f(1:3);
  367.308 ns (20 allocations: 1.30 KiB)

And, after:

julia> @btime f(1:3);
  432.668 ns (17 allocations: 704 bytes)

This occurs even though iterate(::Permutation, ...) was sped up dramatically by this PR.
I think the best solution is to drop the special-casing of the t=0 and t=1 cases. Even though those cases might be sped up, they slow down all other cases.

EDIT: Actually, the slow down doesn't appear related to type-instability. We can remove the type-instability by just by-passing the permutations constructor for Combinatorics.Permutations, and we get about the same time:

julia> g(xs) = collect(Combinatorics.Permutations(xs, length(xs)))
julia> @btime g(1:3);
  428.191 ns (17 allocations: 704 bytes)

So I guess the argument for removing the special casing for t=0,1 is not quite as compelling (although still applicable, I think, for helping inference have an easier time). You can see this by comparing @code_llvm f(1:3) with @code_llvm g(1:3).
The new version here is of course much faster for larger permutation vectors, regardless.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants