Skip to content

Conversation

lukaslueg
Copy link
Contributor

@lukaslueg lukaslueg commented Aug 30, 2022

This builds upon #9373 to use the approximate size of each variant for large_enum_variant. This allows us to lint in situations where an enum contains generics but is still guaranteed to have a large variant on an at-least basis, e.g. with (T, [u8; 512]).

  • I've changed the wording from "is ... bytes" to "contains at least" because
    • the size is now an approximate lower bound (e.g. 512 in the example above). The actual size is larger due to T, including due to T's memory layout.
    • the discriminant is not taken into account in the message. This comes up with variants like A(T), which are "is at least 0 bytes" otherwise, which may be misleading.
  • If the second-largest variant has no fields, there is a special case "carries no data" instead of "is at least 0 bytes".
  • A variant like A(T) is "at least 0 bytes", which is technically true, yet we don't distinguish between "indeterminate" and truly "ZST".
  • The generics-tests that were there before now lint while they didn't lint before. AFAICS this is correct.

I guess the above is correct-ish. However, I use the SubstsRef that I got via cx.tcx.type_of(item.def_id) to solve for generics in the variants. Is this even applicable, since we start from an - [ ] ItemKind?

changelog: none

@rust-highfive
Copy link

r? @llogiq

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Aug 30, 2022
@llogiq
Copy link
Contributor

llogiq commented Aug 30, 2022

I'm just a bit worried that the lint may become really noisy with this change.

Anyone up for running lintcheck against this branch?

Otherwise this looks good to go.

@lukaslueg
Copy link
Contributor Author

cargo lintcheck comes up empty wrt large_enum_variant on this branch (as on master).

I'm a little worried about noise as well. Consider enum Foo<const U: usize> { foo: [u8; 200], bar: [u8; U] }, which would always lint because bar is considered to be at least 0 bytes, making it threshold-bytes smaller than foo. One can argue that this is not wrong, though, because as far as Foo is concerned there is a suspiciously out-of-whack-large variant.

If this is gtg (please review my question wrt SubstsRef above), I'll need to update the lint description before merging.

@Jarcho
Copy link
Contributor

Jarcho commented Aug 31, 2022

Running this on a few thousand crates. I'll post the results tonight.

@Jarcho
Copy link
Contributor

Jarcho commented Sep 1, 2022

Results from roughly the top 5000 crates: https://gist.github.com/Jarcho/57c0e4927769c044b0a56544d429027a

Don't know how many of these have generic types.

@lukaslueg
Copy link
Contributor Author

Only 42 lints, imho all valid or at least debatable. I'm surprised the number is so low.

This PR clearly helps with

warning: large size difference between variants
   --> src/iterator.rs:324:4
    |
324 |             Descend(Result<(OwnedNode<DBValue>, Option<O>), O, E>),
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this variant contains at least 464 bytes
    |
    = note: requested on the command line with `-W clippy::large-enum-variant`
note: and the second-largest variant carries no data at all
   --> src/iterator.rs:321:4
    |
321 |             YieldNode,
    |             ^^^^^^^^^

Unrelated to this PR:

memcache-0.16.0: 1 warnings

warning: large size difference between variants
  --> src/protocol/mod.rs:14:1
   |
14 | #[enum_dispatch]
   | ^^^^^^^^^^^^^^^^ this variant contains at least 2120 bytes
   |
   = note: requested on the command line with `-W clippy::large-enum-variant`
note: and the second-largest variant contains at least 64 bytes:
  --> src/protocol/mod.rs:14:1
   |
14 | #[enum_dispatch]
   | ^^^^^^^^^^^^^^^^
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#large_enum_variant
   = note: this warning originates in the attribute macro `enum_dispatch` (in Nightly builds, run with -Z macro-backtrace for more info

... which comes from

#[enum_dispatch]
pub enum Protocol {
    Ascii(AsciiProtocol<Stream>),
    Binary(BinaryProtocol),
}

@lukaslueg lukaslueg changed the title [WIP] Use approx_ty_size for large_enum_variant Use approx_ty_size for large_enum_variant Sep 1, 2022
@lukaslueg
Copy link
Contributor Author

Force-pushed a new version, the lint is now

error: large size difference between variants
  --> $DIR/large_enum_variant.rs:120:1
   |
LL | / enum SomeGenericPossiblyCopyEnum<T> {
LL | |     A(bool, std::marker::PhantomData<T>),
   | |     ------------------------------------ the second-largest variant contains at least 1 bytes
LL | |     B([u64; 4000]),
   | |     -------------- the largest variant contains at least 32000 bytes
LL | | }
   | |_^ the entire enum is at least 32008 bytes
   |
note: boxing a variant would require the type no longer be `Copy`
  --> $DIR/large_enum_variant.rs:120:6
   |
LL | enum SomeGenericPossiblyCopyEnum<T> {
   |      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: consider boxing the large fields to reduce the total size of the enum
  --> $DIR/large_enum_variant.rs:122:5
   |
LL |     B([u64; 4000]),
   |     ^^^^^^^^^^^^^^

which is better imho.

LGTM now, happy for comments.

I may get to the #[enum_dispatch]-problem in a separate PR. I may also add some tooling to approx_ty_size to discern between "at least {} bytes" and "exactly {} bytes", so one can special-case "indeterminate" from "at least 0 bytes".

@llogiq
Copy link
Contributor

llogiq commented Sep 1, 2022

Even if that number is low, for a warn by default lint we should have quite high standards. I'm not opposed to merging this, but I don't feel comfortable r+ing this alone without further discussion.

What do you folks think @rust-lang/clippy ?

@llogiq llogiq added the S-needs-discussion Status: Needs further discussion before merging or work can be started label Sep 1, 2022
@Manishearth
Copy link
Member

We already warn on non generic enums, right? I think this ought to be fine?

@Jarcho
Copy link
Contributor

Jarcho commented Sep 1, 2022

The old behaviour is to ignore variants with generics correct? That means the false positives are for:

  • People who don't care (not a new group)
  • The distribution of created variants is biased towards the large variant (not a new problem)
  • The generics actually substituted would even out the sizes (I think this is a new class of false positives)

I would think for anyone who cares about this lint, a comment explaining why a certain case isn't a problem is reasonable. For everyone else the lint would be an annoyance even before this PR.

@llogiq
Copy link
Contributor

llogiq commented Sep 2, 2022

I have suggested a workable explanation of the possible problems to the lint docs. r=me with this applied.

@lukaslueg
Copy link
Contributor Author

@llogiq Included the comment wrt false positives. Rebased to include ad72aee

@llogiq
Copy link
Contributor

llogiq commented Sep 2, 2022

Thank you!

@bors r+

@bors
Copy link
Contributor

bors commented Sep 2, 2022

📌 Commit 584000a has been approved by llogiq

It is now in the queue for this repository.

@bors
Copy link
Contributor

bors commented Sep 2, 2022

⌛ Testing commit 584000a with merge c8ceb58...

bors added a commit that referenced this pull request Sep 2, 2022
Use `approx_ty_size` for `large_enum_variant`

This builds upon #9373 to use the approximate size of each variant for `large_enum_variant`. This allows us to lint in situations where an `enum` contains generics but is still guaranteed to have a large variant on an at-least basis, e.g. with `(T, [u8; 512])`.

* I've changed the wording from "is ... bytes" to "contains at least" because
  * the size is now an approximate lower bound (e.g. `512` in the example above). The actual size is larger due to `T`, including due to `T`'s memory layout.
  * the discriminant is not taken into account in the message. This comes up with variants like `A(T)`, which are "is at least 0 bytes" otherwise, which may be misleading.
* If the second-largest variant has no fields, there is a special case "carries no data" instead of "is at least 0 bytes".
* A variant like `A(T)` is "at least 0 bytes", which is technically true, yet we don't distinguish between "indeterminate" and truly "ZST".
* The generics-tests that were there before now lint while they didn't lint before. AFAICS this is correct.

I guess the above is correct-ish. However, I use the `SubstsRef` that I got via `cx.tcx.type_of(item.def_id)` to solve for generics in the variants. Is this even applicable, since we start from an `ItemKind`?
@bors
Copy link
Contributor

bors commented Sep 2, 2022

💔 Test failed - checks-action_test

@llogiq
Copy link
Contributor

llogiq commented Sep 3, 2022

@bors retry

@bors
Copy link
Contributor

bors commented Sep 3, 2022

⌛ Testing commit 584000a with merge 99ab5fe...

@bors
Copy link
Contributor

bors commented Sep 3, 2022

☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test
Approved by: llogiq
Pushing 99ab5fe to master...

@bors bors merged commit 99ab5fe into rust-lang:master Sep 3, 2022
@lukaslueg lukaslueg deleted the approx_large_enum branch September 3, 2022 09:06
@lukaslueg
Copy link
Contributor Author

@llogiq Thanks for the review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-needs-discussion Status: Needs further discussion before merging or work can be started S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants