Cache the computation of core toString predicates for cpp c# and java. #2204
Conversation
The branch was force-pushed from ae2dfc9 to ff43058.
Have you checked what the effect is on the cache size? I'm also curious whether we have queries that derive from AST classes and override `toString`.
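For reference, here is a minimal hypothetical sketch (not taken from any existing query) of the kind of override being asked about: a query-local class that refines an AST class and supplies its own `toString`, so its results would be rendered by that definition rather than by whatever the library caches.

```ql
import cpp

// Hypothetical query-local class: it refines an AST class and overrides
// toString, so results of this class bypass the library's toString.
class DescribedCall extends FunctionCall {
  override string toString() { result = "call to " + this.getTarget().getName() }
}

from DescribedCall c
select c, c.toString()
```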
The branch was force-pushed from ff43058 to 924d23f.
The dist-compare report for C# shows no significant change in performance.
I just did a bit of testing with some C++ queries and this PR. On a sequence of typical (real) queries it makes no noticeable difference, which I think is what we'd expect since none of them produced many result rows to `toString`.
Testing for Java on our CI infrastructure suggests that overall analysis time for the standard LGTM query suite increases by a non-negligible amount on some projects, in particular 10-15% for JDK 11. It would be good to understand why. Perhaps @aschackmull could look into it? The outliers in the profiler result were UselessParameter.ql, BoxedVariable.ql and FLinesOfCommentedCode.ql (in that order).
Fixes for those three queries are here: #2341
@aschackmull, thanks for the Java fixes, which have now been merged. This has already been approved for C++ and C#, but there was a question from @jbj in #2204 (comment) without a reply, so I'll leave @jbj to decide whether/when to merge. |
I've started a benchmark on the full suite and will leave it running overnight.
I benchmarked the change on the whole suite, and all I got was a 5% slowdown. But it was on an Azure machine with a networked drive, and it was rebooted between runs, so it could just be wobble. I'll investigate. |
I've investigated the benchmark slowdown, and I think it's benign. Most cached library stages and predicates saw a slowdown of 10% or so even though they didn't involve `toString` at all. That tells me the second run happened on a machine that was 10% slower. The end-to-end slowdown for running the LGTM suite was only 5%, suggesting that this PR would have made the suite faster on the same hardware.
It took 8m9s to compute `ElementBase::toString` for Chromium, which is a long time even for a big snapshot. It had 289,260,959 rows; compare that to `Expr::getType`, which has 79,588,119 rows. It's probably worth investigating whether we can make `toString` faster by generating fewer unique strings or by making it less recursive, but that doesn't have to block this PR.
Some queries, at least `SuspiciousAddWithSizeof.ql`, scan through all strings and even do concatenations to produce the alert message before they join with the `where` clause. That seems to have become much slower after this PR. It could be due to the slow I/O of the profiling machine, and it can be investigated independently of this PR.
There was no re-evaluation of the cached `toString` in the LGTM suite.
This improves the performance of running many queries when the `toString`s of results are computed.
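A minimal sketch of the technique the title refers to, using illustrative names rather than the actual library code: marking the underlying computation `cached` makes the evaluator compute it once per database and reuse the stored relation across all queries in a suite, instead of recomputing it for each query that renders results. In practice the cached predicate would live in a shared `.qll` library so every query picks up the same relation.

```ql
import cpp

// Hypothetical helper: `cached` forces this relation to be evaluated once and
// stored, so later queries reuse it rather than recomputing it.
cached
private string elementDescription(Element e) {
  // The real computation is library-specific; a trivial classification
  // stands in for it here.
  if e instanceof Declaration then result = "declaration" else result = "element"
}

// Hypothetical class wiring the cached computation into toString.
class DescribedElement extends Element {
  override string toString() { result = elementDescription(this) }
}

from DescribedElement e
select e // rendering the result column goes through the cached toString
```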