Cache the computation of core toString predicates for cpp c# and java. #2204
Conversation
The branch was force-pushed from ae2dfc9 to ff43058.
Have you checked what the effect is on the cache size? I'm also curious whether we have queries that derive from AST classes and override `toString`.
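For reference, here is a minimal hypothetical sketch (not taken from any existing query) of the kind of override being asked about: a query-local class that refines an AST class and supplies its own `toString`, so its results would be rendered by that definition rather than by whatever the library caches.

```ql
import cpp

// Hypothetical query-local class: it refines an AST class and overrides
// toString, so results of this class bypass the library's toString.
class DescribedCall extends FunctionCall {
  override string toString() { result = "call to " + this.getTarget().getName() }
}

from DescribedCall c
select c, c.toString()
```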
The branch was force-pushed from ff43058 to 924d23f.
The dist-compare report for C# shows no significant change in performance.
I just did a bit of testing with some C++ queries and this PR. On a sequence of typical (real) queries it makes no noticeable difference, which I think is what we'd expect since none of them produced many result rows to `toString`.
Testing for Java on our CI infrastructure suggests that overall analysis time for the standard LGTM query suite increases by a non-negligible amount on some projects, in particular 10-15% for JDK 11. It would be good to understand why. Perhaps @aschackmull could look into it? The outliers in the profiler result were UselessParameter.ql, BoxedVariable.ql and FLinesOfCommentedCode.ql (in that order).
Fixes for those three queries are here: #2341
@aschackmull, thanks for the Java fixes, which have now been merged. This has already been approved for C++ and C#, but there was a question from @jbj in #2204 (comment) without a reply, so I'll leave @jbj to decide whether/when to merge. |
I've started a benchmark on the full suite and will leave it running overnight.
I benchmarked the change on the whole suite, and all I got was a 5% slowdown. But it was on an Azure machine with a networked drive, and it was rebooted between runs, so it could just be wobble. I'll investigate. |
I've investigated the benchmark slowdown, and I think it's benign. Most cached library stages and predicates saw a slowdown of 10% or so even though they didn't involve `toString` at all. That tells me the second run happened on a machine that was 10% slower. The end-to-end slowdown for running the LGTM suite was only 5%, suggesting that this PR would have made the suite faster on the same hardware.
It took 8m9s to compute `ElementBase::toString` for Chromium, which is a long time even for a big snapshot. It had 289,260,959 rows; compare that to `Expr::getType`, which has 79,588,119 rows. It's probably worth investigating whether we can make `toString` faster by generating fewer unique strings or by making it less recursive, but that doesn't have to block this PR.
Some queries, at least `SuspiciousAddWithSizeof.ql`, scan through all strings and even do concatenations to produce the alert message before they join with the `where` clause. That seems to have become much slower after this PR. It could be due to the slow I/O of the profiling machine, and it can be investigated independently of this PR.
There was no re-evaluation of the cached `toString` in the LGTM suite.
This improves the performance of running many queries when the `toString`s of results are computed.
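A minimal sketch of the technique the title refers to, using illustrative names rather than the actual library code: marking the underlying computation `cached` makes the evaluator compute it once per database and reuse the stored relation across all queries in a suite, instead of recomputing it for each query that renders results. In practice the cached predicate would live in a shared `.qll` library so every query picks up the same relation.

```ql
import cpp

// Hypothetical helper: `cached` forces this relation to be evaluated once and
// stored, so later queries reuse it rather than recomputing it.
cached
private string elementDescription(Element e) {
  // The real computation is library-specific; a trivial classification
  // stands in for it here.
  if e instanceof Declaration then result = "declaration" else result = "element"
}

// Hypothetical class wiring the cached computation into toString.
class DescribedElement extends Element {
  override string toString() { result = elementDescription(this) }
}

from DescribedElement e
select e // rendering the result column goes through the cached toString
```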