Skip to content

Conversation

alexet
Copy link
Contributor

@alexet alexet commented Oct 25, 2019

This improves the performance of running many queries when toStrings of results are computed.

@jbj
Copy link
Contributor

jbj commented Oct 25, 2019

Have you checked what the effect is on the cache size? I'm also curious whether we have queries that derive from AST classes and override toString, thereby invalidating the cache. I've certainly seen that in the past.

aschackmull
aschackmull previously approved these changes Oct 25, 2019
@hvitved
Copy link
Contributor

hvitved commented Oct 30, 2019

The dist-compare report for C# shows no significant change in performance.

@geoffw0
Copy link
Contributor

geoffw0 commented Nov 7, 2019

I just did a bit of testing with some C++ queries and this PR. On a sequence of typical (real) queries it makes no noticeable difference - which I think is what we'd expect since none of them produced many result rows to toString. However on a short sequence of (artificial) queries with lots of results this made a big improvement to performance after the first query. I think this is what we're expecting.

👍

@yh-semmle
Copy link
Contributor

This improves the performance of running many queries when toStrings of results are computed.

Testing for Java on our CI infrastructure suggests that overall analysis time for the standard LGTM query suite increases by a non-negligible amount on some projects, in particular 10-15% for JDK 11. It would be good to understand why. Perhaps @aschackmull could look into it? The outliers in the profiler result were UselessParameter.ql, BoxedVariable.ql and FLinesOfCommentedCode.ql (in that order).

@aschackmull
Copy link
Contributor

Testing for Java on our CI infrastructure suggests that overall analysis time for the standard LGTM query suite increases by a non-negligible amount on some projects, in particular 10-15% for JDK 11. It would be good to understand why. Perhaps @aschackmull could look into it? The outliers in the profiler result were UselessParameter.ql, BoxedVariable.ql and FLinesOfCommentedCode.ql (in that order).

Fixes for those three queries are here: #2341

@yo-h
Copy link
Contributor

yo-h commented Dec 17, 2019

@aschackmull, thanks for the Java fixes, which have now been merged.

This has already been approved for C++ and C#, but there was a question from @jbj in #2204 (comment) without a reply, so I'll leave @jbj to decide whether/when to merge.

@jbj
Copy link
Contributor

jbj commented Dec 18, 2019

I've started a benchmark on the full suite and will leave it running over night.

@jbj
Copy link
Contributor

jbj commented Dec 19, 2019

I benchmarked the change on the whole suite, and all I got was a 5% slowdown. But it was on an Azure machine with a networked drive, and it was rebooted between runs, so it could just be wobble. I'll investigate.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've investigated the benchmark slowdown, and I think it's benign. Most cached library stages and predicates saw a slowdown of 10% or so even though they didn't involve toString at all. That tells me the second run happened on a machine that was 10% slower. The end-to-end slowdown for running the LGTM suite was only 5%, suggesting that this PR would have made the suite faster on the same hardware.

It took 8m9s to compute ElementBase::toString for Chromium, which is a long time even for a big snapshot. It had 289,260,959 rows -- compare that to Expr::getType, which has 79,588,119 rows. It's probably worth investigating whether we can make toString faster by generating fewer unique strings or by making it less recursive, but that doesn't have to block this PR.

Some queries, at least SuspiciousAddWithSizeof.ql, scan through all strings and even do concatenations to produce the alert message, before they join with the where clause. That seems to have become much slower after this PR. It could be due to the slow I/O of the profiling machine, and it can be investigated independently of this PR.

There was no re-evaluation of the cached toString in the LGTM suite.

@jbj jbj merged commit de55a68 into github:master Dec 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants