C++: Add reverse reads to IR field flow #3419

MathiasVP · 2020-05-05T21:48:19Z

This PR adds support for reverse-read field flow in the IR instantiation of the shared dataflow library. It does this by adding two new dataflow nodes, which add a "source code"-like structure to field lookups.
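As a rough illustration of the kind of code this is about, here is a sketch of a write through a nested field access followed by a read of the same path. The struct names and the `source`/`sink` helpers are hypothetical stand-ins and are not taken from the PR's test files.

```cpp
#include <cassert>

// Hypothetical nested structs; the names are illustrative only.
struct C { int x; };
struct B { C c; };
struct A { B b; };

// Stand-ins for the source and sink of a dataflow test.
int source() { return 42; }
int last_sunk = 0;
void sink(int v) { last_sunk = v; }

// The store to a.b.c.x goes through a chain of field addresses,
// and the later read walks the same chain of fields.
void nested_field_flow() {
  A a{};
  a.b.c.x = source();
  sink(a.b.c.x);
}

// Usage: at run time the value written through the nested path reaches the sink.
void check_nested_field_flow() {
  nested_field_flow();
  assert(last_sunk == 42);
}
```

A field-sensitive analysis would be expected to report flow from `source()` to `sink(...)` here only if it can match the stored field path against the path that is read.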

Previous iterations of this PR haven't had any performance problems, but I've started a new CPP-differences anyway to be safe: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1159/

rdmarsh2 · 2020-06-01T21:07:46Z

I'm finding the parent/child and beginning/end terminology in the store chain to be unintuitive, and I can't find a written explanation of what exactly a reverse read is anywhere. If I'm understanding the store side correctly, for each instruction which stores to the result of a FieldAddressInstruction, there's a single store step, which goes from the StoreInstruction or SideEffectInstruction that writes to the innermost field to the StoreNode for that field. Then there's a set of read steps going from the StoreNode for each field to the StoreNode for the next field in, as well as one from the total operand of the ChiInstruction to the StoreNode for the outermost field, if a ChiInstruction is present. There's also a flow step from the StoreNode of the outermost field to the ChiInstruction if present, or the StoreInstruction if not. The net effect of all this is that when the data flow library looks for a matching access path for a read step that the ChiInstruction or StoreInstruction flows to, each StoreNode will add its field to the access path that resulted from the instruction which did the store (or remove it from the access path that caused the search?). Is that accurate?

MathiasVP · 2020-06-02T09:15:46Z

> I'm finding the parent/child and beginning/end terminology in the store chain to
> be unintuitive, and I can't find a written explanation of what exactly a reverse
> read is anywhere.

Good point. I'll add some more comments about what a reverse read is. I guess it
makes sense to add it to the QLDoc for the StoreNode class.

> If I'm understanding the store side correctly, for each
> instruction which stores to the result of a FieldAddressInstruction, there's a
> single store step, which goes from the StoreInstruction or SideEffectInstruction
> that writes to the innermost field to the StoreNode for that field.

Correct. For a.b.c = x there's a storeStep from the StoreInstruction to
the StoreNode with the store chain c.

> Then there's a set of read steps going from the StoreNode for each field to the
> StoreNode for the next field in, as well as one from the total operand of the
> ChiInstruction to the StoreNode for the outermost field, if a ChiInstruction is
> present.

Correct. For a.b.c = x there's a read step from the total operand of a
ChiInstruction to b, and from b to c.

> There's also a flow step from the StoreNode of the outermost field to the
> ChiInstruction if present, or the StoreInstruction if not.

Correct. In the a.b.c = x example the StoreNode b has dataflow to the
ChiInstruction if present, and to the StoreInstruction if not.

> The net effect of all this is that when the data flow library looks for a
> matching access path for a read step that the ChiInstruction or StoreInstruction
> flows to, each StoreNode will add its field to the access path that resulted
> from the instruction which did the store (or remove it from the access path that
> caused the search?). Is that accurate?

Yes, that is accurate. In a.b.c = x the pre-update node for the StoreNode for
c is b, and the pre-update node for the StoreNode b is the total chi
operand, and since there's a read step from the total chi operand to b, the
field b is added to the access path.
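The access-path bookkeeping discussed above can be pictured with a toy model. This is only a sketch under simplifying assumptions: the `AccessPath` type and the `storeField`/`readField` helpers are hypothetical names invented for illustration, not the QL predicates from this PR. It covers only the general idea that a store adds a field to the access path and a later read must match and remove it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy access path: the fields that still have to be read, outermost first,
// before the tracked value is recovered. (Hypothetical model, not the QL code.)
using AccessPath = std::vector<std::string>;

// A store wraps the tracked value in one more (outer) field.
AccessPath storeField(AccessPath path, const std::string& field) {
  path.insert(path.begin(), field);
  return path;
}

// A read unwraps the outermost field, but only if the field matches.
// Returns false and leaves the path untouched on a mismatch.
bool readField(AccessPath& path, const std::string& field) {
  if (path.empty() || path.front() != field) return false;
  path.erase(path.begin());
  return true;
}

// Usage: model a.b.c = x followed by a read of a.b.c.
void accessPathDemo() {
  // The innermost field c is stored first, then the outer field b is added.
  AccessPath path = storeField(storeField(AccessPath{}, "c"), "b");
  assert(path.size() == 2 && path[0] == "b" && path[1] == "c");

  // Reading c directly does not match the outermost field b.
  AccessPath mismatch = path;
  assert(!readField(mismatch, "c"));

  // Reading b and then c consumes the whole path.
  assert(readField(path, "b"));
  assert(readField(path, "c"));
  assert(path.empty());
}
```

In this model, flow is only reported when the path is empty at the point of use, which is why each field in the store chain has to contribute its own entry.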

…edicates. I also simplified the code a bit by moving common implementations of predicates into shared super classes. Finally, I added a getLocation predicate to StoreNode to match the structure of the LoadNode class.

rdmarsh2

LGTM

rdmarsh2 · 2020-06-02T20:15:06Z

Actually, I just went back and looked at the performance numbers in the difference job - the 5% slowdown on Linux looks pretty bad, but I think it's partly noise (42% of the slowdown was from TaintedPath.ql, but most of the rest is from unrelated queries). I've triggered a rebuild: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1169/

MathiasVP · 2020-06-03T06:46:17Z

> Actually, I just went back and looked at the performance numbers in the difference job - the 5% slowdown on Linux looks pretty bad, but I think it's partly noise (42% of the slowdown was from TaintedPath.ql, but most of the rest is from unrelated queries). I've triggered a rebuild: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1169/

Thanks for rebuilding! It looks like the Linux slowdown was mostly noise: the slowdown is down to 1.7% now. The slowdown is still mostly in cpp/path-injection, though. I'll investigate this.

Edit: I've pushed a tiny change that should improve performance (but probably not much). For some reason I hadn't restricted the side effect column of StoreChainEndInstructionSideEffect to be write side effects only. I've started a new CPP-differences to check whether this has the intended effect: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1171/

MathiasVP · 2020-06-03T22:31:22Z

> Edit: I've pushed a tiny change that should improve performance (but probably not much). For some reason I hadn't restricted the side effect column of StoreChainEndInstructionSideEffect to be write side effects only. I've started a new CPP-differences to check whether this has the intended effect: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1171/

Looks like the performance improvement didn't have any negative consequences (although it didn't have any noticeable positive effects either).

MathiasVP · 2020-06-08T09:33:54Z

Since this PR is still open I decided to fix a low-hanging fruit that was waiting for #3123 to be merged.

There was unfortunately a semantic merge conflict between github#3419 and github#3587 that caused a performance regression on (at least) OpenJDK. This reverts commit 982fb38, reversing changes made to b841cac.

C++: Revert #3419 to fix OpenJDK performance

MathiasVP added C++ WIP This is a work-in-progress, do not merge yet! labels May 5, 2020

MathiasVP force-pushed the flat-structs branch 2 times, most recently from edd1f87 to 609b939 Compare May 7, 2020 09:09

C++: Add testcases for partial definitions with long access paths

335baae

MathiasVP force-pushed the flat-structs branch from 7a6a0bd to bcdec65 Compare May 29, 2020 11:52

C++: Add LoadChain and StoreChain nodes to handle reverse reads in dataflow

a060369

MathiasVP force-pushed the flat-structs branch from bcdec65 to e488095 Compare May 29, 2020 11:54

MathiasVP removed the WIP This is a work-in-progress, do not merge yet! label May 29, 2020

MathiasVP marked this pull request as ready for review May 29, 2020 11:57

MathiasVP requested a review from a team as a code owner May 29, 2020 11:57

MathiasVP force-pushed the flat-structs branch from e488095 to 601f957 Compare May 29, 2020 13:22

C++: Accept tests

3adc10f

MathiasVP force-pushed the flat-structs branch from 601f957 to 3adc10f Compare May 29, 2020 13:34

rdmarsh2 previously approved these changes Jun 2, 2020

C++: Restrict the side effect of StoreChainEndInstructionSideEffect to be WriteSideEffectInstructions

b890b16

MathiasVP dismissed rdmarsh2’s stale review via b890b16 June 3, 2020 07:28

MathiasVP added 2 commits June 3, 2020 15:11

Merge branch 'master' into flat-structs

43a0d4c

C++: Accept tests after merge from master

d295e21

MathiasVP added 5 commits June 4, 2020 10:52

Merge branch 'master' into flat-structs

2cf9bce

C++: Fix testcases after merge from master

4b16067

C++: Add example demonstrating missing flow

a4388e9

C++: Add ReadSideEffect as a possible end instruction for load chains

01f3793

C++: Fix inconsistent class name

431cc5c

C++: Accept tests

b48168f

jbj assigned rdmarsh2 Jun 8, 2020

rdmarsh2 approved these changes Jun 10, 2020

rdmarsh2 merged commit 982fb38 into github:master Jun 10, 2020

MathiasVP mentioned this pull request Jun 12, 2020

C++: Field flow through conflated ChiInstructions #3670

Closed

jbj mentioned this pull request Jun 22, 2020

C++: Revert #3419 to fix OpenJDK performance #3754

Merged

MathiasVP added a commit that referenced this pull request Jun 23, 2020

Merge pull request #3754 from jbj/revert-flat-structs

55ce5ce

C++: Revert #3419 to fix OpenJDK performance

MathiasVP mentioned this pull request Sep 25, 2020

C++: Add dataflow through deep struct indirections #4350

Closed
