Skip to content

Conversation

jbj
Copy link
Contributor

@jbj jbj commented May 1, 2020

From the main commit:

Flow from a definition by reference of a field into its object was working inconsistently and in a very syntax-dependent way. For a function f receiving a reference, f(a->x) could propagate data back to a via the reverse read mechanism in the shared data-flow library, but for a function g receiving a pointer, g(&a->x) would not work. And f((*a).x) would not work either.

In all cases, the issue was that the shared data-flow library propagates data backwards between PostUpdateNodes only, but there is no PostUpdateNode for a->x in g(&a->x). This pull request inserts such post-update nodes where appropriate and links them to their neighbors. In this exapmle, flow back from the output parameter of g passes first to the PostUpdateNode of &, then to the (new) PostUpdateNode of a->x, and finally, as a reverse read with the appropriate field projection, to a.

Cc @lcartey: this addresses https://github.slack.com/archives/CP0LHP150/p1587393520434500.

Cc @aschackmull: the first commit implements what you suggested in #3162 (comment). Also, the issue addressed by this PR is also present in Java data flow. If you change f(a.x) to f((X)a.x) you no longer get reverse-read flow into a because the argument (a cast) is not equal to the field read.

jbj added 3 commits May 1, 2020 11:04
Flow from a definition by reference of a field into its object was
working inconsistently and in a very syntax-dependent way. For a
function `f` receiving a reference, `f(a->x)` could propagate data back
to `a` via the _reverse read_ mechanism in the shared data-flow library,
but for a function `g` receiving a pointer, `g(&a->x)` would not work.
And `f((*a).x)` would not work either.

In all cases, the issue was that the shared data-flow library propagates
data backwards between `PostUpdateNode`s only, but there is no
`PostUpdateNode` for `a->x` in `g(&a->x)`. This pull request inserts
such post-update nodes where appropriate and links them to their
neighbors. In this exapmle, flow back from the output parameter of `g`
passes first to the `PostUpdateNode` of `&`, then to the (new)
`PostUpdateNode` of `a->x`, and finally, as a _reverse read_ with the
appropriate field projection, to `a`.
@jbj jbj added the C++ label May 1, 2020
* - AddressConstantExpression.qll
* - AddressFlow.qll
* - EscapesTree.qll
*/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's possible for these three files to share code, but I don't think it would make this PR any easier to review.

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I'll do a deeper investigation over the weekend. I only stumbled over one thing while quickly glancing at the changes.

@jbj
Copy link
Contributor Author

jbj commented May 1, 2020

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Any idea why there's a crazy speedup in NewArrayDeleteMismatch.ql on Wireshark?

private predicate lvalueToPointerStep(Expr lvalueIn, Expr pointerOut) {
lvalueIn.getConversion() = pointerOut.(ArrayToPointerConversion)
or
lvalueIn = pointerOut.(AddressOfExpr).getOperand().getFullyConverted()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add std::addressof to this list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. It turned out to be a bit more involved than adding it to this list.

@jbj
Copy link
Contributor Author

jbj commented May 4, 2020

LGTM! Any idea why there's a crazy speedup in NewArrayDeleteMismatch.ql on Wireshark?

Apparently it's because both there's a huge slowdown both before and after this PR! A performance regression might have crept into master as a result of one of my recent "harmless" data-flow refactorings. I'll investigate.

@jbj
Copy link
Contributor Author

jbj commented May 4, 2020

I've addressed the performance problem in #3400.

jbj added 4 commits May 7, 2020 16:30
I didn't add this support in `AddressConstantExpression.qll` since I
think it would require extra work and testing to get the constexprness
right. My long-term plan for `AddressConstantExpression.qll` is to move
its functionality to the extractor.
…-defbyref-to-field

This is a partial merge from master. In particular, it takes in github#3382
and github#3385.
@jbj jbj requested a review from a team as a code owner May 7, 2020 14:53
Adding a new test case leads to changes in all `.expected` files in its
directory.

The new results show that the `DefinitionsAndUses` library does not
model `std::addressof` correctly, but that library is not intended to be
used for new code.
@jbj
Copy link
Contributor Author

jbj commented May 11, 2020

@jbj
Copy link
Contributor Author

jbj commented May 11, 2020

The CPP-Differences job has finished. The performance changes seem to be within the usual noise level.

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@MathiasVP MathiasVP merged commit df6abdc into github:master May 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants