Skip to content

Conversation

jbj
Copy link
Contributor

@jbj jbj commented Sep 12, 2019

This PR is my attempt to address #1888 (comment).

@jbj jbj added the C++ label Sep 12, 2019
@jbj jbj requested a review from rdmarsh2 September 12, 2019 09:04
@jbj jbj requested a review from a team as a code owner September 12, 2019 09:04
@aschackmull
Copy link
Contributor

I'd appreciate if we could get #1881 merged first (I've just updated it), as it involves a submodule bump and is likely to conflict.

| taint.cpp:8:8:8:13 | clean1 | taint.cpp:4:27:4:33 | source1 |
| taint.cpp:16:8:16:14 | source1 | taint.cpp:12:22:12:27 | ... = ... |
| taint.cpp:16:8:16:14 | source1 | taint.cpp:12:22:12:27 | call to source |
| taint.cpp:17:8:17:16 | ++ ... | taint.cpp:12:22:12:27 | ... = ... |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two new lines with the source being ... = ... are new, and I don't think we want them. I also don't think that the toString changes in this PR are the cause of the problem -- they just make it visible.

The IR DataFlow::Node.asExpr and asConvertedExpr map from an Expr to an Instruction using https://github.com/Semmle/ql/blob/10076a6b2bd509ca8dc021c05c508c572cf6e957/cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/Instruction.qll#L437-L440. Notice how the QLDoc on it implies that it won't have multiple results. That predicate a wrapper for https://github.com/Semmle/ql/blob/10076a6b2bd509ca8dc021c05c508c572cf6e957/cpp/ql/src/semmle/code/cpp/ir/implementation/raw/internal/IRConstruction.qll#L49-L54, which has only one result as long as no two expressions claim to have the same Instruction as result. Unfortunately, such expressions exist. For example, C++ Assignment expressions have their LHS as result: https://github.com/Semmle/ql/blob/10076a6b2bd509ca8dc021c05c508c572cf6e957/cpp/ql/src/semmle/code/cpp/ir/implementation/raw/internal/TranslatedExpr.qll#L1242-L1254.

@dave-bartolomeo, how can we fix this? What uses of getResult do we have that requires TranslatedExpr to sometimes specify that its result is an instruction that doesn't belong to itself? I've informally thought of getResult as getting the Instruction whose value is the value computed by the expression; for an assignment, that would be the Store instruction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do this to handle the following cases (which aren't currently included in the IR tests or, AFAICT, the syntax zoo)

void UseAssignExpr(int x) {
  int w, y, z;
  y = z = x;
  (y = z) = x;
  if (w = x) {
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what would be the problem with those test cases if the result of the = were its Store?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the first and third cases, the consumers of the result of the = expect a register operand, not a memory operand. In the second case, the result of the first = is an lvalue for the left operand, that is then stored into by the second =. We could use CopyValueInstructions to preserve the current semantics while avoiding the duplicate sources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I see what you're saying.

The rules for the return value of an assignment are different between C and C++. To keep this simple, let's focus on C++. Let me know if I've understood this correctly. A C++ assignment expression a = b has a result that's an lvalue for the same address as a. This is consistent with overloaded assignment operators, which look like C& C::operator=(const C &rhs) { ...; return *this; }. This makes it possible to implement a class my_int { int content; ... } for which assignment behaves exactly (okay, exactly is a big word in this context) the same as the built-in assignment on int.

I'd argue that ideally the IR translation of a built-in = should work in the same way as an overloaded operator=. It'll have an lvalue as result, and then that lvalue will be Loaded if necessary (test case 1 and 3). And that seems to be how it works now.

So your suggestion is to add a CopyValueInstruction to TranslatedAssignment. For performance and clarity, we'd probably want to omit that instruction in the common case where = is in a void context.

I'm still not sure how common it is for TranslatedElements to have results that are not their own instructions. If it's common, perhaps it will bite us in DataFlow::asExpr for other kinds of expression, and we should be looking for a way to fix all the cases in a unified way. Like automatically inserting CopyValueInstruction before a TranslatedLoad that would otherwise get its operand from a different AST element.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently TranslatedCommaExpr, TranslatedPrefixCrementOperation in C++ when the result is not being loaded, TranslatedTransparentExpr, TranslatedAssignment, and TranslatedConditionDeclExpr have a getResult that is a child's getResult. TranslatedDestructorFieldDestruction has no getResult.

Of those, TranslatedTransparentExpr and TranslatedAssignment seem like the only ones that are likely to show up frequently as sources or sinks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking. It sounds to me like adding CopyValueInstruction to all these instructions would be a good solution. But that's not for this PR.

Do you think this PR should be delayed until we can improve the instruction-to-expr mapping? I think this PR is just exposing a problem that was there already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdmarsh2's analysis is spot on. While we've always tried to ensure that there's at least one unique Instruction for each leaf Expr, we've been willing to let interior Expr nodes return the result of a child. We could go back and add CopyValue in all of those cases that aren't already in a void context, but one consequence of that would be that every use of parentheses in an expression would get another Instruction, which seems extravagant. I think that targeting only those expressions that are likely to be used as sources or sinks would make more sense. Either that, or only target those expressions that are not off to the side as Conversions. That may be the same set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's essentially the same set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've merged back from master, and this problem has gone away thanks to #1999.

jbj added 2 commits September 13, 2019 14:33
This string looked out of place compared to `ExplicitParameterNode`,
whose string is simply the name of the parameter and therefore
indistinguishable from an access to the parameter without looking at the
location also. This has not been a problem so far, and if we want to
distinguish more clearly between initial values and accesses at some
point, we should do it for `ExplicitParameterNode` and
`UninitializedNode` too.
The `toString` for IR data-flow nodes are now similar to AST data-flow
nodes. This should make it easier to use the IR as a drop-in replacement
in the future. There are still differences because the IR data flow
library takes conversions into account.

I did not attempt to align the new nodes we use for field flow. That can
come later, when we add field flow to IR data flow.
@jbj jbj force-pushed the ir-dataflow-toString branch from 1899e1f to 7cfbe88 Compare September 13, 2019 12:33
@jbj
Copy link
Contributor Author

jbj commented Sep 13, 2019

Rebased to fix semantic merge conflict with #1881.

@jbj
Copy link
Contributor Author

jbj commented Nov 7, 2019

(this PR is waiting for #1999 to get merged first)

@dbartol
Copy link

dbartol commented Nov 7, 2019

#1999 has been merged. I'll review this PR as soon as the conflicts are resolved.

Solved conflicts in `*.expected` by re-running the tests.
Copy link
Contributor

@rdmarsh2 rdmarsh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rdmarsh2 rdmarsh2 merged commit dbe885f into github:master Nov 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants