C++: Wire up param/arg indirections in data flow #3123
The test failures look similar but not identical to the undesired changes in #2704. I'll investigate, but I'd still like to hear comments on the overall approach of this PR.
I'd prefer not to expose `isParameterOf` with negative numbers while hiding `ParameterIndirectionNode`. I expect that `ParameterIndirectionNode` would be a fairly popular taint source, so this API would funnel users towards relying on the behavior of `isParameterOf`.
cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll
The new names are chosen to align with Java's `DataFlowUtil.qll`.
This change removes some duplicate results that will otherwise appear due to github#3123 and possibly github#2704.
I've merged #3137 into this PR, hoping it'll fix the test failures.
re: the operands being imprecise, that was an attempt to represent the uncertainty about what the callee would actually read. I don't think there's a problem with making them precise.
The CPP-Differences job showed many new results on openjdk/jdk. Thanks to #3189 I've been able to see that they were FPs, and I was able to find the cause. I'm guessing #2704 is affected by the same problem. In code like
there is (and should be) no taint into |
See code comment. This fixes false positives on openjdk/jdk.
…-args Accepted test results that were in semantic merge conflict between these branches. The changed results are due to a bug that's part of github/codeql-c-team#35.
I looked through the CPP-Differences. We lost 21 results, and they all look like false positives to me. They were due to field conflation. I've marked this PR as ready for review. It started out as a PR to add more flow, but in practice it now causes less taint. The extra flow we're adding was (mostly) already present as taint due to #2737. With the latest change to block |
I don't think we'll need it for #3118. EDIT: It actually does provide the flow needed to capture field flow of the form:
struct A {
  int x, y;
};
void callSink(A* b) {
  sink(b->x);
}
void foo() {
  A a;
  a.x = user_input();
  callSink(&a);
}
without me doing anything more than merging this PR into #3118.
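For readers who want to run the pattern above, here is a minimal self-contained sketch. The bodies of `user_input` and `sink` and the `g_sunk` recorder are hypothetical stand-ins that are not part of the PR; they only make the flow observable at run time. The point is that the tainted value is stored in a field of `a`, while the argument passed to `callSink` is a pointer to `a`, so the analysis has to follow the memory the argument points to rather than the pointer value itself.

```cpp
#include <cassert>
#include <vector>

struct A {
  int x, y;
};

// Hypothetical stand-ins for the taint source and sink named in the comment.
static std::vector<int> g_sunk;            // records every value reaching sink()
int user_input() { return 42; }            // pretend this is attacker-controlled
void sink(int v) { g_sunk.push_back(v); }

void callSink(A *b) {
  sink(b->x);  // reads field x through the pointer parameter
}

void foo() {
  A a{};               // value-initialise so that a.y is well defined
  a.x = user_input();  // taint is stored into field x only
  callSink(&a);        // the argument is &a; the tainted data lives in *(&a)
}
```

Calling `foo()` once leaves exactly one recorded value in `g_sunk`, the one returned by `user_input()`.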
…-args Conflicts: cpp/ql/src/semmle/code/cpp/ir/dataflow/DefaultTaintTracking.qll cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/defaulttainttracking.cpp cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/tainted.expected cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/test_diff.expected cpp/ql/test/library-tests/dataflow/dataflow-tests/test_ir.expected
This fixes a cosmetic bug in `.../CWE-134/.../examples.c` in the internal repo.
Now that IR field flow has been merged, this PR makes a difference. There are three new results (plus two duplicates) in I've started https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1114/ to evaluate performance and correctness. A test is failing in the internal repo because it has some cosmetic differences in the path graph. I think we'll want to revert parts of #2737 at some point to avoid duplicate paths, but it seems to happen only rarely.
I went through all the new CPP-Differences results and found that all were FPs caused by field conflation except two results for
I've pushed a fix.
…-args Conflicts: cpp/ql/test/library-tests/dataflow/fields/ir-flow.expected
The CPP-Differences shows two FPs that seem to be caused by field conflation. I'll investigate.
This should have been removed in 038bea2.
I'm quite confident that the two new FPs are caused by an existing field conflation that's just been amplified by this PR because there's more flow overall. I've opened #3475 with a test case that's independent of this PR. With that said, I think this PR is still blocked on fixing that field conflation because it introduces two FPs on git/git.
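As a rough illustration of what a field-conflation false positive can look like, consider the sketch below. It is not the test case from #3475 and not the code from git/git; the `Request` type, its field names, and the `user_input`/`sink` stubs are all hypothetical. Only one field is written from the source, and only the other field reaches the sink, so a field-sensitive analysis should report nothing here. An analysis that conflates the fields of `r` would treat the whole object as tainted and flag the `sink` call.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical type: one field receives user data, the other never does.
struct Request {
  std::string user_supplied;
  std::string constant;
};

// Hypothetical source and sink stubs, recording what reaches the sink.
static std::vector<std::string> g_sunk;
std::string user_input() { return "attacker data"; }
void sink(const std::string &s) { g_sunk.push_back(s); }

void log_constant(const Request *r) {
  sink(r->constant);  // only the untainted field is read
}

void handle() {
  Request r;
  r.user_supplied = user_input();  // taint goes into user_supplied only
  r.constant = "static text";
  log_constant(&r);                // no tainted data reaches sink at run time
}
```

At run time the sink only ever sees `"static text"`, which is why a report on this call would count as a false positive.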
…-args The conflicts came from how `this` is now a parameter but not a `Parameter` on `master`. Conflicts: cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/defaulttainttracking.cpp cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/tainted.expected cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/test_diff.expected cpp/ql/test/library-tests/dataflow/dataflow-tests/dataflow-ir-consistency.expected cpp/ql/test/library-tests/dataflow/fields/ir-flow.expected cpp/ql/test/library-tests/syntax-zoo/dataflow-ir-consistency.expected
// A read side effect is almost never exact since we don't know exactly how
// much memory the callee will read.
iTo.(ReadSideEffectInstruction).getSideEffectOperand().getAnyDef() = iFrom and
not iFrom.isResultConflated()
I don't like this rule. It sends flow into a `ReadSideEffectInstruction`, which actually means there is flow to the result value of that instruction. But it's an instruction without a result!
The rule is here because we need a node to be the `ArgumentNode` in `DataFlowPrivate.qll`, and I've chosen the node for the `ReadSideEffectInstruction`. The alternatives I can see are:
- Omit this rule and use the most recent definition of the indirection (here named `iFrom`) as the `ArgumentNode`. Then a node could be the argument for multiple calls, leading to confusing path explanations.
- Create a new synthetic node for this purpose.
- Change the data-flow library so that flow alternates between `Instruction` and `Operand` nodes. The list of good reasons to do this is starting to get long.
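To make the first alternative concrete, here is a small C++ sketch of source code in which one definition of a buffer's contents is read by two separate calls. The function names and the counting `sink` are hypothetical and only serve the illustration; how the IR actually represents each call is taken from the comment above, not shown here. If the most recent definition of the indirection (the write performed inside `user_input`) were used directly as the argument node, that single node would be the argument of both `callee` calls, whereas a per-call node such as the one for the read side effect keeps the two calls apart in path explanations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical taint source: fills the caller's buffer with "attacker" bytes.
void user_input(char *out, std::size_t n) {
  std::strncpy(out, "tainted", n - 1);
  out[n - 1] = '\0';
}

// Hypothetical sink: counts how often it is reached and keeps the last byte.
static int g_sink_calls = 0;
static char g_last_byte = '\0';
void sink(char c) {
  ++g_sink_calls;
  g_last_byte = c;
}

void callee(const char *p) {
  sink(p[0]);  // reads the memory p points to, not the pointer value
}

void caller() {
  char buf[16];
  user_input(buf, sizeof buf);  // the single definition of the contents of buf
  callee(buf);                  // call 1 reads that definition
  callee(buf);                  // call 2 reads the very same definition
}
```

Both calls pass the same pointer and observe the same buffer contents, so any distinction between them has to come from a node that belongs to the call rather than to the definition.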
You can see the effect on path explanations in `argvLocal.expected`, where there are now many new nodes with the awkward name `BufferReadSideEffect`.
…-args Fixed a semantic merge conflict by accepting test changes in `cpp/ql/test/library-tests/dataflow/fields/ir-path-flow.expected`.
Without this override, end users would see the string `BufferReadSideEffect` in path explanations.
I think this PR is ready to go.
I just added 8f702d4, which improves the |
// Check that the types match. Otherwise we can get flow from an object to
// its fields, which leads to field conflation when there's flow from other
// fields to the object elsewhere.
init.getParameter().getType().getUnspecifiedType().(DerivedType).getBaseType() =
  iTo.getResultType().getUnspecifiedType()
Could we still get flow from object to field when the type of the field is equal to the parameter type? I'm thinking about a situation with a binary tree like this:
struct Tree { Tree *left, *right; };
Tree source();
void sink(Tree*);
void read_left_subtree(Tree* tree) {
sink(tree->left);
}
...
Tree tree = source();
read_left_subtree(&tree);
Good point. I'll add a test to find out. But even if the answer is that such conflation is possible, does that mean it's undesirable? If we want conflation between array indexes, don't we then also want conflation between entries in a tree or a linked list?
You're right. It's probably reasonable to have object to field flow in such situations.
After a long day where I made probably every mistake that can be made on a pair of synced PRs, the internal PR is finally green.
LGTM!
This PR is based on #3097, which should be merged first. Only b622d62 is unique to this PR (for now).
This PR is an attempt to do a principled version of #2737, conservative enough that it applies to data flow. I thought it would just be a matter of wiring up `ReadSideEffectInstruction` to `InitializeParameterInstruction`, but it turns out that the involved operands are imprecise. I hope @dbartol or @rdmarsh2 have perspectives on whether they're supposed to be imprecise and whether the solution in this PR could be improved.