-
Notifications
You must be signed in to change notification settings - Fork 1.8k
C++: Don't interpret 'ReferenceToInstruction' as a load #7109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…h with the previous node (instead of its source). This gives a better overview of the path.
… interpret as a load. This makes use lose a bunch of flow, and we'll restore this flow in the next commit.
02a75a4
to
92af522
Compare
Hm. Looks like this is breaking all sorts of things in the internal repo :(. I'll convert this to a draft and investigate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed your code changes, though for the changes inside IR dataflow this hasn't been especially deep.
Results look good to me (diff of commits 2..4).
Have you checked performance?
// If there is more than one predecessor for this node | ||
// we specify the source location explicitly. | ||
n > 1 and | ||
exists(TaintedWithPath::PathNode pred | pred = getAPredecessor(node) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also check irTaint(source, pred, tag)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I think it should.
@@ -376,13 +380,13 @@ module TaintedWithPath { | |||
*/ | |||
class TaintTrackingConfiguration extends TSingleton { | |||
/** Override this to specify which elements are sources in this configuration. */ | |||
predicate isSource(Expr source) { exists(getNodeForSource(source)) } | |||
predicate isSource(Element source) { exists(getNodeForSource(source)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change (and perhaps a few related ones) looks like it will break existing user code. Is there a way we can keep existing code that overrides isSource(Expr source)
working?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's also what I'm worried about. I haven't yet found a good hack around this that would allow us to keep the isSource(Expr)
working.
We introduced this TaintTrackingConfiguration
mechanism in #3189 to get path explanations to the old queries that still used the security.TaintTracking
library (which was changed to map to DefaultTaintTracking
). As @jbj
mentioned in here new queries really shouldn't be importing security.TaintTracking
. The QLDoc on the security.TaintTracking
file even says that users should prefer to use semmle.code.cpp.dataflow.TaintTracking
when designing new queries. And luckily, since actual DataFlow::Configurations
(which is the one you write if you're using the semmle.code.cpp.dataflow.TaintTracking
library) doesn't have this Expr
vs. Element
problem since that's working purely with DataFlow::Node
s instead.
With all that said, I'm still in the process a couple of small bugs. Once I've fixed those, I'll go back and think about a solution that has better backwards compatibility.
Thanks for the quick review! Performance looks fine. I've back-linked to the DCA run, so you should be able to find it now. |
0bf7fc5
to
c157080
Compare
c157080
to
44e32a6
Compare
…st, it gives us some new good flow (yay). Second, it causes some duplication of results that uses 'argv' as a taint source. The duplication isn't very bad, though. And since it is only for paths that start at 'argv', I think we can live with it for now.
44e32a6
to
21167f4
Compare
Completely getting rid of the path duplication turns out to be more tricky than I anticipated, so I've removed the additional code for this. This is just a UI problem for very specific cases:
So I don't think we should spend too much time on it. So now this PR just gets rid of an ugly hack left in there from #6825 and adds a bunch of new good flow. I'll take the PR out of draft once we have results from DCA. |
I've looked at the result changes. All the new results are good (🎉!) and the two lost results are due to two results that are reported twice on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the new results are good
Actually I think many of the cpp/unbounded-write
results are false positives - though I think due to weaknesses in the query logic rather than data flow (and its only a @precision medium
query).
The cpp/uncontrolled-arithmetic
changes look great, as do the test changes.
👍
Indeed. By "good" I mean that they're all results that we expect from dataflow improvements, but really they're FPs caused by query logic. Thanks for the clarification! |
Fixes point 2 in https://github.com/github/codeql-c-team/issues/732.
I recommend a commit-by-commit review. Here's an overview of what each commit does:
InlineExpectation
tests for path-explanations #7051 did all the necessary changes to that test so I didn't have to do in this PR, but apparently, I was wrong. This commit is necessary to show some of the bad effects of a later commit (f308be7).ReferenceToInstruction
in the list in the first place).argv
.annotate_path_to_sink
directory.